amit
4a0f7888a3
[Core] feat: Implement Priority Scheduling in V1 Engine (#19057)
...
Signed-off-by: amit <amit.man@gmail.com>
Co-authored-by: Roger Wang <Rogerw0108@gmail.com>
2025-06-22 20:18:08 -07:00
Aaron Pham
c4cf260677
[Perf][CLI] Improve overall startup time (#19941)
2025-06-22 23:11:22 +00:00
Ye (Charlotte) Qi
33d51f599e
[BugFix] Add an env to disable moe chunking to work around compile incompatibility (#19642)
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-06-22 15:17:49 -07:00
Aaron Pham
e91386cde1
[Chore] dedup logs (#19955)
2025-06-22 19:43:07 +00:00
Ye (Charlotte) Qi
2c11a29f0b
[Misc] Simplify vllm bench cli subcommand implementation (#19948)
2025-06-22 12:34:48 -04:00
Roger Wang
c76a506bd6
[Misc] Update model-specific PR tagging (#19949)
...
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-06-22 12:16:08 +00:00
Reid
ec0db6f51c
[doc] use snippets for contact us (#19944)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-22 10:26:13 +00:00
22quinn
c305a2109d
[CI/Build] Auto tag perf benchmarks related PRs (#19943)
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-22 08:46:21 +00:00
Wang, Yi
202c5df935
[Benchmark] fix request loss if "ping" is returned (#19535)
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-22 07:21:04 +00:00
Ning Xie
2bb246b8f7
[MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-22 13:39:09 +08:00
Ning Xie
4c409cabc2
[Misc] add vllm_config in __init__ (#19866)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-21 23:10:46 -04:00
Adrian
3b1e4c6a23
[Docs] Add GPT2ForSequenceClassification to supported models in docs (#19932)
...
Signed-off-by: nie3e <adrcwiek@gmail.com>
2025-06-21 20:57:19 +00:00
Woosuk Kwon
2c5302fadd
[Multimodal] Optimize Qwen2/2.5-VL startup time (#19756)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-06-21 20:01:07 +00:00
Reid
caa680fd2e
[doc] add contact us in community (#19922)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-21 17:29:06 +00:00
汪志鹏
c3bf9bad11
[New model support]Support Tarsier2 (#19887)
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-21 04:01:51 +00:00
Isotr0py
6f170f11dd
[Bugfix] Fix bnb 8bit model weights loading (#19917)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-21 03:29:09 +00:00
Rabin Adhikari
8ca81bb069
Fix: Check the type of params to be a Sequence not list. (#19910)
...
Signed-off-by: Rabin Adhikari <rabin.adk1@gmail.com>
2025-06-20 23:03:17 +00:00
wangxiyuan
e773a9e1c2
[Misc] Clean up useless code (#19889)
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-20 21:09:09 +00:00
Ning Xie
71baf85ae1
[Kernel] mark TorchSDPABackend swap_blocks NotImplementedError (#19749)
2025-06-20 18:18:11 +00:00
Li, Jiang
79f2f1c2a1
[CPU][CI] Fallback sliding window to v0 and fix CPU pooling model tests (#19901)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-20 15:30:36 +00:00
Vlad Tiberiu Mihailescu
2e3e3c86dc
Export NaNs in logits to scheduler_stats if output is corrupted (#18777)
...
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2025-06-20 22:47:16 +08:00
Chendi.Xue
7e8977fcd4
[custom_op][vllm-plugin] update custom_op class to use op_registry (#19164)
...
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
2025-06-20 07:44:56 -07:00
Adrian
f1e840e842
[Model] GPT2ForSequenceClassification model (#19663)
...
Signed-off-by: nie3e <adrcwiek@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-20 12:07:41 +00:00
Thomas Parnell
7771d1de88
[Fix] import regex instead of re (#19875)
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-06-20 11:16:48 +00:00
Ning Xie
71d1219545
[Kernel] correct cpu worker function parameter type (#19745)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-20 10:50:13 +00:00
Reid
e384f2f108
[Misc] refactor example - openai_transcription_client (#19851)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-20 08:02:21 +00:00
Reid
089a306f19
[Misc] update cuda version (#19526)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-20 07:25:15 +00:00
kourosh hakhamaneshi
5e666f72cd
[Bugfix][Ray] Set the cuda context eagerly in the ray worker (#19583)
2025-06-19 22:01:16 -07:00
qli88
e3a3e4db46
[Bugfix] Enable PP with AITER+V1 (#19822)
...
Signed-off-by: Qiang Li <qiang.li2@amd.com>
2025-06-20 12:43:20 +08:00
Xerxes
e41bf15cd0
[Chore]: qwen3-moe-type-hints-mistake (#19860)
...
Co-authored-by: xinnan.hou <hxn02029096@alibaba-inc.com>
2025-06-19 21:43:07 -07:00
Brayden Zhong
5aa4a015ce
[Benchmark] Fix Value of type "SampleRequest" is not indexable (#18032)
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-06-19 21:28:55 -07:00
Elaine Zhao
b6bad3d186
[CI][Neuron] Fail and exit on first error (#19622)
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-20 12:27:51 +08:00
Isotr0py
ee9a1531aa
[CI/Build][Bugfix] Fix deadlock on v1 engine test CI (#19872)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-20 09:51:07 +08:00
Robert Shaw
10d82f9ac5
[Benchmark][Bugfix] Fix Dataset Length Calculation (#19868)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-06-19 18:30:41 -07:00
xzbdmw
ea10dd9d9e
[Frontend] early return chat format resolution when specified (#19735)
2025-06-19 18:49:59 +00:00
Alex Brooks
ead2110297
[Core][Bugfix] Fix Online MM Beam Search (#19688)
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-06-19 17:18:07 +00:00
Li, Jiang
01220ce89a
[CI][CPU] Improve dummy Triton interfaces and fix the CPU CI (#19838)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-19 15:46:09 +00:00
22quinn
6f68c49220
[Doc] Update V1 user guide for embedding models (#19842)
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-19 09:43:27 +00:00
Alexei-V-Ivanov-AMD
4719460644
Fixing Chunked Prefill Test. (#19762)
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-06-19 01:36:16 -07:00
NekoMimiUnagi
466166dcfd
[Frontend] Add optional token-level progress bar to LLM.beam_search (#19301)
...
Signed-off-by: Ruosen Li <rxl190028@utdallas.edu>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-71-179.ec2.internal>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-19 03:21:41 -04:00
Zuxin
1d0ae26c85
Add xLAM tool parser support (#17148)
2025-06-19 14:26:41 +08:00
Isotr0py
6021999573
[Minor] Allow redirecting model path for HfRunner in test (#19795)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-18 23:04:10 -07:00
Ning Xie
c7b370c603
raise exception for pin_lora (#19809)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-18 22:57:35 -07:00
zsolt-borbely-htec
aa20d10a91
[Misc] [ROCm] Prevent surplus tensor reshape (#19803)
...
Signed-off-by: Zsolt Borbely <zsolt.borbely@htecgroup.com>
2025-06-19 13:57:16 +08:00
TJian
2de12be428
[ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 (#18990)
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-06-18 22:56:31 -07:00
Yu-Hang "Maxin" Tang
83ca9ae47b
Mark invariant normalizer in Gemma as non-persistent (#19788)
...
Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
2025-06-18 22:56:03 -07:00
kourosh hakhamaneshi
e2148dc5ea
[Bugfix] Add check_health to v1 async client. (#19821)
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-06-18 21:47:01 -07:00
Lu Fang
b1098b4072
[Bugfix] Fix the linter (#19826)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 21:44:41 -07:00
Maximilien de Bayser
799397ee4f
Support embedding models in V1 (#16188)
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Jee Jee Li
4959915089
[Quantization] Modify the logic of BNB double quantization (#19742)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-19 03:52:09 +00:00