Russell Bryant
58d4c705a8
[Core] Get num_encoder_tokens from scheduler config ( #24989 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-16 20:59:07 -07:00
Prashant Gupta
ea3de5ef0d
[misc] fix typo in value error ( #24995 )
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2025-09-16 20:58:38 -07:00
Michael Goin
67532a1a68
[UX] Remove "quantization is not fully optimized yet" log ( #25012 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-16 20:57:51 -07:00
yyzxw
5672ba90bd
[Docs] fix invalid doc link ( #25017 )
Signed-off-by: zxw <1020938856@qq.com>
2025-09-16 20:53:23 -07:00
Michael Goin
dd83a157f1
[UX] Enforce valid choices for envs like VLLM_ATTENTION_BACKEND, etc ( #24761 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-09-16 20:42:23 -07:00
Isotr0py
5a411ef6c4
[Benchmarks] Add MMVU video dataset support and clean up deprecated datasets ( #24719 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-17 03:29:43 +00:00
Nick Hill
eeb135eb87
[Core] Use CpuGpuBuffer for block table tensors ( #24795 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-16 19:18:06 -07:00
elvischenv
3059b9cc6b
[Doc] Add --force-overwrite option to generate_cmake_presets.py ( #24375 )
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-16 18:45:29 -07:00
Benjamin Bartels
64ad551878
Removes source compilation of nixl dependency ( #24874 )
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com>
2025-09-17 01:33:18 +00:00
Tahsin Tunan
cef32104b4
[FP8] Extend per-token-group quantization support to QuantFP8 ( #24342 )
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-16 18:31:06 -07:00
Michael Goin
493b10f8bf
[CI] GPT-OSS GPQA eval test for Blackwell ( #24920 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 18:13:21 -07:00
Matthew Bonanni
d119fc8614
[CI][Bugfix] Fix failing Blackwell test ( #24993 )
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 15:55:02 -07:00
Michael Goin
dbebb7f812
[Perf] Reuse workspace for FP8+FP4 Marlin MoE ( #20500 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-16 15:45:10 -06:00
Aleksandr Malyshev
3053a22b33
fp8 kv cache support fix for torch.compile ( #22758 )
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-09-16 21:27:11 +00:00
Andrew Sansom
02d4b85454
Use kwargs for long lists of EngineCoreRequest arguments in tests and fix extra kwargs ( #24987 )
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-16 14:06:56 -07:00
Andrew Xia
86daa875fe
[gpt-oss][1][bugfix] fix streaming final output ( #24466 )
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 13:56:16 -06:00
Concurrensee
dcf2f3ec06
[ROCm] Add dependencies for ROCm ( #24900 )
Signed-off-by: Yida Wu <yida.wu@amd.com>
2025-09-16 19:49:06 +00:00
Chen Zhang
218454b9b2
[MISC] Add code owners of vllm/v1 to vllm/v1/core ( #24928 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-16 19:07:34 +00:00
Andrew Xia
f4d6eb95cf
[gpt-oss][1b] streaming add item id, content id ( #24788 )
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 18:41:12 +00:00
Sugar
cd1f885bcf
Directly get max encoder len from VLLM config in V1 ( #24866 )
Signed-off-by: Sugar-zsg <952242923@qq.com>
2025-09-16 17:52:31 +00:00
Isotr0py
d593cf28fa
[Misc] Add removed encoder-decoder models to previously supported models list ( #24961 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-16 10:46:46 -07:00
lianyibo
faa7a5daac
[Bugfix] Fix unable to run encoder model when disable_hybrid_kv_cache_manager is true ( #24571 )
Signed-off-by: lianyibo <lianyibo1@kunlunit.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-09-16 17:36:58 +00:00
Sage Moore
567939953b
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM ( #23693 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Lukas Geiger
08369289af
[Core][MultiModalHasher] Don't convert memoryviews to bytes during hashing ( #24925 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-16 15:32:47 +00:00
Chih-Chieh Yang
73cfb3c5ee
[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 ( #24331 )
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-09-16 14:53:43 +00:00
Ming Yang
4e5affeaa1
[CI] Add Decode Context Parallelism (DCP) test to CI ( #24487 )
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-09-16 21:21:28 +08:00
TeeKen Lau
e4f0b4cd96
(doc): set cmake c++ compatible standard when building on MacOS CPU. ( #23483 )
Signed-off-by: teekenl <teekenlau@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 06:08:46 -07:00
liangwen12year
de3e53a75b
feat: Add Grafana and Perses monitoring dashboards for vLLM ( #23498 )
2025-09-16 05:53:40 -07:00
Ye (Charlotte) Qi
85e0df1392
[Docs] move benchmarks README to contributing guides ( #24820 )
2025-09-16 05:52:57 -07:00
Harry Mellor
0faf3cc3e8
Move SpeculativeConfig from config/__init__.py to config/speculative.py ( #24904 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 12:51:35 +01:00
Chen Bruce
7ea5c73ad7
[Feat][EPLB] A novel static EPLB placement strategy for MoE models. ( #23745 )
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Signed-off-by: Chen Bruce <bruceszchen@tencent.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com>
Co-authored-by: lemon412 <lemon412@foxmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 10:55:16 +00:00
tomeras91
27fcfe7bcf
[Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 ( #24593 )
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 10:51:01 +00:00
Cheng Kuan Yong Jason
68dbde5dbb
[Bugfix] remove duplicate tokens streamed in required tool choice streaming ( #23312 )
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-16 15:16:32 +08:00
Jee Jee Li
04ad0dc275
[benchmark] Add triton version in the moe tuned config ( #24769 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 14:10:54 +08:00
Saman A. Pour
238c4c1705
[QWEN NEXT] Fused MoE kernels Optimization configs ( #24924 )
Signed-off-by: Saman Keon <samanamp@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 13:06:03 +08:00
vllmellm
8c54610265
[Bug] [Spec Dec]: Fix kv_cache dtype mismatch for Eagle3 drafter on FP8 target ( #24505 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-09-16 04:45:38 +00:00
cascade
17871983a2
[Bugfix] Fix sequence parallelism bug when enable pipeline parallelism ( #24021 )
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-09-16 04:32:32 +00:00
Woosuk Kwon
759ef49b15
Remove V0 Encoder-Decoder Support ( #24907 )
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-15 21:17:14 -07:00
Kunshang Ji
5206ab20ba
[XPU] Fix circular import error. ( #24927 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-16 03:35:36 +00:00
Lu Fang
0af3ce1355
Upgrade flashinfer to 0.3.1 ( #24470 )
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-16 02:36:09 +00:00
Richard Zou
e1279ef00f
[Docs] Update instructions for how to use existing torch binary ( #24892 )
Signed-off-by: Richard Zou <zou3519@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 02:25:50 +00:00
Mark McLoughlin
2942970d44
[Metrics] Hide deprecated metrics with gpu_ prefix ( #24245 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-09-15 20:15:57 -06:00
Wentao Ye
3c96e7b8a1
[CI] Small Accuracy Eval Test for Deepseek Model ( #24259 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:14:50 -06:00
Wentao Ye
b42566f440
[Bug] Fix is_flashmla_supported Check Error ( #24774 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:10:55 -06:00
Reza Barazesh
d96e11167d
Add pytest-cov and .coveragerc ( #24778 )
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
2025-09-15 20:08:46 -06:00
Gregory Shtrasberg
2891603efd
[ROCm][Bugfix] Fix the case where there's bias ( #24895 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-15 20:05:12 -06:00
Wentao Ye
de2cc3d867
[Deprecation] Remove DeepGEMM Old Symbol Wrapper ( #24902 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:03:29 -06:00
Michael Goin
e95084308b
Updated CODEOWNERS for flashinfer, mla, fused_moe ( #24906 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-16 02:01:28 +00:00
Sergio Paniego Blanco
7f6f2c1182
HuggingFace -> Hugging Face in Integration with Hugging Face docs ( #24889 )
2025-09-15 17:28:35 -07:00
Jiangyun Zhu
5bcc153d7b
[Compile] Fix noop_elimination pass and add tests for noop_elimination ( #24880 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-15 23:33:18 +00:00