Cyrus Leung
61dcc280fa
[Doc] Add Voxtral to Supported Models page (#22059)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-31 23:10:56 -07:00
Kyle Sayers
0f46a780d4
[Model] [Quantization] Support quantization for Gemma3n (#21974)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-31 22:45:15 -07:00
Mickaël Seznec
e1a7fe4af5
[BugFix] fix: aot passes kvcache dtype information (#19750)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-08-01 05:45:02 +00:00
Cyrus Leung
82de9b9d46
[Misc] Automatically resolve HF processor init kwargs (#22005)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-31 22:44:10 -07:00
Charent
ad57f23f6a
[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873)
Signed-off-by: charent <19562666+charent@users.noreply.github.com>
2025-07-31 19:48:13 -07:00
Wentao Ye
3700642013
[Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM (#21787)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-01 01:13:27 +00:00
Michael Goin
0bd409cf01
Move flashinfer-python to optional extra vllm[flashinfer] (#21959)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-31 18:02:11 -07:00
Matthew Bonanni
e360316ab9
Add DeepGEMM to Dockerfile in vllm-base image (#21533)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-31 18:01:55 -07:00
Wentao Ye
c3e0e9337e
[Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 (#21639)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-31 15:26:11 -07:00
Ilya Markov
6e672daf62
Add FlashInfer allreduce RMSNorm Quant fusion (#21069)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-31 13:58:38 -07:00
Benjamin Chislett
2dff2e21d9
[Bugfix] Fix MTP weight loading (#21941)
2025-07-31 16:33:53 -04:00
Yong Hoon Shin
71470bc4af
[Misc] Add unit tests for chunked local attention (#21692)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-31 11:39:16 -07:00
zhiweiz
9e0726e5bf
[Meta] Official Eagle mm support, first enablement on llama4 (#20788)
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-07-31 10:35:07 -07:00
XiongfeiWei
53c21e492e
Update torch_xla pin to 20250730 (#21956)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-31 17:26:43 +00:00
Alexei-V-Ivanov-AMD
0780bb5783
Removing amdproduction Tests (#22027)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-07-31 09:53:27 -07:00
Doug Smith
58bb902186
fix(setup): improve precompiled wheel setup for Docker builds (#22025)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-07-31 09:52:48 -07:00
Zhengxu Chen
7349d5268b
[ez] Remove a trailing space from compilation/decorators.py (#22028)
2025-07-31 09:46:07 -07:00
Song
9484641616
[Model] Add step3 vl (#21998)
Signed-off-by: oliveryuan <yuansong@step.ai>
Co-authored-by: oliveryuan <yuansong@step.ai>
2025-07-31 23:19:06 +08:00
amirkl94
207b750e19
[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-31 06:00:01 -07:00
Nick Hill
5daffe7cf6
[BugFix] Fix case where collective_rpc returns None (#22006)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-31 12:51:37 +00:00
wang.yuqi
2836dd73f1
[Model][CI] Let more pooling models support v1 (#21747)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-31 01:51:15 -07:00
Daniele
d2aab336ad
[CI/Build] get rid of unused VLLM_FA_CMAKE_GPU_ARCHES (#21599)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
2025-07-31 15:00:08 +08:00
Cyrus Leung
9532a6d563
[Deprecation] Remove deprecated args and methods (#21907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-30 23:46:38 -07:00
Ning Xie
3e36fcbee6
[Bugfix]: fix metadata file copy in test_sharded_state_loader (#21830)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-31 06:22:11 +00:00
Michael Goin
055bd3978e
[CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes (#21973)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-31 11:45:29 +08:00
Jee Jee Li
0f7919fca0
[Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels (#21818)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-30 20:41:12 -07:00
Michael Goin
61445453df
[UX] Rename CUTLASS_MLA_VLLM_V1 to CUTLASS_MLA (#21966)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-30 20:40:34 -07:00
Sanchit Gandhi
ec02e536df
[Bugfix] Relax lang pin for voxtral (#21833)
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-30 20:38:52 -07:00
Michael Goin
9cb497bfa3
[Example] Add async_llm_streaming.py example for AsyncLLM streaming in python (#21763)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-30 18:39:46 -06:00
Zebing Lin
ca9e2be3ed
[Core] Move EngineCoreRequest to Request conversion out of EngineCore (#21627)
Signed-off-by: linzebing <linzebing1995@gmail.com>
2025-07-30 15:00:54 -07:00
Bram
601f856d56
[Bugfix] Fix None value handling in trace span creation for cancelled requests (#20272)
2025-07-30 14:44:02 -07:00
cascade
287f527f54
[Feature] Add async tensor parallelism for scaled mm (#20155)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-07-30 17:23:41 -04:00
Ming Yang
f12d9256b3
[Misc] Use dracut on CentOS and skip clone if repo exists for EP kernel installation (#21635)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-07-30 13:15:06 -07:00
Doug Smith
b9b753e7a7
For VLLM_USE_PRECOMPILED, only compiled .so files should be extracted (#21964)
2025-07-30 13:04:40 -07:00
Nick Hill
56bd537dde
[Misc] Support more collective_rpc return types (#21845)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-30 10:20:20 -07:00
wenxindongwork
8f0d516715
[TPU] Support Pathways in vLLM (#21417)
Signed-off-by: wenxindongwork <wenxindong@google.com>
2025-07-30 10:02:12 -07:00
wxsm
f4135232b9
feat(distributed): add get_required_kvcache_layout class method to kv connector api (#20433)
Signed-off-by: wxsm <wxsms@foxmail.com>
2025-07-30 16:41:51 +00:00
Chenguang Zheng
4904e53c32
[Bugfix] SharedStorage Connector for V1 PD multimodal (#21611)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
2025-07-30 09:18:37 -07:00
Cyrus Leung
004203e953
[CI/Build] Fix registry tests (#21934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-30 09:10:41 -07:00
633WHU
5c765aec65
[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816)
Signed-off-by: chiliu <chiliu@paypal.com>
Co-authored-by: chiliu <chiliu@paypal.com>
2025-07-30 08:54:44 -07:00
Yong Hoon Shin
ad510309ee
Override attention metadata for fast prefill in some KV sharing setups (#21590)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-30 08:54:15 -07:00
Cyrus Leung
366f6b3a4d
[Bugfix] Fix multi-api server not working for text models (#21933)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-30 08:42:05 -07:00
Isotr0py
6e599eebe8
[Bugfix] Fix OOM tests in initialization test (#21921)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-30 07:35:47 -07:00
Harry Mellor
88edf5994c
[Docs] Reduce the size of the built docs (#21920)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-30 07:35:08 -07:00
Po-Han Huang (NVIDIA)
ff08e51940
[NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-07-30 07:33:40 -07:00
Ruixiang Tan
8f4a1c9a04
[Misc] Improve code readability of KVCacheManager (#21673)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
2025-07-30 07:20:43 -07:00
Harry Mellor
36ede45989
Reduce time wasted in GitHub Actions using concurrency (#21919)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-30 07:18:02 -07:00
Cyrus Leung
0e40b26073
[CI/Build] Only run markdownlint in CI (#21892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-30 07:17:14 -07:00
Wentao Ye
0271c2ff2f
[Test] Add Benchmark and Unit Test for per_token_group_quant (#21860)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-30 07:15:02 -07:00
youkaichao
e91d3c9cda
[misc] skip p2p check by default (#21904)
2025-07-30 22:05:04 +08:00