Cyrus Leung
82de9b9d46
[Misc] Automatically resolve HF processor init kwargs ( #22005 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-31 22:44:10 -07:00
Charent
ad57f23f6a
[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache ( #20873 )
...
Signed-off-by: charent <19562666+charent@users.noreply.github.com>
2025-07-31 19:48:13 -07:00
Wentao Ye
3700642013
[Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM ( #21787 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-01 01:13:27 +00:00
Matthew Bonanni
e360316ab9
Add DeepGEMM to Dockerfile in vllm-base image ( #21533 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-31 18:01:55 -07:00
Ilya Markov
6e672daf62
Add FlashInfer allreduce RMSNorm Quant fusion ( #21069 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-31 13:58:38 -07:00
Yong Hoon Shin
71470bc4af
[Misc] Add unit tests for chunked local attention ( #21692 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-31 11:39:16 -07:00
zhiweiz
9e0726e5bf
[Meta] Official Eagle mm support, first enablement on llama4 ( #20788 )
...
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-07-31 10:35:07 -07:00
Song
9484641616
[Model] Add step3 vl ( #21998 )
...
Signed-off-by: oliveryuan <yuansong@step.ai>
Co-authored-by: oliveryuan <yuansong@step.ai>
2025-07-31 23:19:06 +08:00
Nick Hill
5daffe7cf6
[BugFix] Fix case where collective_rpc returns None ( #22006 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-31 12:51:37 +00:00
wang.yuqi
2836dd73f1
[Model][CI] Let more pooling models support v1 ( #21747 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-31 01:51:15 -07:00
Ning Xie
3e36fcbee6
[Bugfix]: fix metadata file copy in test_sharded_state_loader ( #21830 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-31 06:22:11 +00:00
Michael Goin
055bd3978e
[CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes ( #21973 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-31 11:45:29 +08:00
Zebing Lin
ca9e2be3ed
[Core] Move EngineCoreRequest to Request conversion out of EngineCore ( #21627 )
...
Signed-off-by: linzebing <linzebing1995@gmail.com>
2025-07-30 15:00:54 -07:00
cascade
287f527f54
[Feature] Add async tensor parallelism for scaled mm ( #20155 )
...
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-07-30 17:23:41 -04:00
Nick Hill
56bd537dde
[Misc] Support more collective_rpc return types ( #21845 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-30 10:20:20 -07:00
wxsm
f4135232b9
feat(distributed): add get_required_kvcache_layout class method to kv connector api ( #20433 )
...
Signed-off-by: wxsm <wxsms@foxmail.com>
2025-07-30 16:41:51 +00:00
Chenguang Zheng
4904e53c32
[Bugfix] SharedStorage Connector for V1 PD multimodal ( #21611 )
...
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
2025-07-30 09:18:37 -07:00
Cyrus Leung
004203e953
[CI/Build] Fix registry tests ( #21934 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-30 09:10:41 -07:00
633WHU
5c765aec65
[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types ( #21816 )
...
Signed-off-by: chiliu <chiliu@paypal.com>
Co-authored-by: chiliu <chiliu@paypal.com>
2025-07-30 08:54:44 -07:00
Yong Hoon Shin
ad510309ee
Override attention metadata for fast prefill in some KV sharing setups ( #21590 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-30 08:54:15 -07:00
Isotr0py
6e599eebe8
[Bugfix] Fix OOM tests in initialization test ( #21921 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-30 07:35:47 -07:00
Ruixiang Tan
8f4a1c9a04
[Misc] Improve code readability of KVCacheManager ( #21673 )
...
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
2025-07-30 07:20:43 -07:00
Wentao Ye
0271c2ff2f
[Test] Add Benchmark and Unit Test for per_token_group_quant ( #21860 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-30 07:15:02 -07:00
Varun Vinayak Shenoy
547795232d
[Tests] Fixing bug inside MultiModalProfiler. ( #21842 )
...
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
2025-07-30 00:44:15 -07:00
wang.yuqi
65f311ce59
[Frontend] Add LLM.reward specific to reward models ( #21720 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-29 20:56:03 -07:00
Chen Zhang
555e7225bc
[v1][attention] Support Hybrid Allocator + FlashInfer ( #21412 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-07-30 01:45:29 +00:00
elvischenv
58b11b24a6
[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend ( #21525 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-29 10:34:00 -04:00
Richard Zou
04e38500ee
[Bugfix] VLLM_V1 supports passing other compilation levels ( #19340 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-29 09:35:58 -04:00
Chen Zhang
755fa8b657
[KVCache] Make KVCacheSpec hashable ( #21791 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-07-29 19:58:29 +08:00
Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding ( #21347 )
...
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
lyrisz
c6c9122d50
[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning ( #20396 )
...
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Duncan Moss <djm.moss@gmail.com>
2025-07-28 23:13:58 +00:00
Kuntai Du
b18b417fbf
Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" ( #21778 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-07-28 20:15:18 +00:00
Harry Mellor
94b71ae106
Use metavar to list the choices for a CLI arg when custom values are also accepted ( #21760 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-28 19:31:10 +00:00
Cyrus Leung
04fe61aa3d
[CI/Build] Fix plugin tests ( #21758 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 15:08:05 +00:00
Keyang Ru
9ace2eaf35
[Bugfix] Improve JSON extraction in LlamaToolParser ( #19024 )
...
Signed-off-by: keru <keyang.ru@oracle.com>
Co-authored-by: keru <keyang.ru@oracle.com>
2025-07-28 12:36:58 +00:00
Anton Vlasjuk
656c24f1b5
[Ernie 4.5] Name Change for Base 0.3B Model ( #21735 )
...
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 12:22:32 +00:00
Cyrus Leung
a4ed731546
[Model] Prioritize Transformers fallback over suffix matching ( #21719 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 02:15:31 -07:00
Asaf Joseph Gardin
a6c050286a
[v1][mamba] Added mamba_type into MambaSpec ( #21715 )
...
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-07-28 08:15:55 +00:00
Hongsheng Liu
7656cf4cf3
[Bugfix] [issue-21565] Fix the incompatibility issue with stream and named function calling when Thinking is disabled ( #21573 )
...
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
2025-07-27 22:43:50 -07:00
Benji Beck
88e46c7c8d
Migrate Glm4vImageInputs, Glm4vVideoInputs to TensorSchema ( #21678 )
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-07-27 22:36:08 -07:00
Shinichi Hemmi
c7ffe93d9c
[Model] Support TP/PP/mamba2 kernel for PLaMo2 ( #19674 )
...
Signed-off-by: Shinichi Hemmi <shemmi@preferred.jp>
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
Co-authored-by: Sixue Wang <cecilwang@preferred.jp>
2025-07-28 05:00:47 +00:00
Adeline
15a72ac478
[V1] Exception Handling when Loading KV Cache from Remote Store ( #21534 )
...
Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
2025-07-27 20:34:17 -07:00
Cyrus Leung
86ae693f20
[Deprecation][2/N] Replace --task with --runner and --convert ( #21470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
Caleb_Du
57c22e57f9
Fix CUDA permute/unpermute for use with DeepGemm Moe ( #17934 )
...
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
2025-07-27 07:08:00 -07:00
Wentao Ye
bda9d0535f
[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor ( #21631 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-27 05:25:21 -07:00
Isotr0py
3d847a3125
[VLM] Add video support for Intern-S1 ( #21671 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-27 11:49:43 +00:00
Isotr0py
eed2f463b2
[VLM] Support HF format Phi-4-MM model ( #17121 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-26 20:07:57 -07:00
Benji Beck
3339cba3ff
Migrate FuyuImagePatchInputs to TensorSchema ( #21662 )
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-26 19:34:14 -07:00
Maximilien de Bayser
1cd6eaba54
Support encoder-only models without KV-Cache ( #21270 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-07-26 21:09:52 +08:00
Isotr0py
f27fdfc3ed
[Bugfix] Investigate Qwen2-VL failing test ( #21527 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-26 06:09:29 -07:00