Wentao Ye
|
eefbf4a68b
|
[Perf] Optimize reshape_and_cache_flash CUDA Kernel (#22036)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-01 19:18:51 -04:00 |
|
Michael Goin
|
88faa466d7
|
[CI] Initial tests for SM100 Blackwell runner (#21877)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 16:18:38 -07:00 |
|
Nick Hill
|
881e1af43a
|
[BugFix] Harden distributed DP startup (#21538)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-01 21:40:45 +00:00 |
|
XiongfeiWei
|
d84b97a3e3
|
Add lora test for tp>1 case for TPU. (#21970)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-08-01 18:56:08 +00:00 |
|
Rui Qiao
|
d331759488
|
Introduce RayPPCommunicator for ray-based PP (#21660)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-08-01 11:50:58 -07:00 |
|
Animesh Jain
|
9659bc7f27
|
[compile][startup] Disable C++ compilation of symbolic shapes (#20836)
Signed-off-by: Animesh Jain <anijain@umich.edu>
|
2025-08-01 10:38:52 -07:00 |
|
Michael Goin
|
3277e8f9e1
|
Fix pre-commit failure for SECURTIY.md (#22102)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 10:36:07 -07:00 |
|
Jee Jee Li
|
8d705996df
|
[Misc] Minor enhancement of benchmark_moe (#22068)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-02 01:35:30 +08:00 |
|
Harry Mellor
|
38c8bce8b6
|
Enable headless models for pooling in the Transformers backend (#21767)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 10:31:29 -07:00 |
|
Varun Sundar Rabindranath
|
ac45c44d98
|
[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-08-01 10:14:38 -07:00 |
|
Huzaifa Sidhpurwala
|
d6664664b4
|
security policy: take 1 (#21119)
Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-01 10:09:49 -07:00 |
|
rongfu.leng
|
b879ecd6e2
|
[Bugfix] fix when skip tokenizer init (#21922)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-01 10:09:36 -07:00 |
|
Isotr0py
|
3f8e952179
|
[Bugfix] Fix glm4.1v video inference issue (#22067)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-01 09:33:30 -07:00 |
|
Harry Mellor
|
326a1b001d
|
Improve documentation of ModelConfig.try_get_generation_config to prevent future confusion (#21526)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 09:32:27 -07:00 |
|
Harry Mellor
|
2d7b09b998
|
Deprecate --disable-log-requests and replace with --enable-log-requests (#21739)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 17:16:37 +01:00 |
|
David Xia
|
97608dc276
|
[Docs] use uv in CPU installation docs (#22089)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-08-01 07:55:55 -07:00 |
|
Nick Hill
|
3146519add
|
[BugFix] Don't change title of top-level process (#22032)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-01 07:37:55 -07:00 |
|
Richard Zou
|
8026a335a1
|
[BugFix] Update AttnFusionPass cache key (#21947)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-08-01 07:11:29 -07:00 |
|
Wentao Ye
|
a59cd9d9f7
|
[Refactor] Fix Compile Warning #1444-D (#21462)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-01 06:10:30 -07:00 |
|
Abirdcfly
|
5c54d9759d
|
[Bugfix][PD] set max_completion_tokens=1 if req has this value (#21841)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
|
2025-08-01 06:08:45 -07:00 |
|
Gamhang
|
0a6d305e0f
|
feat(multimodal): Add customizable background color for RGBA to RGB conversion (#22052)
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
Co-authored-by: Jinheng Li <ahengljh@gmail.com>
|
2025-08-01 06:07:33 -07:00 |
|
Michael Goin
|
f81c1bb055
|
[Bugfix] Check NVIDIA artifactory is accessible before using flashinfer cubin kernels (#21893)
|
2025-08-01 08:28:45 -04:00 |
|
Harry Mellor
|
fb0e0d46fc
|
Fix get_kwargs for case where type hint is list[Union[str, type]] (#22016)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 05:26:42 -07:00 |
|
TJian
|
26b5f7bd2a
|
[BUG] [ROCm] Fix import bug on ROCm (#22083)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-01 05:25:20 -07:00 |
|
Dipika Sikka
|
dfbc1f8880
|
[Speculative Decoding] Add speculators config support (#21345)
|
2025-08-01 08:25:18 -04:00 |
|
Harry Mellor
|
87c94bc879
|
Revert "Update sampling_metadata.py (#21937)" (#22088)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 05:24:46 -07:00 |
|
Jee Jee Li
|
28b18cc741
|
[Quantization] Enable BNB support for InternS1 (#21953)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-01 11:09:54 +00:00 |
|
WeiQing Chen
|
4931486988
|
[Doc] Added warning of speculating with draft model (#22047)
Signed-off-by: Dilute-l <dilu2333@163.com>
Co-authored-by: Dilute-l <dilu2333@163.com>
|
2025-08-01 02:11:56 -07:00 |
|
Woosuk Kwon
|
0f81b310db
|
[Misc] Remove upper bound in openai package version (#22060)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-01 02:11:40 -07:00 |
|
wuhang
|
e6680f9e25
|
[Bugfix] Add log prefix in non-dp mode engine core (#21889)
Signed-off-by: wuhang <wuhang6@huawei.com>
|
2025-08-01 09:04:16 +00:00 |
|
Roger Wang
|
27a145e893
|
[Doc] Add example for Step3-VL (#22061)
Signed-off-by: Roger Wang <hey@rogerw.me>
|
2025-08-01 08:35:49 +00:00 |
|
Simon Mo
|
da31f6ad3d
|
Revert precompile wheel changes (#22055)
|
2025-08-01 08:26:24 +00:00 |
|
Sungyoon Jeong
|
98df153abf
|
[Frontend] Align tool_choice="required" behavior with OpenAI when tools is empty (#21052)
Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai>
|
2025-08-01 07:54:17 +00:00 |
|
Zebing Lin
|
e0f63e4a35
|
[Core] Avoid repeated len(block_token_ids) check in hash_request_tokens (#21781)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-08-01 00:23:29 -07:00 |
|
Cyrus Leung
|
b4e081cb15
|
[Bugfix] Disable multi-modal preprocessor cache for DP (#21896)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-01 08:03:56 +01:00 |
|
Hongsheng Liu
|
79731a79f0
|
[Doc] Fix a syntax error of example code in structured_outputs.md (#22045)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
|
2025-08-01 00:01:22 -07:00 |
|
Aviad Rossmann
|
53d7c39271
|
Update sampling_metadata.py (#21937)
Signed-off-by: Aviad Rossmann <aviadr@neureality.ai>
|
2025-07-31 23:23:18 -07:00 |
|
Cyrus Leung
|
61dcc280fa
|
[Doc] Add Voxtral to Supported Models page (#22059)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-31 23:10:56 -07:00 |
|
Kyle Sayers
|
0f46a780d4
|
[Model] [Quantization] Support quantization for Gemma3n (#21974)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-07-31 22:45:15 -07:00 |
|
Mickaël Seznec
|
e1a7fe4af5
|
[BugFix] fix: aot passes kvcache dtype information (#19750)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-08-01 05:45:02 +00:00 |
|
Cyrus Leung
|
82de9b9d46
|
[Misc] Automatically resolve HF processor init kwargs (#22005)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-31 22:44:10 -07:00 |
|
Charent
|
ad57f23f6a
|
[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873)
Signed-off-by: charent <19562666+charent@users.noreply.github.com>
|
2025-07-31 19:48:13 -07:00 |
|
Wentao Ye
|
3700642013
|
[Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM (#21787)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-01 01:13:27 +00:00 |
|
Michael Goin
|
0bd409cf01
|
Move flashinfer-python to optional extra vllm[flashinfer] (#21959)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-31 18:02:11 -07:00 |
|
Matthew Bonanni
|
e360316ab9
|
Add DeepGEMM to Dockerfile in vllm-base image (#21533)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-31 18:01:55 -07:00 |
|
Wentao Ye
|
c3e0e9337e
|
[Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 (#21639)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-31 15:26:11 -07:00 |
|
Ilya Markov
|
6e672daf62
|
Add FlashInfer allreduce RMSNorm Quant fusion (#21069)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-31 13:58:38 -07:00 |
|
Benjamin Chislett
|
2dff2e21d9
|
[Bugfix] Fix MTP weight loading (#21941)
|
2025-07-31 16:33:53 -04:00 |
|
Yong Hoon Shin
|
71470bc4af
|
[Misc] Add unit tests for chunked local attention (#21692)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-31 11:39:16 -07:00 |
|
zhiweiz
|
9e0726e5bf
|
[Meta] Official Eagle mm support, first enablement on llama4 (#20788)
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-07-31 10:35:07 -07:00 |
|