Varun Sundar Rabindranath
|
c0569dbc82
|
[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-14 19:47:16 +00:00 |
|
Michael Goin
|
8bb43b9c9e
|
Add benchmark dataset for mlperf llama tasks (#20338)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-14 19:10:07 +00:00 |
|
Tyler Michael Smith
|
559756214b
|
Change default model to Qwen3-0.6B (#20335)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-14 16:54:52 +00:00 |
|
Isotr0py
|
6d0cf239c6
|
[CI/Build] Add Transformers nightly tests in CI (#20924)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 16:33:17 +00:00 |
|
Isotr0py
|
3fc964433a
|
[Misc] Clean up Aimv2 config registration in Ovis config (#20921)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 15:36:43 +00:00 |
|
Lu Fang
|
0caf61c08a
|
[CI] Update codeowner for compilation code (#20929)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-14 08:33:19 -07:00 |
|
Richard Zou
|
667624659b
|
[CI] cc folks on changes to vllm/compilation (#20925)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-14 07:52:17 -07:00 |
|
ant-yy
|
38efa28278
|
[Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
|
2025-07-14 22:10:32 +08:00 |
|
Cyrus Leung
|
e8cc53af5e
|
[Misc] Log the reason for falling back to FlexAttention (#20699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-14 04:16:51 -07:00 |
|
Chauncey
|
a4851cfe68
|
[Bugfix]: Fix messy code when using logprobs (#20910)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-14 11:06:45 +00:00 |
|
Reid
|
9887e8ec50
|
[Misc] Remove unused function (#20909)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-14 10:48:55 +00:00 |
|
22quinn
|
f326ab9c88
|
[Bugfix] Bump up mistral_common to support v13 tokenizer (#20905)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-14 10:45:03 +00:00 |
|
Cyrus Leung
|
dcf2a5e208
|
[CI/Build] Fix OOM issue in Jina-VL test (#20907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-14 10:32:35 +00:00 |
|
wangxiyuan
|
1e9438e0b0
|
[MISC] Move bind_kv_cache to worker module (#20900)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-07-14 09:40:00 +00:00 |
|
Aaron Pham
|
697ef765ee
|
[Refactor][V1] Move outlines utils for V1 imports (#20878)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-07-14 00:58:35 -07:00 |
|
Jee Jee Li
|
a99b9f7dee
|
[Quantization] add BNB for MixtralForCausalLM (#20893)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-14 07:34:34 +00:00 |
|
TJian
|
c488b928a7
|
[ROCm] [Bugfix] [Critical]: Fix mamba compilation bug (#20883)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-07-14 15:23:28 +08:00 |
|
Reid
|
2c7fa47161
|
Fix: Add missing EOFError handling in CLI complete command (#20896)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-14 07:09:57 +00:00 |
|
Daniel song
|
88fc8a97e3
|
Removing redundant python version check (#20888)
Signed-off-by: Dannyso05 <dansong1177@gmail.com>
|
2025-07-14 06:15:05 +00:00 |
|
Maroon Ayoub
|
66f6fbd393
|
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
|
2025-07-14 02:45:31 +00:00 |
|
22quinn
|
8632e831ba
|
[Core] Add update_config RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-14 00:49:18 +00:00 |
|
nopperl
|
4bbfc36b16
|
[V1] Hybrid allocator without prefix caching (#20661)
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
|
2025-07-13 16:55:14 +00:00 |
|
TJian
|
80d38b8ac8
|
[V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs (#20880)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-07-13 15:19:32 +00:00 |
|
Liuchenlong
|
211b6a6113
|
[Bugfix] fix define of RerankDocument (#20877)
Signed-off-by: liuchenlong <liuchenlong@xiaohongshu.com>
Co-authored-by: liuchenlong <liuchenlong@xiaohongshu.com>
|
2025-07-13 14:32:40 +00:00 |
|
Wang Siyuan
|
247102f07f
|
[Bugfix] Fix: add patch_rope_scaling after hf override (#20857)
Signed-off-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
Signed-off-by: Wang Siyuan <sywang0227@gmail.com>
|
2025-07-13 00:13:25 -07:00 |
|
Minkyu Kim
|
bd4c1e6fdb
|
Support for LlamaForSequenceClassification (#20807)
Signed-off-by: thechaos16 <thechaos16@gmail.com>
|
2025-07-13 00:09:34 -07:00 |
|
QiliangCui
|
99b4f080d8
|
Renable google/gemma-3-1b-it accuracy test. (#20866)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-12 21:48:56 -07:00 |
|
Nicolò Lucchesi
|
020f58abcd
|
[Core] Support multiple tasks per model (#20771)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-12 19:40:11 -07:00 |
|
Wentao Ye
|
c1acd6d7d4
|
[Refactor] Change the way of import triton (#20774)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-12 19:39:55 -07:00 |
|
ElizaWszola
|
3b3b778d4a
|
[Bugfix] Fix a couple PPLX+CUTLASS MoE bugs (#20825)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2025-07-12 19:39:14 -07:00 |
|
Wentao Ye
|
42d440c22b
|
[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-12 19:38:45 -07:00 |
|
Woosuk Kwon
|
f45a332886
|
[Sched] Enhance the logic to remove stopped requests from queues (#20739)
|
2025-07-12 15:33:13 -07:00 |
|
Michael Goin
|
6e2c176e1f
|
[Bugfix] Restrict Machete to only run on Hopper (#20830)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-12 17:34:40 +00:00 |
|
Reid
|
a86754a12b
|
[docs] convert supported configs to table (#20858)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-12 06:54:50 -07:00 |
|
Alex Brooks
|
c2a2f19aba
|
[Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models (#20843)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-07-12 06:11:30 -07:00 |
|
Congcong Chen
|
2c11a738b3
|
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702)
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
|
2025-07-12 06:02:10 -07:00 |
|
Michael Goin
|
b639327ad9
|
Revert "Use NVCC --compress-mode to reduce binary size by 30% #20694" (#20853)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 23:07:35 -07:00 |
|
Zhiyu
|
4afe687a82
|
Enable ModelOpt Llama4 fp8 checkpoint deployment (#20419)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2025-07-11 23:07:16 -07:00 |
|
Maximilien de Bayser
|
5de8d9f111
|
Remove extra tensor on CPU (#20693)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-07-12 14:06:34 +08:00 |
|
Boyuan Feng
|
c1c8ca57ff
|
[cold start time] add envs.VLLM_COMPILE_DEPYF to guard decompile (#20790)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-07-11 23:06:13 -07:00 |
|
Richard Zou
|
a3a5a47e48
|
[Bugfix] Fix torch.compile x LoRA for PyTorch 2.8 (#20823)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-07-11 23:06:04 -07:00 |
|
Lucia Fang
|
fb25e95688
|
[Docs] Update basic.md (#20846)
|
2025-07-11 23:05:32 -07:00 |
|
Wentao Ye
|
0d4891cd03
|
[Bug] Fix DeepGemm for EP low latency case (#20833)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-11 23:05:12 -07:00 |
|
lkchen
|
f56d2996ca
|
[Misc] Respect no_use_tqdm_on_load flag while capturing CUDA graph (#20834)
Signed-off-by: Linkun <github@lkchen.net>
|
2025-07-11 23:04:45 -07:00 |
|
Isotr0py
|
147afb448b
|
[Bugfix] Replace unavailable video url in multimodal test (#20854)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-12 05:25:39 +00:00 |
|
Nicolò Lucchesi
|
3c7d942da8
|
[Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models (#20637)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-11 21:33:26 -07:00 |
|
Varun Sundar Rabindranath
|
890323dc1b
|
[Bugfix] : Fix typo - logger.warn_once -> logger.warning_once (#20852)
|
2025-07-11 20:56:24 -07:00 |
|
Isotr0py
|
01cae37713
|
[CI/Build] Ensure compatability with Transformers v4.53 (#20541)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-11 20:53:07 -07:00 |
|
yurhett
|
11c0198615
|
[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-07-11 20:52:43 -07:00 |
|
Li, Jiang
|
b1235c3e10
|
[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices (#20822)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-11 20:52:05 -07:00 |
|