3537 Commits

Author SHA1 Message Date
Asher
5a7fb3ab9e
[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
2025-07-17 09:10:09 +00:00
Varun Sundar Rabindranath
11dfdf21bf
[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-17 08:10:37 +00:00
Chauncey
fdc5b43d20
[Bugfix]: Fix final_res_batch list index out of range error (#21055)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-17 00:29:09 -07:00
David Ben-David
4fcef49ec4
[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
2025-07-17 13:29:45 +08:00
Lucas Wilkinson
76b494444f
[Attention] Refactor attention metadata builder interface (#20466)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-17 04:44:25 +00:00
Michael Goin
4e7dfbe7b4
Update PyTorch to torch==2.7.1 for CUDA (#21011)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-17 02:30:44 +00:00
Mac Misiura
18bdcf4113
feat - add a new endpoint get_tokenizer_info to provide tokenizer/chat-template information (#20575)
Signed-off-by: m-misiura <mmisiura@redhat.com>
2025-07-16 21:52:14 +08:00
Seiji Eicher
d0dc4cfca4
Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests (#20831)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-07-16 00:14:49 -07:00
zhiweiz
c11013db8b
[Meta] Llama4 EAGLE Support (#20591)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: qizixi <qizixi@meta.com>
2025-07-15 21:14:15 -07:00
Peter Pan
1eb2b9c102
[CI] update typos config for CI pre-commit and fix some spells (#20919)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-15 21:12:40 -07:00
Maximilien de Bayser
6ebf313790
Avoid direct comparison of floating point numbers (#21002)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-07-15 21:12:14 -07:00
Patrick von Platen
cfbcb9ed87
[Voxtral] Add more tests (#21010)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-15 21:11:49 -07:00
Chauncey
34cda778a0
[Frontend] OpenAI Responses API supports input image (#20975)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 18:59:36 -06:00
Harry Mellor
1e36c8687e
[Deprecation] Remove nullable_kvs (#20969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:21:50 +00:00
Patrick von Platen
e7e3e6d263
Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-15 07:35:30 -07:00
Harry Mellor
56fe4bedd6
[Deprecation] Remove TokenizerPoolConfig (#20968)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 14:00:50 +00:00
Thomas Parnell
3534c39a20
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 04:04:35 -07:00
Ilya Markov
37a7d5d74a
[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-15 06:57:40 +00:00
Woosuk Kwon
d4d309409f
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-14 23:01:46 -07:00
Nicolò Lucchesi
80305c1b24
[CI] Fix flaky test_streaming_response test (#20913)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:15:15 -07:00
XiongfeiWei
d4170fad39
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00
Richard Zou
ba8c300018
[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-15 01:26:18 +00:00
Nicolò Lucchesi
149f2435a5
[Misc] Relax translations tests (#20856)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:08:36 +00:00
ant-yy
38efa28278
[Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
2025-07-14 22:10:32 +08:00
Cyrus Leung
dcf2a5e208
[CI/Build] Fix OOM issue in Jina-VL test (#20907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-14 10:32:35 +00:00
wangxiyuan
1e9438e0b0
[MISC] Move bind_kv_cache to worker module (#20900)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-14 09:40:00 +00:00
Maroon Ayoub
66f6fbd393
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
2025-07-14 02:45:31 +00:00
22quinn
8632e831ba
[Core] Add update_config RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 00:49:18 +00:00
Minkyu Kim
bd4c1e6fdb
Support for LlamaForSequenceClassification (#20807)
Signed-off-by: thechaos16 <thechaos16@gmail.com>
2025-07-13 00:09:34 -07:00
QiliangCui
99b4f080d8
Renable google/gemma-3-1b-it accuracy test. (#20866)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-12 21:48:56 -07:00
Nicolò Lucchesi
020f58abcd
[Core] Support multiple tasks per model (#20771)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-12 19:40:11 -07:00
Wentao Ye
c1acd6d7d4
[Refactor] Change the way of import triton (#20774)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-12 19:39:55 -07:00
Wentao Ye
42d440c22b
[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-12 19:38:45 -07:00
Woosuk Kwon
f45a332886
[Sched] Enhance the logic to remove stopped requests from queues (#20739) 2025-07-12 15:33:13 -07:00
Congcong Chen
2c11a738b3
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702)
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
2025-07-12 06:02:10 -07:00
Isotr0py
147afb448b
[Bugfix] Replace unavailable video url in multimodal test (#20854)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-12 05:25:39 +00:00
Isotr0py
01cae37713
[CI/Build] Ensure compatability with Transformers v4.53 (#20541)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-11 20:53:07 -07:00
yurhett
11c0198615
[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-07-11 20:52:43 -07:00
Trevor Morris
a8593237c0
Add pynccl all-gatherv and reducescatterv (#20154)
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-11 18:59:23 -07:00
Ilya Markov
fc0f41d10a
Integration SM100 FlashInfer fused allreduce RMSNorm (#20691)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-11 18:58:15 -07:00
bigmoyan
5f0af36af5
Update kimi-k2 tool calling docs, enable unit tests (#20821)
Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
2025-07-11 20:16:14 +00:00
Isotr0py
0d21b2664c
[Bugfix] Fix OOM in language generation test (#20814)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-11 11:21:52 -07:00
Varun Sundar Rabindranath
53fa457391
[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-11 07:51:46 -07:00
QiliangCui
b4f0b5f9aa
Temporarily suspend google/gemma-3-1b-it. (#20722)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-11 11:21:26 +00:00
Cyrus Leung
cbd14ed561
[Bugfix] Refactor /invocations to be task-agnostic (#20764)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-11 03:20:54 -07:00
Pavani Majety
7bd4c37ae7
[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: shuw <shuw@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-11 09:23:23 +00:00
Jee Jee Li
8020e98c9f
[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-11 08:01:13 +00:00
Luka Govedič
762be26a8e
[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
2025-07-11 00:15:22 -07:00
Luka Govedič
31d5c1797f
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-11 04:56:28 +00:00
Wentao Ye
e2de455c34
[Feature] Integrate SM100 DeepGEMM support (#20087) 2025-07-10 20:18:05 -07:00