xinyun/vllm - vllm - 丝路新云 Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-07-21 08:37:11 +08:00

Author	SHA1	Message	Date
Johnny Yang	59012df99b	[TPU] update TPU benchmark threshold (#25713 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-10-07 13:53:09 -07:00
Benjamin Chislett	3d1f67616d	[Spec Decode] Enable efficient speculative decoding with FlashInfer-MLA (#25984 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-07 16:05:59 -04:00
Sergei Skvortsov	6ebaf43ee4	[V1] Logit processors for rejection sampler (#19482 ) Signed-off-by: southfreebird <yvorott@gmail.com> Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com> Signed-off-by: Sergei Skvortsov <yvorott@gmail.com> Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-10-07 13:02:49 -07:00
Morrison Turnansky	0c824fc46f	[Frontend] CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops (#26113 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2025-10-07 12:53:43 -07:00
Pei-Lun Liao	eb577e4655	[Bugfix] Add missing sink tensor into flash attn cascade attn implementation (#26325 )	2025-10-07 18:56:39 +00:00
Wentao Ye	8f36850f73	[Bug] Fix Shape Validation for Fallback while Enabling E8M0 for DeepGEMM (#26322 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-07 13:50:30 -04:00
Chen Zhang	29fd2662ba	[deepseek] add EP8 FusedMOE config for H200 and B200 (#26331 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-07 10:38:54 -07:00
Michael Goin	30a3e5af69	[CI] Add Qwen3 MoE NVFP4 to Blackwell lm-eval (#26316 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-07 10:36:15 -07:00
fxmarty-amd	a38c1bfe09	[ci] Rename `test_mxfp4_moe.py` to `test_ocp_mx_moe.py` (#26364 ) Signed-off-by: Felix Marty <Felix.Marty@amd.com>	2025-10-07 09:52:24 -07:00
Paul Pak	320feae6f5	[Model] Lfm2Moe (#26344 ) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2025-10-07 16:03:05 +00:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Cyrus Leung	c0a7b89d8e	[Misc] Move `LRUCache` into its own file (#26342 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:08:40 +00:00
antrec	6f59beaf0b	[Model] Add support for ModernBertForTokenClassification (#26340 ) Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr> Signed-off-by: antrec <antoine.recanati@gmail.com> Co-authored-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-07 14:29:19 +00:00
fxmarty-amd	41f1cf38f2	[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 (#21166 )	2025-10-07 09:35:26 -04:00
Isotr0py	08d26a1b7e	[Model] Use `merge_by_field_config` for MM models (Ovis family) (#26308 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-07 12:54:22 +00:00
fhl2000	63773a6200	[Docs] add docs for cuda graph v1 (#24374 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-07 05:25:05 -07:00
Sergio Paniego Blanco	883b42896a	Add TRL example notebook to RLHF docs (#26346 ) Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>	2025-10-07 11:31:28 +00:00
Daniel Cámpora	e1098ced95	Add topk logits torch op for DS3.2. (#25945 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-10-07 10:07:32 +00:00
Grant Holmes (Ren)	d100d78eb3	Optimize KV cache distribution for asymmetric pipeline parallelism (#25164 ) Signed-off-by: gholmes829 <g.holmes429@gmail.com>	2025-10-07 09:20:30 +00:00
Cyrus Leung	7e4cd070b0	[V0 Deprecation] Remove `VLLM_USE_V1` from docs and scripts (#26336 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 16:46:44 +08:00
Snehlata	46b0779996	[BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 (#26265 ) Signed-off-by: atalhens <sneh.lata@nutanix.com>	2025-10-07 08:42:28 +00:00
Ayush Satyam	de342585ff	[Model] Define merge_by_field_config MM interface (R-T) (#26260 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 16:10:55 +08:00
Andrew Xia	185d8ed44f	[responsesAPI][bugfix] serialize harmony messages (#26185 ) Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-07 07:07:53 +00:00
Cyrus Leung	d9836d4517	[Deprecation] Deprecate `LLM.set_tokenizer` (#26333 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 06:50:57 +00:00
Ayush Satyam	5f7e8a916a	[Model] Define merge_by_field_config MM interface (U-Z) (#26261 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 06:45:49 +00:00
ahao-anyscale	4dbdf4a294	[BUG] Fix file parsing for load_format runai_streamer_sharded (#26324 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-10-07 11:23:07 +08:00
Michael Goin	c6873c4e6d	[UX] Support nested dicts in hf_overrides (#25727 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-07 11:19:16 +08:00
Sage Moore	2111b4643c	[Core] Simplify the Dp padding/should ubatch coordination logic (#25768 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-07 01:57:49 +00:00
Sage Moore	c50901f3b9	[Docs][DBO] Add initial doc that describes the DBO implementation (#26024 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-07 00:47:28 +00:00
Simon Mo	8229280a9c	[Misc] Define EP kernel arch list in Dockerfile (#25635 ) Signed-off-by: Simon Mo <simon.mo@hey.com>	2025-10-07 00:05:33 +00:00
Benjamin Chislett	f77df94647	[Perf] Add decode full-graph support to FlashInfer-MLA backend (#26313 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 23:03:49 +00:00
Gregory Shtrasberg	f231e5bc21	[ROCm] Split AITER unified attention into its own backend (#25507 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-06 22:49:23 +00:00
Benjamin Chislett	2161efe978	[Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) (#25987 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 16:16:30 -04:00
Varun Sundar Rabindranath	f23b4c04fd	[BugFix] Pad input buffers in _dummy_run (#26209 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-06 16:07:51 -04:00
Varun Sundar Rabindranath	93540958b8	[Docs] Fix broken table in moe_kernel_features doc (#26314 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-06 15:58:05 -04:00
Cyrus Leung	44b9af5bb2	[Benchmark] Enable MM Embedding benchmarks (#26310 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 19:51:58 +00:00
Raushan Turganbay	7cd95dc8a3	[Bugfix] Fix gemma3 with transformers backend (#23178 ) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Raushan Turganbay <raushan@huggingface.co> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 18:42:32 +00:00
Crefeda Rodrigues	c02058c222	Add bias handling to CPUFusedMOE kernel (#26289 ) Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Crefeda Rodrigues <65665931+cfRod@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Sharif Inamdar <Sharif.Inamdar@arm.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-10-06 18:39:10 +00:00
7mile	b2ea5ba677	[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs (#26231 ) Signed-off-by: seven-mile <i@7li.moe> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 18:24:22 +00:00
Karan Goel	824a3f403f	[Misc] auto_tune: kill specific vllm process (#26304 ) Signed-off-by: Karan Goel <karangoel@google.com>	2025-10-06 18:02:51 +00:00
Rahul Tuli	05f6846ede	Support llama3 eagle3 head with llama4 verifier (#25961 ) Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-06 13:56:08 -04:00
Michael Goin	20db99cc69	[CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe (#26188 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 13:50:11 -04:00
Yannick Schnider	6431be808f	[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 17:19:34 +00:00
Matthew Bonanni	4727a8afa7	[Attention] Remove unused reorder_batch method (#24463 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-06 13:13:39 -04:00
tomeras91	b8f603cebe	[Model] EVS support for nano_nemotron_vl (#26269 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>	2025-10-07 00:23:37 +08:00
Chatcharin Sangbutsarakum	fc679696f8	Fix `DotsOCR` tensor type (#26281 ) Signed-off-by: what_in_the_nim <chatcharinsang@gmail.com>	2025-10-06 12:23:43 +00:00
Raushan Turganbay	ab5e7d93f4	[Bugfix] Fix mrope in Transformers Backend (#26087 ) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 11:40:50 +00:00
Harry Mellor	0340f45553	Support expert parallel load balancing in Transformers backend (#26287 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 11:20:16 +00:00
Cyrus Leung	19a00eb210	[Model] Use `merge_by_field_config` for MM models (Llava family) (#26280 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:45:26 +00:00
Cyrus Leung	391612e78b	[Frontend] Consolidate tokenizer init code (#26276 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:34:52 +00:00

10272 Commits