xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-01-29 05:07:13 +08:00

Author	SHA1	Message	Date
Harry Mellor	d76541a6c5	Stop mergify from keeping stale PRs alive (#26169) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-03 16:42:34 +00:00
Chendi.Xue	dd96465fd7	[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 (#26123) Signed-off-by: Chendi Xue <Chendi.Xue@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-03 08:52:26 -07:00
Jun Jiang	4f8f47e87e	Fix undefined symbol: cutlass_moe_mm_sm100 (#26098) Signed-off-by: Jun Jiang <jasl9187@hotmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-03 15:48:32 +00:00
Cyrus Leung	d78fda7cda	[Renderer] Move Processor out of LLMEngine (#26165) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-03 15:08:22 +00:00
Aleksandr Samarin	73a99cc2a5	[Model] Fixed stream generator for gpt-oss + spec-decoding (#26027) Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>	2025-10-03 13:43:41 +00:00
Xiang Si	adae0c1f43	[CI/Build] do not enforce precompilation on tpu ci tests (#25992) Signed-off-by: Xiang Si <sixiang@google.com>	2025-10-03 13:38:42 +00:00
whx	cbf9221992	[Model] Supplement to PR 24862: Pass param prefix to LLMHead (#25805) Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-10-03 21:34:53 +08:00
Paul Pak	5f42fc53b6	[backends][short_conv] CUDA graph piecewise edits (#24215) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2025-10-03 12:59:48 +00:00
Yannick Schnider	8ee846c27c	[Bugfix] Re-enable prefill of max model length (#24446) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>	2025-10-03 14:13:34 +02:00
Yang Liu	812b7f54a8	[Renderer] Move Processor out of AsyncLLM (#24138) Signed-off-by: Yang <lymailforjob@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-03 11:29:45 +00:00
Sage Moore	5f2cacdb1e	Quick fix for IMA with the Prefix Prefill kernel during graph capture (#25983) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-03 11:28:22 +00:00
Egor	aa5053e3fe	[Doc] Fixed shape description for fused_batched_moe.py (#25668) Signed-off-by: Egor <e.a.krivov@gmail.com>	2025-10-03 04:00:23 -07:00
Wenlong Wang	79aa244678	[Multi Modal] Configurable MM Profiling (#25631) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-03 03:59:10 -07:00
kyt	2ed3f20dba	[openai] Fix missing tool usage check (system message) (#24768) Signed-off-by: kyt <eluban4532@gmail.com>	2025-10-03 18:55:44 +08:00
Nicolò Lucchesi	48f309029a	[NIXL][Misc] Expose metrics from NIXL for logging to CLI (#25388) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-03 10:47:59 +00:00
Thomas Parnell	0e93ac0b3a	[CI] Fix distributed hybrid tests in CI (#26155) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-03 09:14:18 +00:00
Yannick Schnider	5446ad1d24	[test utils] correct wrong typing (#26159) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>	2025-10-03 02:11:49 -07:00
Cyrus Leung	f9a8084e48	[Model] Use `merge_by_field_config` for MM models (InternVL family) (#26153) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-03 01:59:06 -07:00
HUIJONG JEONG	3e70e3d4d5	add(v1): RequestStatesStats to RequestOutput (#24947) Signed-off-by: huijjj <huijong.jeong@squeezebits.com>	2025-10-03 08:56:25 +00:00
Jiangyun Zhu	eb0fa43868	[Perf] Optimize `reshape_and_cache` CUDA Kernel (#25955) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Co-authored-by: Liu-congo <1502632128@qq.com>	2025-10-03 01:33:46 -07:00
Cyrus Leung	0ad9951c41	[Input] Remove unused `prompt` field (#26097) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-03 00:23:21 -07:00
Varun Sundar Rabindranath	8c9117181d	[Misc] Remove typing.List (#26150) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-03 07:00:33 +00:00
ahao-anyscale	c4b48d3c0f	[BUG] Reorder model config creation (#26124) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-10-03 14:59:36 +08:00
Harry Mellor	10d765482d	`FusedMoE` support for the Transformers backend (#22650) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-02 23:12:15 -07:00
Cyrus Leung	39b643dc1a	[Model] Use `merge_by_field_config` for MM models (G) (#26117) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-02 22:38:29 -07:00
Zhewen Li	711f485643	[Bugfix] Fix import `gemm_afp4wfp4` failure on AMD (#26068) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-02 22:37:25 -07:00
TJian	9c5ee91b2a	[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-02 22:34:53 -07:00
Tyler Michael Smith	27edd2aeb4	[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-10-02 22:21:01 -07:00
Andrew Xia	e5017cd6d6	[gpt-oss] disable tool server initialization if no tool in request (#25790) Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-10-03 05:08:35 +00:00
Benjamin Chislett	6a7796e871	[Bug]: Limit num_reqs in dummy_run when max_num_seqs is small (#26144) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-03 04:00:20 +00:00
Matthew Bonanni	47b9339546	[DeepSeek] Improve performance of DS MLA cache kernel (#26132) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 20:35:47 -07:00
Michael Goin	5d5146eee3	[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper (#26138) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-02 20:32:38 -07:00
Matthew Bonanni	2aaa423842	[Attention] Move Backend enum into registry (#25893) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 20:32:24 -07:00
Ekagra Ranjan	ad2d788016	[Bug][Benchmark] Fix duplicate req in oversampling (#26140) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-03 02:55:24 +00:00
Wentao Ye	36ce76c632	[Log] Optimize DeepGEMM Missing Log (#26106) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-02 20:02:26 -06:00
Michael Goin	f1fc2107a3	[Bugfix] Disable cascade attention with FlashInfer (#26130) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-02 16:30:37 -07:00
Matthew Bonanni	13cdc02173	Fix MTP with deepep_low_latency (#25904) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 21:29:49 +00:00
ElizaWszola	502640c3f9	[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-10-02 19:35:13 +00:00
Chen Zhang	3d5f1c8640	[Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MTP (#25119) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-02 18:48:31 +00:00
Ekagra Ranjan	1cab2f9cad	EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench (#25916) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2025-10-02 11:29:35 -07:00
Chen Zhang	1e50f1be70	[Deepseek v3.2] Support indexer prefill chunking (#25999) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-02 10:29:12 -07:00
Chenheli Hua	ad87ba927a	[Small] Prevent bypassing media domain restriction via HTTP redirects (#26035) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-10-02 10:27:10 -07:00
Lucas Wilkinson	decf7f794b	[BugFix] Fix FI accuracy issue when used for MLA prefill (#26063) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-02 17:18:13 +00:00
Cyrus Leung	d00d652998	[CI/Build] Replace `vllm.entrypoints.openai.api_server` entrypoint with `vllm serve` command (#25967) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-02 10:04:57 -07:00
Michael Goin	3b279a84be	[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests (#26040) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-02 09:07:19 -07:00
vllmellm	5e4a8223c6	[Qwen][ROCm] Flash Attention Rotary Embeddings (#24642) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-10-02 08:26:08 -07:00
leo-pony	e51de388a2	[Platform][CI] Added OOT platform interface e2e test that running on Ascend NPU (#25470) Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-10-02 23:19:22 +08:00
Cyrus Leung	cc253b73d3	[Model] Use `merge_by_field_config` for MM models (D-F) (#26076) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-02 08:17:35 -07:00
Cyrus Leung	7d6fb905d9	[Model] Use `merge_by_field_config` for MM models (A-C) (#26073) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-02 08:17:31 -07:00
Lucas Wilkinson	418d111f8c	[FA/Chore] Bump vllm-flash-attention (#25537) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-02 11:06:14 -04:00

10108 Commits