xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 16:27:18 +08:00

Author	SHA1	Message	Date
Wentao Ye	846197f505	[Log] Optimize kv cache memory log from Bytes to GiB (#25204 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-23 12:44:37 -04:00
rivos-shreeasish	2357480b1a	[BugFix] Fix UB in per_token_group_quant.cu (#24913 ) Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com>	2025-09-23 09:14:22 -07:00
bnellnm	f11e3c516b	[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 16:11:34 +00:00
Harry Mellor	875d6def90	Add backward compatibility for `GuidedDecodingParams` (#25422 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-23 17:07:30 +01:00
Lucas Wilkinson	cc1dc7ed6d	[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-23 16:02:10 +00:00
Thomas Parnell	a903669e10	[V1] Remove V0 code paths for Hybrid models (#25400 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 08:26:13 -07:00
Michael Goin	2c58742dff	[UX] Change kv-cache-memory log level to debug (#25479 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 08:01:24 -07:00
Fanli Lin	4c966e440e	[XPU] Fix MOE DP accuracy issue on XPU (#25465 )	2025-09-23 14:32:57 +00:00
Peter Pan	da5e7e4329	[Docs] NixlConnector quickstart guide (#24249 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-09-23 14:23:22 +00:00
Chauncey	f05a4f0e34	[P/D] Support NIXL connector to disconnect during a clean shutdown (#24423 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-09-23 16:08:02 +02:00
Joel	61d1b35561	[BugFix] Register expert_map as named buffer for wake_up and sleep (#25458 ) Signed-off-by: wuxibin <wuxibin@bytedance.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-23 21:49:13 +08:00
Isotr0py	b6a136b58c	[CI/Build] Fix disabled v1 attention backend selection test (#25471 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-23 13:05:46 +00:00
vllmellm	0d9fe260dd	[docs] Benchmark Serving Incorrect Arg (#25474 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-09-23 06:05:11 -07:00
Jee Jee Li	273690a50a	[Core] Optimize LoRA weight loading (#25403 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-23 18:19:45 +08:00
Isotr0py	231c2c63e4	[Bugfix] Fix idefics3 `tie_word_embeddings` (#25454 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-23 10:06:48 +00:00
Andreas Hartel	4322c553a6	[Test]: Hermes tool parser stream output error in Qwen3 case (#25203 ) Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com>	2025-09-23 17:56:31 +08:00
Cyrus Leung	babad6e5dd	[Misc] Move DP for ViT code inside model executor dir (#25459 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-23 09:20:52 +00:00
Zhikaiiii	9383cd6f10	[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028 ) Signed-off-by: Zhikaiiii <1658973216@qq.com>	2025-09-23 16:07:27 +08:00
Ming Yang	ba8d2165b6	Handle triton kernel import exception (#25319 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-09-23 00:56:00 -07:00
Cyrus Leung	c98be0a232	[Model] Enable DP for ViT in Qwen2-VL (#25445 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-23 05:17:10 +00:00
Chendi.Xue	5774b0a1da	[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121 ) Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>	2025-09-23 04:17:42 +00:00
Varun Sundar Rabindranath	e8db44f883	[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-09-22 21:01:09 -07:00
Michael Yao	fafbe11af4	[Docs] Fix griffe warnings in vllm/lora/ops (#25369 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-09-23 03:42:58 +00:00
Michael Goin	78237e43bf	[Bugfix] Remove contiguous output req for context parallel MLA (#25414 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-09-22 20:26:32 -07:00
Lucia Fang	eea1783989	[benchmarks]allow skip ready check for bench serve (#25420 ) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>	2025-09-23 03:21:48 +00:00
Kunshang Ji	f225ea7dd9	[XPU] Fix `compile_size` is `None` case. (#25433 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-23 03:09:00 +00:00
JJJYmmm	fc97733da8	[feat] Support MRoPE + YaRN (#25384 ) Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com> Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>	2025-09-23 03:04:47 +00:00
Wentao Ye	4741239db7	[Bug] Fix Long Context OOM Issue (#25290 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-22 22:04:15 -04:00
Isotr0py	c625f9043c	[V0 deprecation] Remove `_set_default_args_v0` function (#25409 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-23 01:52:09 +00:00
Isotr0py	6fa78d8f23	[V0 deprecation] Remove platform v1 controling interface (#25410 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-23 01:48:12 +00:00
Wentao Ye	9949aa2ef1	[Perf] Apply torch.compile for `per_block_cast_to_fp8` (#24611 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-22 19:42:45 -06:00
Alexander Matveev	0b7bed9c38	[Performance] Remove input pads in cutlass_mla and optimize v_proj output handling (#25184 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-09-22 19:20:53 -06:00
Matthew Bonanni	ac0048c0ae	[BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Chris Bamford <chrisbam4d@gmail.com>	2025-09-22 17:26:17 -07:00
Nicolò Lucchesi	090197034f	[Bugfix] Fix missing `clear_connector_metadata` (#25397 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-23 08:10:59 +08:00
Russell Bryant	f31ff87460	[Core] Drop overly aggressive whisper assertion (#25408 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-22 17:09:52 -07:00
Luka Govedič	d588cd2406	[Bugfix] fix custom op test (#25429 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-09-23 00:07:43 +00:00
Alec S	45d7d852d3	[Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-22 23:38:19 +00:00
Johnny Yang	8bed179109	[TPU] update torch_xla dependency for PyPI compatibility (#25278 ) Signed-off-by: Johnny Yang <johnnyyang@google.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-09-22 16:14:44 -07:00
Cyrus Leung	f552d5e578	[CI/Build] Skip Qwen3-VL initialization tests until models are actually released (#25394 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-22 13:18:24 -07:00
Or Ozeri	8db2939289	[KV offload][5/N] Add `CPUOffloadingSpec` (#24251 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-09-22 12:30:36 -07:00
Luka Govedič	d5e0fca264	[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091 ), fix test (#24376 ), and prep for custom op matching (#24604 ) (#24542 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: luka <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-22 12:30:05 -07:00
Simon Mo	8d0ee5a564	[misc] Remove RFC review hours reference (#25416 )	2025-09-22 12:16:59 -07:00
Lucia Fang	922979bfcc	[DP] support torchrun external launcher with Data Parallelism (#24899 ) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-22 12:06:05 -07:00
Michael Goin	239ef0c1ac	[CI Failure] Fix fp8 kv cache on <SM90 (#25396 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-22 18:27:51 +00:00
ElizaWszola	1d7f95b85c	[Compiler] Disable Inductor standalone compile by default (#25391 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-09-22 17:37:46 +00:00
Daisy-Ma-coder	cfbee3d0e7	[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274 ) Signed-off-by: qqma <qqma@amazon.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: qqma <qqma@amazon.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-22 10:37:43 -07:00
Bowen Wang	06a41334c7	[EPLB] Reduce EPLB Inference Overhead (#24573 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-22 16:31:05 +00:00
Burkhard Ringlein	175811e3b5	[V1][Attention] Split triton_attn in triton-only and rocm specific backends (#24648 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>	2025-09-22 15:20:28 +00:00
Csrayz	c10101a3eb	[Bugfix] Fix several issues with p2p xPyD in GET type (#23993 ) Signed-off-by: Csrayz <jover@cmbchina.com> Signed-off-by: ivyilike <pww123@cmbchina.com> Co-authored-by: ivyilike <pww123@cmbchina.com>	2025-09-22 14:53:13 +00:00
Sara-KS	ac243886b0	[Kernel] MI-300X triton moe configs (#23445 ) Signed-off-by: Sara Kokkila Schumacher <saraks@ibm.com>	2025-09-22 14:29:54 +00:00

9784 Commits