xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-30 07:17:11 +08:00

Author	SHA1	Message	Date
Max Hu	412e153df5	[Feature] Allow configuring FlashInfer workspace size (#28269 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 23:32:20 +00:00
Michael Goin	e5f599d4d1	[Bugfix] Disable shared expert overlap if Marlin MoE is used (#28410 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 23:16:12 +00:00
wangxiyuan	d4902ba56d	[Misc] Cleanup Executor interface (#28441 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 22:28:07 +00:00
Kyuyeun Kim	df4d3a44a8	[TPU] Rename path to tpu platform (#28452 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-11-11 19:16:47 +00:00
Jee Jee Li	9d1c474704	[LoRA][1/N]Remove LoRA extra vocab (#28382 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-11 11:06:21 -08:00
Jie Luo	8c32c6e4b4	[Misc] fix typo in DCP comment (#28389 ) Signed-off-by: Livinfly <luojie3m@gmail.com>	2025-11-11 10:59:16 -08:00
Canlin Guo	de120bc94f	[V0 deprecation] Clean up num_prefill_tokens logic for V0 (#28203 ) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-11 10:57:12 -08:00
Jialin Ouyang	4228be7959	[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-11 10:28:47 -08:00
Lukas Geiger	76e4dcf225	[Misc] Remove unused attention prefix prefill ops functions (#26971 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-11 18:26:04 +00:00
Fanli Lin	d5edcb8678	[BugFix] Fix Siglip2Attention on XPU (#28448 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 18:18:02 +00:00
Matthew Bonanni	684f254585	Prefer FlashAttention MLA as default over FlashMLA (#27363 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-11 17:13:51 +00:00
xuebwang-amd	5a1271d83a	[Quantization] fix attention quantization of gpt_oss model (#27334 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2025-11-11 12:06:00 -05:00
xuebwang-amd	05576df85c	[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: fxmarty-amd <felmarty@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-11 12:05:22 -05:00
zhrrr	68c09efc37	[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-11 12:00:31 -05:00
Nicolò Lucchesi	a7ef3eb0cd	[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282 )	2025-11-11 16:57:43 +00:00
Michael Goin	f9a4087182	Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 11:46:04 -05:00
Fanli Lin	b886068056	[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU (#28444 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 15:29:33 +00:00
bnellnm	a1448b4b69	[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064 )	2025-11-11 07:29:02 -07:00
Cyrus Leung	afffd3cc8a	[Model] Pass `mm_features` directly into `get_mrope_input_positions` (#28399 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 21:14:48 +08:00
Chaojun Zhang	7dbe6d81d6	Fix Fused MoE LoRA Triton kernel bug (#28450 ) Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>	2025-11-11 20:46:47 +08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Lukas Geiger	9973e6e04a	[Model][Qwen3VL] Slighly speedup `fast_pos_embed_interpolate` (#28434 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-11 10:35:10 +00:00
Fanli Lin	c7991269dd	[BugFix] 'DeepseekV2Config' object has no attribute 'use_mla'` (#28387 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-11 08:45:38 +00:00
Jiangyun Zhu	f0359fffa4	[Bugfix] fix qwen3-next crash (#28202 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-11 08:24:28 +00:00
Sage Moore	798c7bebca	[EPLB] Refactor balance_packing to use numpy and optimize GPU-CPU transfers in EPLB (#28369 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-11-11 00:19:51 -08:00
Roger Wang	4fd4b743a2	[Bugfix] Fix max image size for PaddleOCR-VL (#28442 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-11 08:07:24 +00:00
David Ben-David	cc079763c5	[BugFix] Avoid calling KV connector layer APIs when metadata is unset (#28253 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 23:39:36 -08:00
Robert Shaw	e605e8e323	[Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-11 05:59:08 +00:00
Zuyi Zhao	bca74e32b7	[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server (#27892 ) Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com> Signed-off-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-11 04:57:01 +00:00
Zhuohan Li	8d706cca90	[Misc] FlattenLogprobs -> FlatLogprobs (#28335 )	2025-11-11 03:41:23 +00:00
Michael Goin	f2d9ad0620	Only register rocm_aiter_ops if aiter is found (#28428 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 02:53:24 +00:00
Wentao Ye	de540c0354	[Feature] Add env var `VLLM_MOE_USE_DEEP_GEMM` (#28422 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-11 02:29:48 +00:00
Lucas Wilkinson	39029d5192	[CI/Test Fix] Fix CP tests on Blackwell (#28404 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 01:36:29 +00:00
Wentao Ye	35d801f13f	[Feature] Refactor batch invariant fp8 DeepGEMM (#27606 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-11 00:08:40 +00:00
Adrian Abeyta	a5a790eea6	[Bugfix] Ensure calculated KV scales are applied in attention. (#27232 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-10 23:42:37 +00:00
Jialin Ouyang	b30372cbd0	[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-10 15:34:18 -08:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Yong Hoon Shin	021143561f	[ROCm] Add missing gemm_a8w8_blockscale import (#28378 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-10 23:13:36 +00:00
Robert Shaw	30700b1cd7	[CI] Fix Plugin Tests Tests (#28413 ) Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>	2025-11-10 22:36:11 +00:00
Andrew Xia	4b94ed8f92	[Frontend][2/n] remove empty content from _parse_tool_calls_from_content (#28331 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-11-10 14:07:49 -08:00
Lucas Wilkinson	6dec9f6109	[BugFix] Fix DeepGEMM over-allocating workspace (#28254 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 17:01:17 -05:00
Wei Wei	bf6a3d0ff5	[Misc] Add more scoping for improved trace (#28329 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-11-10 21:03:21 +00:00
Sage Moore	40d33264c6	[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Sage Moore <sagemoore@utexas.edu> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-10 20:39:19 +00:00
Rémi Delacourt	6d54336ae5	[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905 ) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-10 14:53:32 -05:00
jiahanc	34553b9d27	[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath	b039bfda8f	[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 09:21:52 -08:00
Cyrus Leung	d0e186c16f	[V0 Deprecation] Remove unused `context_len` and `seq_len` from M-RoPE (#28395 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 00:30:06 +08:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
caozuoba	40e2eeeb92	[Kernel] Optimization of the mm_k operator. (#28280 ) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-10 16:03:46 +00:00
zejunchen-zejun	b06b9470ca	[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474 ) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>	2025-11-10 10:38:56 -05:00

7778 Commits