xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-27 17:27:17 +08:00

Author	SHA1	Message	Date
Jialin Ouyang	b46e4a06f1	[Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor (#27618 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-28 08:13:10 +00:00
Li, Jiang	d34f5fe939	[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legecy platforms (#27526 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-27 23:25:44 -07:00
Eric Yue	bdb01a38fe	[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323 ) Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>	2025-10-27 22:58:06 -07:00
Chauncey	61fbfe5274	[Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines (#27555 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-28 02:18:08 +00:00
Kuntai Du	255e34ca50	[Stability fix] turn off HMA allocator when connector is set (#27592 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-10-27 18:32:23 -07:00
Roger Wang	a8d2e326ec	[Bugfix][CI] Fix config resolving logic with remote models (#27610 )	2025-10-28 00:48:32 +00:00
Andrew Xia	53a56e658b	[gpt-oss][2/N] Support input_messages in responsesRequest (#26962 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-10-27 23:15:49 +00:00
usberkeley	69f064062b	Code quality improvements: version update, type annotation enhancement, and enum usage simplification (#27581 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-10-27 17:50:22 +00:00
Cyrus Leung	6ebffafbb6	[Misc] Clean up more utils (#27567 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 15:30:38 +00:00
tingtinggithub	23ad820553	fixing mm placeholder replacement issue with gemma3 (#27538 ) Signed-off-by: tingtingtang1992 <streamttt@gmail.com>	2025-10-27 14:34:01 +00:00
Varun Sundar Rabindranath	5d3be3ba4c	[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-27 07:32:50 -07:00
Yu Jiaqi	4f882be4a0	[Model] Siglip2 Model Support (#27566 ) Signed-off-by: piood <2477084691@qq.com>	2025-10-27 06:57:37 -07:00
Asaf Joseph Gardin	9273754222	[Hybrid] Added supports_mamba_prefix_caching Protocol (#27339 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-10-27 13:05:20 +00:00
Jee Jee Li	f4e8154076	[Kernel] Enable moe LoRA kernel support FP16 (#27468 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-27 19:48:37 +08:00
Fadi Arafeh	a663f6ae64	[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 (#27415 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-10-27 11:14:55 +00:00
Chauncey	a4fc21895e	[Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. (#27561 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-27 11:06:43 +00:00
Shanshan Shen	a3e8611da5	[Bugfix] Limit the default value of `max_model_len` when it is not specified by users (#27556 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-10-27 10:16:20 +00:00
Cyrus Leung	7c2bdb83dc	[Misc] Clean up utils (#27552 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 09:05:40 +00:00
Danielle Robinson	9932ed6a83	[Kernel] Adding split_K implementation for fused_moe_lora (#27291 ) Signed-off-by: Danielle Robinson <dmmaddix@amazon.com> Signed-off-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Danielle Robinson <dmmaddix@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-27 02:05:24 -07:00
Jee Jee Li	2d631d28c6	[Doc] Slight improvement to M2 and beyond (#27554 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-27 09:02:10 +00:00
Cyrus Leung	b368382964	[Model] Deprecate `merge_by_field_config=False` (#27551 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 16:43:00 +08:00
gnovack	a806c14cc7	[Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora (#27445 ) Signed-off-by: gnovack <gnovack@amazon.com>	2025-10-27 06:31:55 +00:00
Cyrus Leung	cbd5e07a51	[Model] Use merge_by_field_config for MM models (Qwen series) (#27546 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 05:38:05 +00:00
CSWYF3634076	63b22e0dbb	[Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple (#27316 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-10-26 20:53:31 -07:00
Roger Young	5980604c44	Fix MiniMax-M2 copyright (#27537 ) Signed-off-by: xuebi <xuebi@minimaxi.com> Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-10-27 03:29:51 +00:00
Roger Young	720af6ab79	[Model][MiniMax-M2] Support MiniMax-M2 Model (#27535 ) Signed-off-by: xuebi <xuebi@minimaxi.com> Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-10-27 00:59:11 +08:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
Lucia Fang	315b860abe	[bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-10-26 08:16:35 +00:00
rongfu.leng	87c41c26ad	[Bugfix] Fix processor initialization for model from modelscope instead of HF (#27461 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-26 07:44:31 +00:00
JartX	65d2cf9511	[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-26 15:08:52 +08:00
Cyrus Leung	66a168a197	[CI/Build] Refactor processing tests (#27470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-25 16:14:30 +00:00
Matthew Bonanni	a99564ac5b	[Attention] Add missing kv cache scale setup (#27490 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-25 00:12:49 -07:00
Cyrus Leung	4c5f632165	[Misc] Simplify max tokens in multimodal registry (#27500 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-24 23:56:01 -07:00
Kuntai Du	b853540388	[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-10-24 23:34:18 -07:00
Zhuohan Li	56ed7609a9	Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… (#27502 )	2025-10-25 05:31:43 +00:00
Yihua Cheng	83f478bb19	[KVConnector] Migrate the LMCache integration code to be vLLM native (#25542 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-10-25 00:23:53 +00:00
Varun Sundar Rabindranath	269c4db0a4	[Misc][DP] Guard mxfp4 implementation selection (#27484 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-24 23:29:24 +00:00
Wentao Ye	52efc34ebf	[Log] Optimize Startup Log (#26740 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-24 19:27:04 -04:00
Pengchao Wang	d95d0f4b98	[Distributed] Basic set of configuration for large EP deployment on GB200 (#27328 ) Signed-off-by: Pengchao Wang <wpc@fb.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-10-24 14:16:44 -07:00
Lehua Ding	0402428200	[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run (#27455 ) Signed-off-by: Lehua Ding <lehuading@tencent.com>	2025-10-24 20:45:36 +00:00
Isotr0py	acc78aeb88	[Bugfix] Fix interns1-vit qk norm code path (#27480 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-24 17:43:45 +00:00
Ming Yang	0f67d4d962	[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek (#26397 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-24 10:24:08 -07:00
kourosh hakhamaneshi	7e1d697b56	[Bugfix] Fix MultiConnector stats reconstruction across process boundaries (#27366 ) Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-10-24 17:08:05 +00:00
Chendi.Xue	699d62e6cf	[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished (#27297 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-10-24 17:01:41 +00:00
Richard Zou	cd390b609d	[compile] Turn standalone_compile back on (#27460 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-10-24 16:30:27 +00:00
fhl2000	284cc92275	[MISC] `cudagraph_capture_sizes` related improvements (#26016 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-24 05:11:05 -07:00
Cyrus Leung	b7030d962b	[Benchmark] Enable benchmark to run with `encoding_format="bytes"` (#27467 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-24 11:16:50 +00:00
Chauncey	3567816932	[Refactor] move tool parsing logic from protocol.py to the tool parser (#27383 ) Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-10-24 09:53:23 +00:00
22quinn	e0ef8a2920	[BugFix] Fix torchrun DP with LLM class (#27395 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-10-24 08:11:37 +00:00
Isotr0py	42efe609ba	[MM][Bugfix] Replace `PatchEmbed`'s conv3d to linear layer (#27418 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-24 07:32:47 +00:00

7471 Commits