xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-12 07:27:14 +08:00

Author	SHA1	Message	Date
Cyrus Leung	633f943e30	[Doc] Update Batch-level DP docs (#25757 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-26 02:37:40 -07:00
Xu Wenqing	b03b1b97f6	Support LongCat-Flash-Chat tool call (#24083 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-09-26 09:25:39 +00:00
Sage Moore	dfb9af2014	[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk (#25698 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-26 01:25:28 -07:00
yyzxw	19f76ee68e	[misc] refactor speculative config (#25657 ) Signed-off-by: zxw <1020938856@qq.com>	2025-09-26 01:22:06 -07:00
Icey	dd70437a4f	Remove cuda hard-code in compute_causal_conv1d_metadata (#25555 ) Signed-off-by: Icey <1790571317@qq.com>	2025-09-26 01:19:20 -07:00
Tao He	99b3a504c5	[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-26 01:18:58 -07:00
Iceber Gu	6e30010d2f	fix: print outputt offline_inference/base/chat.py example (#25744 ) Signed-off-by: Iceber Gu <caiwei95@hotmail.com>	2025-09-26 01:18:24 -07:00
xaguilar-amd	52621c8f5c	[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X (#25703 ) Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com>	2025-09-26 01:18:20 -07:00
Andrew Sansom	d48f4d6daf	perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-26 01:18:09 -07:00
Andrew Sansom	e84e0735c7	fix: revert cast to cpu in `MsgpackEncoder._encode_tensor` to avoid hidden performance regressions (#25738 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-26 01:18:05 -07:00
yitingdc	3edf87d25f	[CI/Build] fix doc build warning: Failed to get 'name: description' pair (#25733 ) Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io>	2025-09-26 01:18:02 -07:00
Eugene Khvedchenya	392edee34a	EVS Support (Video tokens pruning) (#22980 ) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-26 11:54:54 +08:00
Nick Hill	983056e456	[Misc] Remove unnecessary memoryviews in shm_broadcast.py (#25721 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-26 03:11:44 +00:00
Russell Bryant	13dd93c667	[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder (#25701 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-25 18:21:56 -07:00
Aleksandr Malyshev	53a30845be	Llamas 3.1 405B fp4 changes upstreaming from 355_wip (#25135 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com>	2025-09-25 19:16:53 -06:00
Nick Hill	8b77328ffe	[Misc] Don't log shm dequeue delay warning on worker side (#25720 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-26 01:08:30 +00:00
Wentao Ye	9fe4c2bdb9	[Refactor] Remove DeepGEMM OP Register (#25710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-25 20:13:41 -04:00
Shu Wang	081b5594a2	Fix routing_bias dtype (#25711 ) Signed-off-by: Shu Wang. <shuw@nvidia.com>	2025-09-25 23:35:14 +00:00
tomeras91	57329a8c01	[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 (#25708 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-09-25 16:10:29 -07:00
Zhuohan Li	8c435c9bce	[Core] Enable command line logging for LLMEngine (#25610 ) Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-25 15:31:17 -07:00
Ekagra Ranjan	e71b8e210d	[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-25 15:22:03 -07:00
Cyrus Leung	89fa54e6f7	[Optimization] Use a cheaper cache key in `get_model_architecture` (#25682 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 17:54:20 -04:00
Cyrus Leung	3d54bdcb73	[Optimization] Streamline `InputPreprocessor` (#25702 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 21:06:49 +00:00
Cyrus Leung	6b0fcbbf43	[Misc] Simplify `test_argsort_mm_positions` (#25690 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 18:23:01 +00:00
Jee Jee Li	0fa673af4c	[V0 deprecation] Clean up LoRA (#25686 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-25 18:12:33 +00:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Isotr0py	71b25b0d48	[V0 deprecation] Clean up V0 fallback in compilation config (#25675 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-25 17:29:51 +00:00
Cyrus Leung	0ea80c87d9	[Model] Define `merge_by_field_config` MM interface (#25676 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 17:13:07 +00:00
Tao Hui	b8d9e4a326	[Model] Add optional parameter to reasoning parser constructor (#25554 ) Signed-off-by: taohui <taohui3@gmail.com> Signed-off-by: Tao Hui <taohui3@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-26 01:12:50 +08:00
Lucas Wilkinson	13cc7f5370	[BugFix] Fix DBO hang (#25625 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-25 17:04:48 +00:00
Michael Goin	916bd9204d	Revert "[Bug] Dynamo Unsupported due to `BasevLLMParameter.torch_function` calling disabled super()" (#25681 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-25 09:45:06 -07:00
AlonKejzman	e04a1b6b21	[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… (#24662 ) Signed-off-by: AlonKejzman <alonkeizman@gmail.com>	2025-09-25 15:40:14 +00:00
Tyler Michael Smith	2e5df88c92	[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning (#25532 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-25 15:16:06 +00:00
Nicolò Lucchesi	0754ac4c49	[Misc] Remove cruft file in repo (#25678 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-25 08:05:12 -07:00
Isotr0py	03858e6d1c	[Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-25 14:46:04 +00:00
Russell Bryant	532a6cfccb	[ux] Switch a warning to debug about a pytorch fallback (#23750 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-25 14:38:16 +00:00
Li, Jiang	eb32335e35	[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-25 13:29:11 +00:00
Jonas M. Kübler	69a8c8e99a	[torch.compile] Make Query Quantization Fusable (#24914 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-25 09:25:12 -04:00
youkaichao	6c340da4df	[misc] log info messages by default for hanging / busy / idle (#25627 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-25 21:14:57 +08:00
Cyrus Leung	2f17117606	[mypy] Fix wrong type annotations related to tuple (#25660 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 13:00:45 +00:00
chenlang	1e9a77e037	[Hardware][RISC-V] Add riscv64 support for vLLM with scalar (#22112 ) Signed-off-by: chenlang <chen.lang5@zte.com.cn> Co-authored-by: chenlang <10346245@zte.com.cn>	2025-09-25 20:46:11 +08:00
Kunshang Ji	d2af67441d	[XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-25 12:38:11 +00:00
Cyrus Leung	0bcc3a160d	[CI/Build] Fix flaky entrypoints test (#25663 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 12:19:40 +00:00
Harry Mellor	70fbdb26e9	Add backward compatibility for `guided_...` API (#25615 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-25 19:45:25 +08:00
wang.yuqi	7f570f1caa	[V0 deprecation] Remove unreachable model_config.supported_tasks (#25642 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-25 11:26:31 +00:00
yyzxw	eaeca3cd7f	[Bugfix] Parse SpeculativeConfig Error (#25142 ) Signed-off-by: zxw <1020938856@qq.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-25 11:09:39 +00:00
Cyrus Leung	12c1287d64	[mypy] Further improve MM type annotations (#25654 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 10:57:36 +00:00
Isotr0py	17b4c6685c	[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling (#25648 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-25 18:36:01 +08:00
Agata Dobrzyniewicz	3c2b2ccece	[Bugfix] Add triton.language.tensor placeholder (#25649 ) Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>	2025-09-25 10:31:14 +00:00
Roger Wang	7be9ffcd9f	[Misc] Fix Qwen3-VL `video_grid_thw` typing (#25646 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-25 10:16:45 +00:00

9920 Commits