xinyun/vllm - vllm - 丝路新云 code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, last synced 2025-12-22 18:05:02 +08:00

Author	SHA1	Message	Date
Cyrus Leung	f0a28bf661	[Misc] Unify tokenizer registration (#29767) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-01 11:34:58 +00:00
daniel-salib	014ece97c7	[Frontend] Add tool filtering support to ToolServer (#29224) Signed-off-by: Daniel Salib <danielsalib@meta.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-01 08:03:57 +00:00
wang.yuqi	62de4f4257	[Frontend] Resettle pooling entrypoints (#29634) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-01 15:30:43 +08:00
Huamin Li	83805a6078	[CI] Skip paddleocr_vl for transformer 4.57.3 (#29758) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-12-01 04:38:06 +00:00
Omer Ullman Argov	39d28108f4	[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)	2025-11-30 11:02:40 -05:00
Cyrus Leung	64bc09ba27	[Core] Enable `inputs_embeds_size` separate from `hidden_size` (#29741) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 17:31:12 +08:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
Vensen	66b5840287	[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783) Signed-off-by: vensen <vensenmu@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-11-30 14:24:25 +08:00
Xin Yang	a491b0911b	[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708) Signed-off-by: Xin Yang <xyangx@amazon.com> Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-30 10:37:25 +08:00
Jee Jee Li	b9d0504a36	[Bugfix] Revert test_tokenization.py (#29729) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-29 16:35:15 +00:00
Jinzhen Lin	1656ad3704	[Kernel][Quantization] add w4a8 support for marlin kernel (#24722) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-11-29 07:19:33 -08:00
Cyrus Leung	fa59fe417f	[Chore] Move `detokenizer_utils` to `vllm/tokenizers` (#29727) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 06:25:17 -08:00
Cyrus Leung	fe3398fab2	[Chore] Enable passing `tokenizer=None` into MM processor (#29724) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 06:25:10 -08:00
Chukwuma Nwaugha	ad7f714d62	hfrunner.classify should return list[list[float]] not list[str] (#29671) Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>	2025-11-29 13:57:00 +00:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Jee Jee Li	39e63dec7c	[LoRA] Cleanup LoRA unused code (#29611) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-28 22:52:58 -08:00
Angela Yi	4b17ce6815	Add gpu memory wait before test_async_tp (#28893) Signed-off-by: angelayi <yiangela7@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-28 20:19:05 -08:00
Lucas Wilkinson	e23f665d83	[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-28 20:19:01 -08:00
Tsukasa OI	762a4a6ca9	[Frontend] Perform offline path replacement to `tokenizer` (#29706) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2025-11-28 18:32:08 -08:00
Cyrus Leung	b2c50eda50	[Bugfix] Fix wrong mock attribute (#29704) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 10:30:41 +08:00
Andreas Karatzas	ea3370b428	[ROCm][Bugfix] Patch for the `Multi-Modal Processor Test` group (#29702) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-11-29 01:31:44 +00:00
Mert Unsal	c625d7b1c6	[Bugfix] Fix O(n²) multimodal string prompt processing (#29667) Signed-off-by: mertunsall <mertunsal1905@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-28 16:10:39 -08:00
Huamin Li	3fd1fb0b60	Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-28 15:26:52 -08:00
Cyrus Leung	7675ba30de	[Misc] Remove redundant `ClassRegistry` (#29681) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-28 15:24:47 -08:00
Benjamin Chislett	1986de1375	[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-28 22:25:05 +00:00
Yanan Cao	3461e7efd8	[Frontend] Remap -O to -cc commandline flag (#29557) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-11-28 21:51:12 +00:00
Harry Mellor	fecae12cd7	Remove `all_special_tokens_extended` from tokenizer code (#29686) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-28 20:26:51 +00:00
Cyrus Leung	8d9338fae4	[Chore] Rename `Processor` to `InputProcessor` (#29682) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 09:35:41 -08:00
Isotr0py	f946a8d743	[Chore]: Reorganize model repo operating functions in `transformers_utils` (#29680) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-28 08:46:51 -08:00
Nick Hill	8e7a891602	[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-28 20:52:23 +08:00
Cyrus Leung	33b06a6f24	[Misc] Remove redundant attention var constants (#29650) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 04:35:19 -08:00
Julien Denize	b2c1d294fa	[BUGFIX] MistralTokenizer._call__ adds an invalid EOS token (#29607) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-28 16:44:47 +08:00
wang.yuqi	f4b76056ee	Improve enable chunked_prefill & prefix_caching logic. (#26623) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-27 22:05:48 -08:00
EanWang211123	37b15e97e8	[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn> Co-authored-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-27 22:05:45 -08:00
maang-h	c7ba1f6bc7	[BugFix] Fix ValueError in NewRequestData repr methods (#29392) Signed-off-by: maang <maang_h@163.com>	2025-11-28 13:42:30 +08:00
Xin Yang	745a3bae1a	[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-28 10:48:28 +08:00
Nicolò Lucchesi	e5a621b724	[CI] Add batched audios Whisper test (#29308) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-27 19:31:52 +00:00
Matthew Bonanni	fc1d8be3dc	[Attention] Update attention imports (#29540) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-27 11:19:09 -05:00
Ryan Rock	bab438ff3e	[CI/Build] Skip ray tests on ROCm (#29556) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2025-11-27 07:01:37 -08:00
Jee Jee Li	2f5f9acd55	[LoRA] Continue optimizing MoE LoRA weight loading (#29322) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-27 05:56:28 -08:00
Cyrus Leung	e6d4f3c254	[Bugfix] Fix pre-commit (#29601) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-27 02:23:06 -08:00
Morrison Turnansky	0838b52e2e	[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: adabeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-27 01:55:58 -08:00
Micah Williamson	43c5792592	[ROCm][CI] Fix test_cpu_offloading for ROCm (#29548) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-11-27 07:54:44 +00:00
HDCharles	df01eda4dc	[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878) Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>	2025-11-26 21:35:13 -05:00
Lucas Wilkinson	56539cddac	[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579)	2025-11-26 14:07:13 -05:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
Wentao Ye	0b0aa874e8	[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-26 09:38:52 -07:00
Huamin Li	70d5953f82	Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-26 22:27:26 +08:00
Yejing Lai	bb706d6048	Fix TeleChatForCausalLM not register issue (#29473) Signed-off-by: Lai, Yejing <yejing.lai@intel.com>	2025-11-26 05:15:00 -08:00
Nick Hill	4e57c6587f	[Core] Support logprobs with spec decode + async scheduling (#29223) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-25 12:55:24 -08:00

3846 Commits