xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-30 02:37:12 +08:00

Author	SHA1	Message	Date
Woosuk Kwon	31060b2757	[V1][BugFix] Detect interleaved sliding window attention (#14896 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-16 14:53:53 -07:00
Nick Hill	fc1f67715d	[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-16 14:53:34 -07:00
Cyrus Leung	f6137adbcb	Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785 )" (#14892 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-16 09:13:46 -07:00
Cyrus Leung	e53b1350f2	[Bugfix] Explicitly disable Phi-4-multimodal in V1 (#14889 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-16 09:05:40 -07:00
Kyle Sayers	d30aa7e9e6	[Bugfix] Limit profiling run sequence length by max_model_len (#14785 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-03-16 07:44:19 -07:00
Nick Hill	b82662d952	[BugFix] Fix torch distributed stateless PG backend init (#14870 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-15 20:26:19 -07:00
Simon Mo	71c1e07107	[Kernel] Add more tuned configs (#14877 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-15 20:25:03 -07:00
Roger Wang	b30c75dda4	[V1] Remove V0 fallback for mistral-tokenizer (#14873 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-15 20:21:11 -07:00
Isotr0py	def232e122	[VLM] Clean up Phi-4-MM ViT implementation (#14812 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-03-15 18:53:52 -07:00
Rémi Delacourt	61c6a5a796	[VLM] Merged multi-modal processor for Pixtral (#12211 ) Signed-off-by: remi <remi@mistral.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-15 06:28:27 -07:00
Jun Duan	74bc397b0a	[Core] Expose API endpoint `/is_sleeping` (#14312 ) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-03-15 06:28:14 -07:00
Cyrus Leung	3556a41434	[VLM] Limit multimodal input cache by memory (#14805 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-15 02:52:05 -07:00
Bryan Lu	9ed6ee92d6	[Bugfix] EAGLE output norm bug (#14464 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-03-15 06:50:33 +00:00
Aaron Pham	4c7629cae9	[V1][Structured Output] calculate vocab_size eagerly (#14851 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-14 22:09:51 -07:00
Lucas Wilkinson	5952d8ab61	[Attention] Get rid of mla cache alignment (#14842 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-15 05:08:25 +00:00
Li, Jiang	a2ae496589	[CPU] Support FP8 KV cache (#14741 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-03-14 22:07:36 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
Isotr0py	97ac781c62	[Misc] Remove misleading message in gemma2 and gemma3 (#14850 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-14 21:35:12 -07:00
Russell Bryant	776dcec8fe	Disable outlines cache by default (#14837 )	2025-03-15 03:57:55 +00:00
Tyler Michael Smith	ccf02fcbae	Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… (#14848 )	2025-03-14 20:45:42 -07:00
DefTruth	acaea3bb07	[Bugfix][V1] Fix flashinfer sampling (#14815 )	2025-03-14 20:42:38 -07:00
yarongmu-google	dd344e0342	[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … (#14844 ) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-15 00:41:15 +00:00
Michael Goin	14f301b541	Update to torch==2.6.0 (#12721 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: luka <luka@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-14 16:58:30 -04:00
Chih-Chieh Yang	fe66b34728	[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14778 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-03-14 16:36:18 -04:00
daniel-salib	73deea2fdb	[Frontend] track server_load (#13950 )	2025-03-14 09:53:17 -07:00
Russell Bryant	0b0d6421b2	[Frontend] Fix log message to use http vs https (#14774 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 09:21:09 -07:00
Russell Bryant	1140991a7b	[V1] Fix vocab size calculation for structured output (#14826 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 09:18:38 -07:00
Guillaume Calmettes	fd8e055ffb	[BugFix]: properly catch templating error when preprocess input (#13976 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-03-14 05:58:34 -07:00
Cyrus Leung	ab93f1360f	[VLM] Various cleanup and fixes (#14806 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-14 05:58:19 -07:00
Woosuk Kwon	c77620d22d	[V1][Minor] Minor code cleanup for scheduling metrics (#14800 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-14 08:21:28 +00:00
Jee Jee Li	989ecd2007	[Misc] Gemma3ForConditionalGeneration supports LoRA (#14797 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-14 01:07:30 -07:00
Cyrus Leung	601bd3268e	[Misc] Clean up type annotation for `SupportsMultiModal` (#14794 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-14 00:59:56 -07:00
Lucas Wilkinson	9532c49836	[Attention] MLA get rid of materialization (#14770 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-13 23:39:02 -07:00
Nick Hill	4059adc31b	[Misc][Minor] Simplify `SamplingParams.__post_init__()` (#14772 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-14 11:44:20 +08:00
Thien Tran	95d680b862	[Bugfix][IPEX] Add `VLLM_CPU_MOE_PREPACK` to allow disabling MoE prepack when CPU does not support it (#14681 ) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-13 20:43:18 -07:00
Thomas Parnell	fb4c7f8ef0	[Kernel] [V1] Further optimizations to ROCm (Triton) Backend to better handle GQA. (#14431 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>	2025-03-13 20:42:27 -07:00
Varun Sundar Rabindranath	0b1cfa6180	[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-13 20:42:04 -07:00
Woosuk Kwon	32ef4983cd	[V1] Temporarily disable FlashInfer Rejection Sampler (#14788 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-13 20:40:35 -07:00
Roger Wang	ad19c8a003	[V1] Move OOM check into sampler run (#14728 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-03-13 20:40:23 -07:00
Alexander Matveev	7888e1d0a3	[V1] TPU - Enable prefix caching by default (#14773 )	2025-03-13 20:40:05 -07:00
yasu52	3fb17d26c8	[Doc] Fix typo in documentation (#14783 ) Signed-off-by: yasu52 <tsuguro4649@gmail.com>	2025-03-13 20:33:09 -07:00
Lucas Wilkinson	d47807ba08	[Attention] Remove slow setattr in MLA (#14769 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-13 21:31:14 +00:00
afeldman-nm	02fcaa3d0a	[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624 ) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>	2025-03-13 19:07:34 +00:00
Aaron Pham	8a4a2efc6f	[V1][Core] using cached vocab_size for Structured Outputs (#14630 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-13 11:39:28 -07:00
Woosuk Kwon	01b3fd0af7	[V1][Minor] Minor enhancements on scheduler (#14732 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-13 08:53:22 -07:00
Cyrus Leung	f53a0586b9	[Bugfix] Fix prompt format of GLM4V (#14539 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-13 11:37:17 +00:00
Isotr0py	b1cc4dfef5	[VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-13 03:10:02 -07:00
Cyrus Leung	382403921f	[VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-13 02:23:12 -07:00
Szymon Ożóg	55211b01e8	[Bugfix] Fix chunked prefill for GGUF (#14666 ) Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>	2025-03-13 07:19:03 +00:00
Kyle Sayers	5d043c1685	[Quant] Bamba SupportsQuant (#14698 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-03-13 04:57:05 +00:00


3517 Commits