xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-31 13:57:13 +08:00

Author	SHA1	Message	Date
Nick Hill	b07bf83c7d	[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-26 06:00:07 +00:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Shu Wang	9e96f56efb	Allocate kv_cache with stride order (#16605 ) Signed-off-by: shuw <shuw@nvidia.com>	2025-04-25 22:03:31 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Nick Hill	70116459c3	[BugFix][Frontend] Fix `LLM.chat()` tokenization (#16081 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:20:05 +00:00
Cyrus Leung	43faa0461a	[Bugfix] Fix hybrid model tests (#17182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 15:14:37 -07:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Harry Mellor	0bd7f8fca5	Bump Transformers to 4.51.3 (#17116 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:34:34 -07:00
Cyrus Leung	19dcc02a72	[Bugfix] Fix mistral model tests (#17181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 06:03:34 -07:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725 ) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
vllmellm	eef364723c	[FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-04-25 11:06:50 +08:00
Maximilien de Bayser	05e1fbfc52	Add chat template for Llama 4 models (#16428 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-24 20:19:36 +00:00
Harry Mellor	0fa939e2d1	Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:29:34 -07:00
Mark McLoughlin	340d7b1b21	[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-24 08:57:40 -07:00
Michael Goin	82e43b2d7e	Add missing rocm_skinny_gemms kernel test to CI (#17060 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 07:49:37 -07:00
wang.yuqi	67309a1cb5	[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970 )	2025-04-24 07:06:28 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	a9138e85b1	Fix OOT registration test (#17099 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:44:12 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Michael Goin	14288d1332	Disable enforce_eager for V1 TPU sampler and structured output tests (#17016 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 02:50:09 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00
Travis Johnson	3cde34a4a4	[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-23 18:34:41 +00:00
Michael Goin	6317a5174a	Categorize `tests/kernels/` based on kernel type (#16799 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-23 09:21:07 -04:00
Nick Hill	1e013fa388	[V1][DP] More robust DP/EP dummy request coordination (#16277 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 19:12:15 -07:00
Aleksandr Malyshev	bc7c4d206b	[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com> Signed-off-by: maleksan85 <maleksan@amd.com> Signed-off-by: <> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>	2025-04-22 19:11:56 -07:00
Guillaume Calmettes	36fe78769f	[Bugfix] validate urls object for multimodal content parts (#16990 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-23 09:43:06 +08:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
vllmellm	30bc3e0f66	[FEAT][ROCm]: Support AITER MLA (#15893 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com>	2025-04-22 09:31:13 -07:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036 ) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Woosuk Kwon	c4ab9f3e71	[V1] Remove pre-allocation for KV cache (#16941 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-22 00:52:18 -07:00
Chauncey	acba33a0f1	[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-04-22 06:02:20 +00:00
Charlie Fu	188b7f9b8c	[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830 ) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-21 20:46:22 -07:00
Varun Sundar Rabindranath	7b8a2ab76f	[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861 ) Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>	2025-04-21 20:44:32 -07:00
Jeffrey Li	0e4254492f	[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863 ) Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>	2025-04-22 11:40:19 +08:00
Nicolò Lucchesi	fa3bba2a53	[TPU][V1] Enable Top-P (#16843 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-22 00:46:07 +00:00
Michael Goin	986537f1c3	[V1] V1 FlashInfer Attention (#16684 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Aurick Qiao <qiao@aurick.net>	2025-04-22 00:38:41 +00:00
Nicolò Lucchesi	210207525e	[TPU][V1] Capture multimodal encoder during model compilation (#15051 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com>	2025-04-21 18:36:59 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Woosuk Kwon	3a0fba5cf4	[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 12:38:50 -07:00
qizixi	bb3605db85	[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-04-20 20:54:29 -07:00
Staszek Paśko	87aaadef73	Serialize tensors using int8 views (#16866 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-19 10:28:34 -07:00
Isotr0py	83f3c3bd91	[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 02:26:11 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460 ) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00
Nicolò Lucchesi	9d4ca19d50	[Misc] Benchmarks for audio models (#16505 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-19 02:24:14 -07:00
Nicolò Lucchesi	2ef0dc53b8	[Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (#16591 ) Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com>	2025-04-19 07:03:54 +00:00
Yang Fan	2c1bd848a6	[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130 ) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>	2025-04-18 23:14:36 -07:00
wang.yuqi	3d3ab3689f	[New Model]: Snowflake Arctic Embed (Family) (#16649 )	2025-04-18 08:11:57 -07:00
Harry Mellor	686623c5e7	Fix `nullable_kvs` fallback (#16837 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-18 05:58:39 -07:00
Harry Mellor	e78587a64c	Improve-mm-and-pooler-and-decoding-configs (#16789 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 22:13:32 -07:00
Cyrus Leung	c16fb5dae8	[Doc] Improve help examples for `--compilation-config` (#16729 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 21:22:34 -07:00

1 2 3 4 5 ...

1800 Commits