xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-22 08:17:02 +08:00

Author	SHA1	Message	Date
courage17340	b1308b84a3	[Model][VLM] Add Kimi-VL model support (#16387) Signed-off-by: courage17340 <courage17340@163.com>	2025-04-14 21:41:48 +00:00
Nicolò Lucchesi	b3f2fddd17	[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-14 17:01:05 +00:00
Cyrus Leung	aa29841ede	[Bugfix] Multi-modal caches not acting like LRU caches (#16593) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-14 09:24:16 -07:00
shangmingc	1dd23386ec	[Misc] Update usage with mooncake lib for kv transfer (#16523) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-04-14 11:31:37 +00:00
DefTruth	ce4ddd2d1a	[Misc] remove warning if triton>=3.2.0 (#16553) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-14 02:39:47 -07:00
Harry Mellor	e51929ebca	Improve configs - `SchedulerConfig` (#16533) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-14 17:24:16 +08:00
Russell Bryant	dc1b4a6f13	[Core][V0] Enable regex support with xgrammar (#13228) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-14 10:13:38 +08:00
Michael Goin	d085a44082	Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-13 14:55:18 +00:00
Lily Liu	f49e5aff11	[V1][Spec Decode] KV cache slots for eagle heads (#16370) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-12 19:42:51 -07:00
Ryan McConville	6c11ecf8d3	[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529) Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>	2025-04-12 20:19:19 +00:00
SnowCharm	93e5f3c5fb	[Perf] Optimize Preparing Inputs for GPU Model Runner (#16484) Signed-off-by: snowcharm <snowcharmqq@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-12 22:54:37 +08:00
Jie Fu (傅杰)	70363bccfa	Fix syntaxWarning: invalid escape sequence '\s' (#16532) Signed-off-by: Jie Fu <jiefu@tencent.com>	2025-04-12 14:39:42 +00:00
Huazhong Ji	68bb122eb4	[MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464) Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>	2025-04-12 09:20:25 +00:00
Cyrus Leung	d9fc8cd9da	[V1] Enable multi-input by default (#15799) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 08:52:39 +00:00
wang.yuqi	fbf722c6e6	[Frontend] support matryoshka representation / support embedding API dimensions (#16331)	2025-04-11 23:23:10 -07:00
leon-seidel	e92d7085bf	[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-11 23:22:07 -07:00
Michael Goin	bd6028d6b0	Optimized topk for topk=1 (Llama-4) (#16512) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-12 14:21:08 +08:00
Nick Hill	41cc883c29	[BugFix] Handle non-contiguous tensors properly when serializing (#16492) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-11 17:54:06 -07:00
Michael Goin	87b836ba77	Bugfix for PixtralHF models without spatial_merge_size (#16513) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 23:32:22 +00:00
rongfu.leng	56c76c2e0e	[Bugfix] clean up duplicated code (#16485) Signed-off-by: Gogs <gogs@fake.local> Co-authored-by: Gogs <gogs@fake.local>	2025-04-11 23:19:40 +00:00
Yong Hoon Shin	a3bf8d4a2b	[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488)	2025-04-12 06:26:55 +08:00
Ye (Charlotte) Qi	16eda8c43a	[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Kai Wu <kaiwu@meta.com>	2025-04-12 06:26:17 +08:00
Harry Mellor	cd77382ac1	Improve configs - `LoadConfig` (#16422) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-11 20:27:27 +00:00
Travis Johnson	71b9cde010	[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-11 19:59:50 +00:00
Michael Goin	f41647ee6b	[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:54:08 +00:00
Nicolò Lucchesi	4d022cbc75	[TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-11 17:06:14 +00:00
Richard Zou	70de35a881	Fix erroneous "model doesn't support compile" warning (#16486) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-11 16:24:36 +00:00
Tomasz Zielinski	34b2cf3b33	[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779) Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>	2025-04-11 07:38:36 -07:00
chaow-amd	9e90c9f73f	[Bugfix] Fix bugs of running Quark quantized models (#16236) Signed-off-by: chaow <chaow@amd.com>	2025-04-11 10:18:32 -04:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00
Jee Jee Li	a26f59ccbc	[Misc] Raise error for V1 not supporting Long LoRA. (#16415) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 01:51:20 -07:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Jee Jee Li	f7030df3be	[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 15:32:37 +08:00
DefTruth	905e91e9ac	Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)	2025-04-11 06:44:22 +00:00
Alex Brooks	f8f9c0ba62	[Bugfix] Don't set an upper bound on repetition penalty (#16403) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-11 14:19:40 +08:00
Yong Hoon Shin	99ef59cf7f	[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 21:26:07 -07:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Cyrus Leung	56d4aefa33	[VLM] Avoid unnecessary dummy multimodal data during processing (#16416) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 19:32:14 +00:00
Nick Hill	dd143ef541	[V1] Zero-copy tensor/ndarray serialization/transmission (#13790) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-10 19:23:14 +00:00
Chih-Chieh Yang	daefed052c	[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>	2025-04-10 19:07:07 +00:00
Lily Liu	e8224f3dca	[V1][Spec Decode] Eagle Model loading (#16035) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-10 11:21:48 -07:00
Russell Bryant	9665313c39	[V1] Set structured output backend to `auto` by default (#15724) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-10 17:53:26 +00:00
Harry Mellor	0c54fc7273	Improve configs - `ParallelConfig` (#16332) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-10 17:34:37 +00:00
Nicolò Lucchesi	c1b57855ec	[TPU][V1] Use `language_model` interface for getting text backbone in MM (#16410) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:32:04 +00:00
Cyrus Leung	83b824c8b4	[VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (#16408) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 09:06:58 -07:00
Lu Fang	7678fcd5b6	Fix the torch version parsing logic (#15857)	2025-04-10 07:37:47 -07:00
Ye (Charlotte) Qi	61de3ef74b	[Model] Remove image mm limit for LLaMa4 (#16365) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 09:36:27 +00:00
Michael Goin	c70cf0fe06	[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-10 15:08:47 +08:00
Cyrus Leung	a5d11a54dc	[Bugfix] Fix validation error for text-only Mllama 3.2 (#16377) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 14:19:42 +08:00
Aaron Ang	a9bd832fc5	[Model] use AutoWeightsLoader for deepseek_v2, internlm2 (#16383) Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>	2025-04-09 23:01:00 -07:00

3945 Commits