xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, last synced 2026-04-07 20:37:07 +08:00

Author	SHA1	Message	Date
Alexander Matveev	ccd21e1993	[V1] Fix profiling.py Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>	2025-04-11 18:36:37 +00:00
Nicolò Lucchesi	4d022cbc75	[TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-11 17:06:14 +00:00
Richard Zou	70de35a881	Fix erroneous "model doesn't support compile" warning (#16486) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-11 16:24:36 +00:00
Tomasz Zielinski	34b2cf3b33	[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779) Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>	2025-04-11 07:38:36 -07:00
chaow-amd	9e90c9f73f	[Bugfix] Fix bugs of running Quark quantized models (#16236) Signed-off-by: chaow <chaow@amd.com>	2025-04-11 10:18:32 -04:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00
Jee Jee Li	a26f59ccbc	[Misc] Raise error for V1 not supporting Long LoRA. (#16415) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 01:51:20 -07:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Jee Jee Li	f7030df3be	[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 15:32:37 +08:00
DefTruth	905e91e9ac	Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)	2025-04-11 06:44:22 +00:00
Alex Brooks	f8f9c0ba62	[Bugfix] Don't set an upper bound on repetition penalty (#16403) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-11 14:19:40 +08:00
Yong Hoon Shin	99ef59cf7f	[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 21:26:07 -07:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Cyrus Leung	56d4aefa33	[VLM] Avoid unnecessary dummy multimodal data during processing (#16416) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 19:32:14 +00:00
Nick Hill	dd143ef541	[V1] Zero-copy tensor/ndarray serialization/transmission (#13790) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-10 19:23:14 +00:00
Chih-Chieh Yang	daefed052c	[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>	2025-04-10 19:07:07 +00:00
Lily Liu	e8224f3dca	[V1][Spec Decode] Eagle Model loading (#16035) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-10 11:21:48 -07:00
Russell Bryant	9665313c39	[V1] Set structured output backend to `auto` by default (#15724) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-10 17:53:26 +00:00
Harry Mellor	0c54fc7273	Improve configs - `ParallelConfig` (#16332) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-10 17:34:37 +00:00
Nicolò Lucchesi	c1b57855ec	[TPU][V1] Use `language_model` interface for getting text backbone in MM (#16410) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:32:04 +00:00
Cyrus Leung	83b824c8b4	[VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (#16408) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 09:06:58 -07:00
Lu Fang	7678fcd5b6	Fix the torch version parsing logic (#15857)	2025-04-10 07:37:47 -07:00
Ye (Charlotte) Qi	61de3ef74b	[Model] Remove image mm limit for LLaMa4 (#16365) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 09:36:27 +00:00
Michael Goin	c70cf0fe06	[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-10 15:08:47 +08:00
Cyrus Leung	a5d11a54dc	[Bugfix] Fix validation error for text-only Mllama 3.2 (#16377) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 14:19:42 +08:00
Aaron Ang	a9bd832fc5	[Model] use AutoWeightsLoader for deepseek_v2, internlm2 (#16383) Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>	2025-04-09 23:01:00 -07:00
Michael Goin	baada0e737	[Bugfix][TPU] Fix TPU validate_request (#16369) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-04-10 12:55:12 +08:00
Benjamin Kitor	82eb61dd4c	[misc] use tqdm.auto where appropriate (#16290) Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>	2025-04-09 21:54:54 -07:00
Jintao	4aed0ca6a2	[bugfix] Avoid the time consumption caused by creating dummy videos. (#16371)	2025-04-10 04:30:05 +00:00
Chengji Yao	1621b25288	[TPU] Fix dummy loading OOM (#16372) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-10 04:06:16 +00:00
Aaron Ang	a564797151	[Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (#16325) Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>	2025-04-09 20:07:40 -07:00
Guillaume Calmettes	1da6a09274	[Bugfix]: do not shutdown server if `skip_special_use=False` for MistralTokenizer (#14094) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 19:43:09 -07:00
Yuxuan Zhang	1e44ffc3ff	Add GLM-4-0414 support (#16338) Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com> Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Ajay Vohra <ajayvohr@amazon.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com> Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-10 09:19:42 +08:00
Chengji Yao	a454748544	[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 18:51:51 -06:00
Joe Runde	cb391d85dc	[Hardware] add platform-specific request validation api (#16291) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-09 12:50:01 -07:00
Guillaume Calmettes	c3b5189137	[Bugfix] catch AssertionError in MistralTokenizer as ValueError (#16344) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 17:33:24 +00:00
Guillaume Calmettes	98d01d3ce2	[Bugfix][Frontend] respect provided default guided decoding backend (#15476) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 05:11:10 -07:00
Nicolò Lucchesi	d55244df31	[Model] Add `SupportsMultiModal.get_language_model` interface (#16007) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-09 04:12:54 -07:00
yihong	04149cce27	[BugFix] fix some typos found by typos. (#16314) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-09 03:43:59 -07:00
ajayvohra2005	24834f4894	update neuron config (#16289) Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>	2025-04-09 03:43:22 -07:00
Lucia Fang	ec7da6fcf3	[BugFix] llama4 qknorm should be not shared across head (#16311) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-04-09 00:59:14 -07:00
yihong	819d548e8a	[BugFix] logger is not callable (#16312) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-09 00:59:02 -07:00
Cyrus Leung	e484e02857	[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (#16273) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-09 00:51:27 -07:00
Russell Bryant	cb84e45ac7	[Core] Upgrade to xgrammar 0.1.18, add cache size limit (#16283) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-08 19:13:22 -07:00
rongfu.leng	4716377fbc	[Feature] Estimate max-model-len use available KV cache memory (#16168) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:51 -07:00
TJian	2976dc27e9	[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (#16198) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com> Co-authored-by: Hongxia Yang <hongxia.yang@amd.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-04-08 19:12:34 -07:00
Chauncey	102bf967f0	[Model] Add smolvlm support (#16017) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-08 19:12:17 -07:00
yueshen2016	1f4b09b525	Add support to modelopt quantization of Mixtral model (#15961) Signed-off-by: Yue <yueshen@nvidia.com>	2025-04-09 01:53:31 +00:00
Jinzhen Lin	db10422184	[Bugfix] fix deepseek fp16 scale bug (#14809) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-08 16:56:09 -04:00
Lucas Wilkinson	e1a2c699dd	[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (#16209) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-08 18:56:51 +00:00

3921 Commits