xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-01 17:47:07 +08:00

Author	SHA1	Message	Date
Avshalom Manevich	a0f8a79646	[fix] fix qwen image_embeds input (#21049 ) Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>	2025-07-16 15:17:20 +00:00
Cyrus Leung	1c3198b6c4	[Model] Consolidate pooler implementations (#20927 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-16 13:39:13 +00:00
zhiweiz	c11013db8b	[Meta] Llama4 EAGLE Support (#20591 ) Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: qizixi <qizixi@meta.com>	2025-07-15 21:14:15 -07:00
Peter Pan	1eb2b9c102	[CI] update typos config for CI pre-commit and fix some spells (#20919 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-07-15 21:12:40 -07:00
Wentao Ye	76ddeff293	[Doc] Remove duplicate docstring (#21012 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-15 20:09:13 -07:00
Michael Goin	f46098335b	[Bugfix] Fix Mistral3 support on SM100/SM120 (#20998 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-15 20:08:41 -07:00
Ming Yang	fcb9f879c1	[Bugfix] Correct per_act_token in CompressedTensorsW8A8Fp8MoECutlassM… (#20937 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-07-15 19:53:42 -07:00
Brayden Zhong	75a99b98bf	[Chore] Remove outdated transformers check (#20989 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-07-15 19:42:40 -07:00
Thomas Parnell	6cbc4d4bea	[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture (#20923 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 19:19:10 -07:00
Tuan, Hoang-Trong	f29fd8a7f8	[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 (#20838 ) Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>	2025-07-15 16:08:26 -04:00
Patrick von Platen	e7e3e6d263	Voxtral (#20970 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-15 07:35:30 -07:00
Thomas Parnell	3534c39a20	[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 04:04:35 -07:00
Jennifer He	85bd6599e4	[Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534 ) Signed-off-by: Jennifer He <islandhe@gmail.com> Signed-off-by: <islandhe@gmail.com> Signed-off-by: Jen H <islandhe@gmail.com>	2025-07-15 13:34:24 +08:00
Ruheena Suhani Shaik	016b8d1b7f	Enabled BnB NF4 inference on Gaudi (#20172 ) Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>	2025-07-14 20:26:08 -07:00
XiongfeiWei	d4170fad39	Use w8a8 quantized matmul Pallas kernel (#19170 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-07-15 03:06:33 +00:00
Michael Goin	bcdfb2a330	[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-15 01:42:17 +00:00
Thomas Parnell	86f3ac21ce	Fix overflow indexing in causal_conv1d kernel (#20938 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-14 21:43:07 +00:00
Varun Sundar Rabindranath	c0569dbc82	[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-14 19:47:16 +00:00
ant-yy	38efa28278	[Model] Add Ling implementation (#20680 ) Signed-off-by: vito.yy <vito.yy@antgroup.com>	2025-07-14 22:10:32 +08:00
Jee Jee Li	a99b9f7dee	[Quantization] add BNB for MixtralForCausalLM (#20893 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-14 07:34:34 +00:00
TJian	80d38b8ac8	[V1] [ROCm] [AITER] Upgrade AITER to commit `916bf3c` and bugfix APIs (#20880 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-07-13 15:19:32 +00:00
Minkyu Kim	bd4c1e6fdb	Support for LlamaForSequenceClassification (#20807 ) Signed-off-by: thechaos16 <thechaos16@gmail.com>	2025-07-13 00:09:34 -07:00
Nicolò Lucchesi	020f58abcd	[Core] Support multiple tasks per model (#20771 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-12 19:40:11 -07:00
Wentao Ye	c1acd6d7d4	[Refactor] Change the way of import triton (#20774 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:39:55 -07:00
ElizaWszola	3b3b778d4a	[Bugfix] Fix a couple PPLX+CUTLASS MoE bugs (#20825 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-12 19:39:14 -07:00
Wentao Ye	42d440c22b	[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:38:45 -07:00
Michael Goin	6e2c176e1f	[Bugfix] Restrict Machete to only run on Hopper (#20830 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-12 17:34:40 +00:00
Alex Brooks	c2a2f19aba	[Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models (#20843 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-07-12 06:11:30 -07:00
Congcong Chen	2c11a738b3	[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702 ) Signed-off-by: Congcong Chen <congcongchen@microsoft.com>	2025-07-12 06:02:10 -07:00
Zhiyu	4afe687a82	Enable ModelOpt Llama4 fp8 checkpoint deployment (#20419 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-07-11 23:07:16 -07:00
Wentao Ye	0d4891cd03	[Bug] Fix DeepGemm for EP low latency case (#20833 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-11 23:05:12 -07:00
Nicolò Lucchesi	3c7d942da8	[Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models (#20637 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-11 21:33:26 -07:00
Varun Sundar Rabindranath	890323dc1b	[Bugfix] : Fix typo - logger.warn_once -> logger.warning_once (#20852 )	2025-07-11 20:56:24 -07:00
Isotr0py	01cae37713	[CI/Build] Ensure compatability with Transformers v4.53 (#20541 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-11 20:53:07 -07:00
yurhett	11c0198615	[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-07-11 20:52:43 -07:00
Li, Jiang	b1235c3e10	[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices (#20822 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-11 20:52:05 -07:00
Jee Jee Li	44d02f54db	[Misc] Restrict deep_gemm's log output (#20827 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-11 20:50:42 -07:00
Varun Sundar Rabindranath	53fa457391	[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-11 07:51:46 -07:00
Jee Jee Li	8020e98c9f	[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-11 08:01:13 +00:00
nopperl	5d09152ff1	[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660 ) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-07-11 05:53:31 +00:00
Luka Govedič	31d5c1797f	[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830 ) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 04:56:28 +00:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087 )	2025-07-10 20:18:05 -07:00
Michael Goin	922f316441	[Model] Support HF format of minimax (#20211 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 02:55:21 +00:00
Duncan Moss	5923ab9524	[fix]: disable cutlass block scaled group gemm for EP (#20781 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com>	2025-07-11 02:39:18 +00:00
Simon Mo	b854321ffe	[Docs] Lazy import gguf (#20785 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-07-10 16:06:37 -07:00
Varun Sundar Rabindranath	f0c98cae27	[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 14:40:38 -07:00
Varun Sundar Rabindranath	fdadb6f43a	[Bugfix] Fused MoE Modular Kernel chunking loop (#20392 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 20:31:10 +00:00
Ming Yang	3de2ed767f	[Bugfix] Remove assertion of expert_map being None (#20714 ) Signed-off-by: Ming Yang <yming@meta.com> Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-07-10 19:55:22 +00:00
Nathan Hoos	d6902ce79f	[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975 ) Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>	2025-07-10 15:30:26 -04:00
Sanger Steel	5e53c89a74	[Bugfix] [CI] Fix Tensorizer LoRA test (#20760 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-07-10 19:07:06 +00:00

2151 Commits