xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-25 14:47:19 +08:00

Author	SHA1	Message	Date
Isotr0py	edf309ebbe	[VLM] Support multimodal inputs for Florence-2 models (#13320)	2025-02-27 02:06:41 -08:00
Woosuk Kwon	b382a7f28f	[BugFix] Make FP8 Linear compatible with torch.compile (#13918) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-26 13:48:55 -08:00
Wallas Henrique	4cb6fa0a9c	[Bugfix] Backend option to disable xgrammar any_whitespace (#12744) Signed-off-by: Wallas Santos <wallashss@ibm.com> Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 10:52:34 -08:00
Roger Wang	7ca1da020f	[Misc] Fix input processing for Ultravox (#13871)	2025-02-25 23:56:34 -08:00
Seth Kimmel	e206b54331	[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (#13837) Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>	2025-02-26 14:58:24 +08:00
Harry Mellor	24679788ed	DeepSeek V2/V3/R1 only place `lm_head` on last pp rank (#13833) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-26 01:24:57 +00:00
Michael Goin	07c4353057	[Model] Support Grok1 (#13795) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-26 01:07:12 +00:00
Liangfu Chen	f75aa72732	[Neuron] Add custom_ops for neuron backend (#13246) Signed-off-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: George Novack <gnovack@amazon.com> Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>	2025-02-25 11:47:49 -08:00
Cyrus Leung	f4133ce4e5	[Bugfix] Revert inspection code in #13743 (#13832) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-26 00:18:50 +08:00
Isotr0py	6ff518626c	[Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818)	2025-02-25 06:03:02 -08:00
Russell Bryant	aab392774b	[Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-25 08:21:25 +00:00
Cyrus Leung	6724e79164	[Misc] Check that the model can be inspected upon registration (#13743)	2025-02-25 00:18:19 -08:00
Michael Goin	4d251ad00e	Fix CompressedTensorsWNA16MoE with grouped scales (#13769)	2025-02-25 00:17:14 -08:00
Lucas Wilkinson	4a8cfc7551	[Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" (#13802)	2025-02-24 20:33:59 -08:00
Tyler Michael Smith	1e15aaef56	[Bugfix][Quantization] Fix FP8 + EP (#13784) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-25 10:54:17 +08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555)	2025-02-24 17:13:52 -08:00
Michael Goin	db986c19ea	Fix precommit fail in fused_moe intermediate_cache2 chunking (#13772) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-24 09:25:47 -08:00
Zhonghua Deng	ccc00515fd	[BugFix] Illegal memory access for MoE On H20 (#13693)	2025-02-24 07:37:32 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583)	2025-02-24 07:33:20 -08:00
Mengqing Cao	23eca9cf68	[model][refactor] remove cuda hard code in models and layers (#13658)	2025-02-24 06:10:14 -08:00
Isotr0py	ba5106e519	[LMM] Implement merged multimodal processor for whisper (#13278)	2025-02-23 01:46:03 -08:00
Kyle Sayers	d5ca2110f1	[Quant] BaiChuan SupportsQuant (#13710)	2025-02-22 19:21:15 -08:00
Kevin H. Luu	2c5e637b57	[ci] Use env var to control whether to use S3 bucket in CI (#13634)	2025-02-22 19:19:45 -08:00
Helena Kloosterman	382f66fb08	[Bugfix] Fix boolean conversion for OpenVINO env variable (#13615)	2025-02-22 08:04:12 -08:00
Gregory Shtrasberg	c904fdddf6	[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231)	2025-02-22 05:54:38 -08:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696)	2025-02-22 00:31:26 -08:00
Jee Jee Li	105b8ce4c0	[Misc] Reduce LoRA-related static variable (#13166)	2025-02-22 00:21:30 -08:00
Yu Chin Fabian Lim	fca20841c2	Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size (#13660)	2025-02-22 00:19:10 -08:00
Shane A	9a1f1da5d1	[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (#13687)	2025-02-21 22:07:45 -08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
chenxiaobing	ed6e9075d3	[Bugfix] Fix deepseekv3 grouped topk error (#13474) Signed-off-by: Chen-XiaoBing <chenxb002@whu.edu.cn>	2025-02-20 06:47:01 -08:00
燃	041e294716	[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533)	2025-02-19 23:04:30 -08:00
Divakar Verma	0d243f2a54	[ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-02-20 04:01:02 +00:00
Jee Jee Li	512368e34a	[Misc] Qwen2.5 VL support LoRA (#13261)	2025-02-19 18:37:55 -08:00
Kevin H. Luu	473f51cfd9	[3/n][CI] Load Quantization test models with S3 (#13570) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-20 10:12:30 +08:00
Cyrus Leung	377d10bd14	[VLM][Bugfix] Pass processor kwargs properly on init (#13516) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-19 13:13:50 +00:00
Lucia Fang	f525c0be8b	[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-19 17:06:23 +08:00
Alex Brooks	983a40a8bb	[Bugfix] Fix Positive Feature Layers in Llava Models (#13514) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>	2025-02-19 08:50:07 +00:00
Kevin H. Luu	d5d214ac7f	[1/n][CI] Load models in CI from S3 instead of HF (#13205) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-19 07:34:59 +00:00
Divakar Verma	8aada19dfc	[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503)	2025-02-18 22:23:24 -08:00
Nick Hill	30172b4947	[V1] Optimize handling of sampling metadata and req_ids list (#13244) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-18 12:15:33 -08:00
Isotr0py	8cf97f8661	[Bugfix] Fix failing transformers dynamic module resolving with spawn multiproc method (#13403) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-02-18 10:25:53 +00:00
Michael Goin	b53d79983c	Add outlines fallback when JSON schema has enum (#13449) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-18 06:49:41 +00:00
Kyle Sayers	d1b649f1ef	[Quant] Aria SupportsQuant (#13416)	2025-02-17 21:51:09 -08:00
Kyle Sayers	00294e1bc6	[Quant] Arctic SupportsQuant (#13366)	2025-02-17 21:35:09 -08:00
Kyle Sayers	88787bce1d	[Quant] Molmo SupportsQuant (#13336)	2025-02-17 21:34:47 -08:00
Isotr0py	67ef8f666a	[Model] Enable quantization support for `transformers` backend (#12960)	2025-02-17 19:52:47 -08:00
Harry Mellor	efbe854448	[Misc] Remove dangling references to `SamplingType.BEAM` (#13402)	2025-02-17 19:52:35 -08:00

1376 Commits