xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-01 03:27:53 +08:00

Author	SHA1	Message	Date
Divakar Verma	bf21481dde	[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-25 12:17:19 +08:00
Cyrus Leung	fb30ee92ee	[Bugfix] Fix BLIP-2 processing (#12412 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-25 11:42:42 +08:00
Cyrus Leung	df5dafaa5b	[Misc] Remove deprecated code (#12383 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-24 14:45:20 -05:00
Lucas Wilkinson	ab5bbf5ae3	[Bugfix][Kernel] Fix CUDA 11.8 being broken by FA3 build (#12375 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-01-24 15:27:59 +00:00
youkaichao	6dd94dbe94	[perf] fix perf regression from #12253 (#12380 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 11:34:27 +08:00
Woosuk Kwon	0e74d797ce	[V1] Increase default batch size for H100/H200 (#12369 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-24 03:19:55 +00:00
omer-dayan	5e5630a478	[Bugfix] Path join when building local path for S3 clone (#12353 ) Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>	2025-01-24 11:06:07 +08:00
Russell Bryant	d3d6bb13fb	Set weights_only=True when using torch.load() (#12366 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-24 02:17:30 +00:00
Nick Hill	24b0205f58	[V1][Frontend] Coalesce bunched `RequestOutput`s (#12298 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2025-01-23 17:17:41 -08:00
Dipika Sikka	eb5cb5e528	[BugFix] Fix parameter names and `process_after_weight_loading` for W4A16 MoE Group Act Order (#11528 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-23 21:40:33 +00:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
youkaichao	6e650f56a1	[torch.compile] decouple compile sizes and cudagraph sizes (#12243 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 02:01:30 +08:00
youkaichao	3f50c148fd	[core] add wake_up doc and some sanity check (#12361 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 02:00:50 +08:00
Isotr0py	8c01b8022c	[Bugfix] Fix broken internvl2 inference with v1 (#12360 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-23 17:20:33 +00:00
Roger Wang	99d01a5e3d	[V1] Simplify M-RoPE (#12352 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: imkero <kerorek@outlook.com>	2025-01-23 23:13:23 +08:00
Lucas Wilkinson	978b45f399	[Kernel] Flash Attention 3 Support (#12093 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-01-23 06:45:48 -08:00
Isotr0py	c5b4b11d7f	[Bugfix] Fix k_proj's bias for whisper self attention (#12342 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-23 10:15:33 +00:00
Cody Yu	f0ef37233e	[V1] Add `uncache_blocks` (#12333 )	2025-01-23 04:19:21 +00:00
rasmith	68c4421b6d	[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (#12282 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-01-23 00:10:37 +00:00
Nick Hill	aea94362c9	[Frontend][V1] Online serving performance improvements (#12287 )	2025-01-22 22:22:12 +00:00
Cody Yu	7206ce4ce1	[Core] Support `reset_prefix_cache` (#12284 )	2025-01-22 18:52:27 +00:00
Konrad Zawora	96f6a7596f	[Bugfix] Fix HPU multiprocessing executor (#12167 ) Signed-off-by: Konrad Zawora <kzawora@habana.ai>	2025-01-23 02:07:07 +08:00
Jee Jee Li	84bee4bd5c	[Misc] Improve the readability of BNB error messages (#12320 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-22 16:56:54 +00:00
Robin	fc66dee76d	[Misc] Fix the error in the tip for the --lora-modules parameter (#12319 ) Signed-off-by: wangerxiao <863579016@qq.com>	2025-01-22 16:48:41 +00:00
Cyrus Leung	6609cdf019	[Doc] Add docs for prompt replacement (#12318 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-22 14:56:29 +00:00
Roger Wang	16366ee8bb	[Bugfix][VLM] Fix mixed-modality inference backward compatibility for V0 (#12313 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-22 21:06:36 +08:00
zhou fan	528dbcac7d	[Model][Bugfix]: correct Aria model output (#12309 ) Signed-off-by: xffxff <1247714429@qq.com>	2025-01-22 11:39:19 +00:00
Cyrus Leung	cd7b6f0857	[VLM] Avoid unnecessary tokenization (#12310 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-22 11:08:31 +00:00
youkaichao	68ad4e3a8d	[Core] Support fully transparent sleep mode (#11743 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-22 14:39:32 +08:00
youkaichao	66818e5b63	[core] separate builder init and builder prepare for each batch (#12253 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-22 14:13:52 +08:00
Cyrus Leung	cbdc4ad5a5	[Ci/Build] Fix mypy errors on main (#12296 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-22 12:06:54 +08:00
Kevin H. Luu	64ea24d0b3	[ci/lint] Add back default arg for pre-commit (#12279 ) Signed-off-by: kevin <kevin@anyscale.com>	2025-01-22 01:15:27 +00:00
Cyrus Leung	df76e5af26	[VLM] Simplify post-processing of replacement info (#12269 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-21 16:48:13 -08:00
Aleksandr Malyshev	69196a9bc7	[BUGFIX] When skip_tokenize_init and multistep are set, execution crashes (#12277 ) Signed-off-by: maleksan85 <maleksan@amd.com> Co-authored-by: maleksan85 <maleksan@amd.com>	2025-01-21 23:30:46 +00:00
Jani Monoses	9c485d9e25	[Core] Free CPU pinned memory on environment cleanup (#10477 )	2025-01-21 11:56:41 -08:00
wangxiyuan	fa9ee08121	[Misc] Set default backend to SDPA for get_vit_attn_backend (#12235 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-01-21 11:52:11 -08:00
Adrian Cole	347eeebe3b	[Misc] Remove experimental dep from tracing.py (#12007 ) Signed-off-by: Adrian Cole <adrian.cole@elastic.co>	2025-01-21 11:51:55 -08:00
Andy Lo	18fd4a8331	[Bugfix] Multi-sequence broken (#11898 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-01-21 11:51:35 -08:00
Ricky Xu	132a132100	[v1][stats][1/n] Add RequestStatsUpdate and RequestStats types (#10907 ) Signed-off-by: rickyx <rickyx@anyscale.com>	2025-01-21 11:51:13 -08:00
Jannis Schönleber	9705b90bcf	[Bugfix] fix race condition that leads to wrong order of token returned (#10802 ) Signed-off-by: Jannis Schönleber <joennlae@gmail.com>	2025-01-21 09:47:04 -08:00
Mengqing Cao	c64612802b	[Platform] improve platforms getattr (#12264 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-01-21 14:42:41 +00:00
Roger Wang	b197a5ccfd	[V1][Bugfix] Fix data item ordering in mixed-modality inference (#12259 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-21 13:18:43 +00:00
youkaichao	c81081fece	[torch.compile] transparent compilation with more logging (#12246 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-21 19:32:55 +08:00
Cyrus Leung	a94eee4456	[Bugfix] Fix mm_limits access for merged multi-modal processor (#12252 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-21 10:09:39 +00:00
Cyrus Leung	f2e9f2a3be	[Misc] Remove redundant TypeVar from base model (#12248 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-21 08:40:39 +00:00
Jee Jee Li	1f1542afa9	[Misc]Add BNB quantization for PaliGemmaForConditionalGeneration (#12237 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-21 07:49:08 +00:00
Cyrus Leung	96912550c8	[Misc] Rename `MultiModalInputsV2 -> MultiModalInputs` (#12244 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-21 07:31:19 +00:00
Nicolò Lucchesi	5fe6bf29d6	[BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (#12230 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-01-21 12:23:14 +08:00
Gregory Shtrasberg	d4b62d4641	[AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-01-21 12:22:23 +08:00
Jinzhen Lin	750f4cabfa	[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-01-20 16:42:16 -08:00

2987 Commits