xinyun/vllm - vllm - 丝路新云 Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-08 23:27:54 +08:00

Author	SHA1	Message	Date
Li, Jiang	280d074103	[CPU][CI] Improve CPU Dockerfile (#15690 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-03-28 01:36:31 -07:00
Ce Gao	32b14baf8a	[Refactor][Frontend] Keep all logic about reasoning into one class (#14428 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-28 00:23:30 -07:00
Robert Shaw	2d9045fce8	[TPU][CI] Fix TPUModelRunner Test (#15667 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-03-28 00:01:26 -07:00
Cyrus Leung	355f66348c	[V1] Remove legacy input registry (#15673 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 23:34:34 -07:00
Cyrus Leung	8693e47e6a	[Bugfix] Fix `mm_hashes` forgetting to be passed (#15668 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-28 05:51:05 +00:00
Jason (Siyu) Zhu	cec8c7d7f8	Refactor error handling for multiple exceptions in preprocessing (#15650 ) Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>	2025-03-28 03:27:20 +00:00
Gregory Shtrasberg	4d0ec37267	[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-03-28 02:58:16 +00:00
Chen Xia	e7f720ea56	[Misc]add coding benchmark for speculative decoding (#15303 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com>	2025-03-28 10:47:05 +08:00
Wes	4ae17bf1e2	Revert "Use Cache Hinting for fused_moe kernel (#15511 )" (#15645 ) Signed-off-by: Wes Medford <wryanmedford@gmail.com>	2025-03-27 19:45:55 -07:00
Robert Shaw	8a49eea74b	[CI][TPU] Temporarily Disable Quant Test on TPU (#15649 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-27 19:45:05 -07:00
wwl2755	b4245a48df	[Doc] Fix dead links in Job Board (#15637 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-03-28 02:43:40 +00:00
Kebe	4e0f6076be	[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948 ) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-03-28 10:13:41 +08:00
Jee Jee Li	726efc6a32	[Quantization][V1] BitsAndBytes support V1 (#15611 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-28 10:12:47 +08:00
Robert Shaw	bd45912b99	[TPU] Lazy Import (#15656 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-28 09:57:01 +08:00
Nick Hill	15dac210f0	[V1] AsyncLLM data parallel (#13923 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-27 16:14:41 -07:00
Russell Bryant	112b3e5b3b	[CI] Update rules for applying `tpu` label. (#15634 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-27 22:15:26 +00:00
cnorman	32d669275b	Correct PowerPC to modern IBM Power (#15635 ) Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>	2025-03-27 15:04:32 -07:00
Nicolò Lucchesi	4098b72210	[Bugfix][TPU][V1] Fix recompilation (#15553 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-27 19:15:06 +00:00
Harry Mellor	46450b8d33	Use absolute placement for Ask AI button (#15628 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-27 18:52:18 +00:00
Cyrus Leung	13ac9cab21	[Misc] Avoid direct access of global `mm_registry` in `compute_encoder_budget` (#15621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 17:52:00 +00:00
Yuan Tang	66aa4c0bf4	[Feature] Add middleware to log API Server responses (#15593 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-27 17:49:38 +00:00
Cyrus Leung	247181536f	[Misc] Replace `is_encoder_decoder_inputs` with `split_enc_dec_inputs` (#15620 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 17:36:32 +00:00
Cyrus Leung	07bf813fb5	[Doc] Link to onboarding tasks (#15629 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 16:30:53 +00:00
Hiroaki Sugiyama	8958217ad5	[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211 ) Signed-off-by: h-sugi <h.sugi@ieee.org> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-27 22:29:29 +08:00
Cyrus Leung	ac5bc615b0	[Model] MiniCPM-V/O supports V1 (#15487 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 06:07:29 -07:00
Reid	8063dfc61a	[Doc] update --system for transformers installation in docker doc (#15616 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-03-27 20:38:46 +08:00
Richard Zou	6278bc829e	Fix incorrect filenames in vllm_compile_cache.py (#15494 ) Signed-off-by: <zou3519@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-03-27 18:33:41 +08:00
wang.yuqi	3f532cb6a6	[Misc] Use model_redirect to redirect the model name to a local folder. (#14116 )	2025-03-27 02:21:23 -07:00
Cyrus Leung	e6c9053f9e	[Misc] Clean up `scatter_patch_features` (#15559 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-27 07:45:00 +00:00
Robert Shaw	43ed4143c4	[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com> Co-authored-by: ElizaWszola <ewszola@redhat.com>	2025-03-27 06:47:25 +00:00
Bella kira	f4c98b4d4c	[Misc] Consolidate LRUCache implementations (#15481 ) Signed-off-by: Bella kira <2374035698@qq.com>	2025-03-27 06:43:43 +00:00
Robert Shaw	e1e0fd7543	[TPU] Avoid Triton Import (#15589 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-27 06:43:02 +00:00
Rui Qiao	df8d3d1287	[Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556 )	2025-03-27 06:21:07 +00:00
Chengji Yao	619d3de8bd	[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-26 22:46:26 -07:00
Gregory Shtrasberg	ecff8309a3	[ROCm] Env variable to trigger custom PA (#15557 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-03-26 22:46:12 -07:00
Jerry Zhang	dcf2a590f5	Allow torchao quantization in SiglipMLP (#15575 )	2025-03-26 22:45:51 -07:00
Cody Yu	54aa619459	[V1] Refactor num_computed_tokens logic (#15307 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-27 04:54:36 +00:00
Mengqing Cao	fb22be5817	[moe][quant] add weight name case for offset (#15515 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-03-27 04:50:29 +00:00
Wei Zeng	7f301dd8ef	[Doc] Update V1 user guide for fp8 kv cache support (#15585 ) Signed-off-by: weizeng <weizeng@roblox.com>	2025-03-26 19:39:03 -07:00
Varun Sundar Rabindranath	8095341a01	[misc] LoRA: Remove unused long context test data (#15558 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-27 10:04:51 +08:00
Chenyaaang	69db16a46a	add platform check back (#15578 ) Signed-off-by: Chenyaaang <llccyy1212@gmail.com>	2025-03-27 01:50:27 +00:00
Michael Goin	ce78f9af4e	Add automatic tpu label to mergify.yml (#15560 )	2025-03-26 21:39:58 -04:00
ElizaWszola	9239bf718e	[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972 ) Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>	2025-03-27 00:54:44 +00:00
Matthew Vine	7a6d45bc8a	Support FIPS enabled machines with MD5 hashing (#15299 ) Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com>	2025-03-26 20:19:46 -04:00
Chengji Yao	e74ff409e0	[TPU] support disabling xla compilation cache (#15567 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-27 00:09:28 +00:00
Wes	7a888271f5	Use Cache Hinting for fused_moe kernel (#15511 )	2025-03-26 23:21:34 +00:00
Alexander Matveev	9d119a86ae	[V1] TPU CI - Fix test_compilation.py (#15570 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-26 21:51:54 +00:00
Alexander Matveev	b2e85e26f4	[V1] TPU - Revert to exponential padding by default (#15565 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-26 21:35:05 +00:00
Alexei-V-Ivanov-AMD	dd8a29da99	Applying some fixes for K8s agents in CI (#15493 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-03-26 20:35:11 +00:00
marko	27df5199d9	Support SHA256 as hash function in prefix caching (#15297 ) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-03-26 11:11:28 -07:00

5567 Commits