Calvin Chen
48545728d8
cleanup invalid prints ( #18050 )
...
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-12 23:01:57 -07:00
Tao He
60f7624334
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support ( #11844 )
2025-05-12 19:52:47 -07:00
bwshen-mi
acee8f48aa
[Model] Support MiMo-7B inference with MTP ( #17433 )
...
Signed-off-by: wp-alpha <wangpeng66@xiaomi.com>
Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>
2025-05-12 23:25:33 +00:00
Harry Mellor
c6798baa9c
Change top_k to be disabled with 0 (still accept -1 for now) ( #17773 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-09 10:01:49 +00:00
Agata Dobrzyniewicz
843b222723
[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU ( #17648 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
2025-05-07 22:37:03 -07:00
Akshat Tripathi
c20ef40fd0
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend ( #14238 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-05-07 16:28:47 -04:00
Satyajith Chilappagari
043e4c4955
Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling ( #16357 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Aaron Dou <yzdou@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Chongming Ni <chongmni@amazon.com>
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com>
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>
2025-05-07 00:07:30 -07:00
Jee Jee Li
822de7fb94
[Misc] Split model loader ( #17712 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-07 12:42:26 +08:00
Harry Mellor
d6484ef3c3
Add full API docs and improve the UX of navigating them ( #17485 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-03 19:42:43 -07:00
Cyrus Leung
887d7af882
[Core] Gate prompt_embeds behind a feature flag ( #17607 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-04 00:19:20 +08:00
Andrew Sansom
cc2a77d7f1
[Core] [Bugfix] Add Input Embeddings ( #15428 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-02 01:06:39 -07:00
ponix-j
bdb2cddafc
[Misc] Use a platform-independent interface to obtain the device attributes ( #17100 )
2025-04-29 06:59:13 +00:00
idouba
72c5b97231
Fix typo in tpu_worker.py ( #17288 )
2025-04-28 04:01:15 -07:00
Cyrus Leung
aec9674dbe
[Core] Remove legacy input mapper/processor from V0 ( #15686 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-28 15:38:48 +08:00
Agata Dobrzyniewicz
c48334d405
[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device ( #17186 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
2025-04-26 05:55:14 -07:00
Shu Wang
9e96f56efb
Allocate kv_cache with stride order ( #16605 )
...
Signed-off-by: shuw <shuw@nvidia.com>
2025-04-25 22:03:31 -07:00
Harry Mellor
423e9f1cbe
Use Transformers helper get_text_config() instead of checking for text_config ( #17105 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-25 08:47:35 -07:00
Woosuk Kwon
b411418ff0
[Chore] Remove Sampler from Model Code ( #17084 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-24 02:49:33 -07:00
Chendi.Xue
56a735261c
[INTEL-HPU][v0] Port delayed sampling to upstream ( #16949 )
...
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
2025-04-22 20:14:11 -07:00
Han Zhang
d41faaf9df
Restore buffers when waking up from level 2 sleep ( #16564 ) ( #16889 )
...
Signed-off-by: Han <zh950713@gmail.com>
2025-04-21 20:18:28 +08:00
Yang Fan
2c1bd848a6
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) ( #15130 )
...
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiong Wang <wangxiongts@163.com>
2025-04-18 23:14:36 -07:00
Yihua Cheng
3408e47159
[P/D][V1] KV Connector API V1 ( #15960 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-17 13:22:40 -07:00
Tomasz Zielinski
34b2cf3b33
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU ( #12779 )
...
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
2025-04-11 07:38:36 -07:00
Jee Jee Li
f7030df3be
[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner ( #15990 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-11 15:32:37 +08:00
Benjamin Kitor
82eb61dd4c
[misc] use tqdm.auto where appropriate ( #16290 )
...
Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>
2025-04-09 21:54:54 -07:00
Li, Jiang
550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend ( #15934 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-02 07:46:47 -07:00
Eric Tang
ddb94c2605
[core] Add tags parameter to wake_up() ( #15500 )
...
Signed-off-by: Eric <erictang000@gmail.com>
2025-04-02 01:59:27 -07:00
Thien Tran
2edc87b161
[Bugfix] Fix cache block size calculation for CPU MLA ( #15848 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-04-02 01:45:02 -07:00
yihong
2de4118243
fix: change GB to GiB in logging, closes #14979 ( #15807 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-31 10:00:50 -07:00
Chengji Yao
e74ff409e0
[TPU] support disabling xla compilation cache ( #15567 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-27 00:09:28 +00:00
Varun Sundar Rabindranath
6c663dfd5e
[misc] LoRA - Skip LoRA kernels when not required ( #15152 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-26 11:33:45 +08:00
Thien Tran
4f044b1d67
[Kernel][CPU] CPU MLA ( #14744 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
liuzhenwei
5eeadc2642
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral ( #12303 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
2025-03-24 09:48:40 -07:00
Russell Bryant
b877031d80
Remove openvino support in favor of external plugin ( #15339 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 14:06:39 -07:00
Cyrus Leung
f6137adbcb
Revert "[Bugfix] Limit profiling run sequence length by max_model_len ( #14785 ) ( #14892 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-16 09:13:46 -07:00
Kyle Sayers
d30aa7e9e6
[Bugfix] Limit profiling run sequence length by max_model_len ( #14785 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-16 07:44:19 -07:00
Lucas Wilkinson
5952d8ab61
[Attention] Get rid of mla cache alignment ( #14842 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-15 05:08:25 +00:00
Li, Jiang
a2ae496589
[CPU] Support FP8 KV cache ( #14741 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
yarongmu-google
dd344e0342
[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … ( #14844 )
...
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-15 00:41:15 +00:00
Jee Jee Li
b8b0ccbd2d
[Bugfix] Make the device profiler include LoRA memory ( #14469 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-08 07:12:22 +00:00
Harry Mellor
f7a6bd0fa1
Fix missing kv_caches and attn_metadata in OpenVINOCausalLM ( #14271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-07 12:30:42 +00:00
youkaichao
151b08e0fe
[RLHF] use worker_extension_cls for compatibility with V0 and V1 ( #14185 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-07 00:32:46 +08:00
Siyuan Liu
beebf4742a
[TPU][Profiler] Support start_profile/stop_profile in TPU worker ( #13988 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-04 14:40:06 -05:00
Zhanwen Chen
66233af7b6
Use math.prod instead of np.prod for trivial ops ( #14142 )
2025-03-03 21:09:22 -08:00
Jun Duan
82fbeae92b
[Misc] Accurately capture the time of loading weights ( #14063 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-01 17:20:30 -08:00
Kacper Pietkun
b91660ddb8
[Hardware][Intel-Gaudi] Regional compilation support ( #13213 )
2025-02-28 00:51:49 -08:00
Benjamin Chislett
9804145cac
[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict ( #13626 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-02-27 15:28:08 -08:00
Yang Zheng
4b1d141f49
[PP] Correct cache size check ( #13873 )
...
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
2025-02-27 17:47:29 +08:00
Joe Runde
3f808cc044
[Bugfix] Do not crash V0 engine on input errors ( #13101 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 19:07:29 +08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00