Satyajith Chilappagari
972eddf7c9
[Neuron] Add multi-LoRA support for Neuron. (#18284)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-29 16:41:22 +08:00
Divakar Verma
774c5fde30
[V1] fix torch profiling for V1 offline scenarios (#18445)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-05-28 04:16:30 +00:00
Cyrus Leung
696259ca01
[Core] Automatically cast multi-modal input dtype (#18756)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 23:45:48 +08:00
Hyogeun Oh (오효근)
a68e293cb9
[Doc] Convert Sphinx directives ({class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-27 01:44:20 -07:00
Cyrus Leung
7d9216495c
[Doc] Update references to doc files (#18637)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-23 15:49:21 -07:00
youkaichao
6a7988c55b
Refactor pplx init logic to make it modular (prepare for deepep) (#18200)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-05-23 23:43:43 +08:00
Harry Mellor
a1fe24d961
Migrate docs from Sphinx to MkDocs (#18145)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 02:09:53 -07:00
燃
f6037d1907
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18526)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-22 05:22:53 -07:00
Cyrus Leung
ad0012a0ac
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)" (#18456)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-20 22:39:22 -07:00
燃
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
2025-05-20 06:59:48 -07:00
Nan Qin
9609327fa4
[Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
2025-05-19 20:21:27 -07:00
bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956)
2025-05-14 13:11:54 -07:00
Calvin Chen
48545728d8
cleanup invalid prints (#18050)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-12 23:01:57 -07:00
Tao He
60f7624334
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)
2025-05-12 19:52:47 -07:00
bwshen-mi
acee8f48aa
[Model] Support MiMo-7B inference with MTP (#17433)
Signed-off-by: wp-alpha <wangpeng66@xiaomi.com>
Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>
2025-05-12 23:25:33 +00:00
Harry Mellor
c6798baa9c
Change top_k to be disabled with 0 (still accept -1 for now) (#17773)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-09 10:01:49 +00:00
Agata Dobrzyniewicz
843b222723
[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU (#17648)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
2025-05-07 22:37:03 -07:00
Akshat Tripathi
c20ef40fd0
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-05-07 16:28:47 -04:00
Satyajith Chilappagari
043e4c4955
Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Aaron Dou <yzdou@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Chongming Ni <chongmni@amazon.com>
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com>
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>
2025-05-07 00:07:30 -07:00
Jee Jee Li
822de7fb94
[Misc] Split model loader (#17712)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-07 12:42:26 +08:00
Harry Mellor
d6484ef3c3
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-03 19:42:43 -07:00
Cyrus Leung
887d7af882
[Core] Gate prompt_embeds behind a feature flag (#17607)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-04 00:19:20 +08:00
Andrew Sansom
cc2a77d7f1
[Core] [Bugfix] Add Input Embeddings (#15428)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-02 01:06:39 -07:00
ponix-j
bdb2cddafc
[Misc] Use a platform independent interface to obtain the device attributes (#17100)
2025-04-29 06:59:13 +00:00
idouba
72c5b97231
Update tpu_worker.py's typo (#17288)
2025-04-28 04:01:15 -07:00
Cyrus Leung
aec9674dbe
[Core] Remove legacy input mapper/processor from V0 (#15686)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-28 15:38:48 +08:00
Agata Dobrzyniewicz
c48334d405
[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
2025-04-26 05:55:14 -07:00
Shu Wang
9e96f56efb
Allocate kv_cache with stride order (#16605)
Signed-off-by: shuw <shuw@nvidia.com>
2025-04-25 22:03:31 -07:00
Harry Mellor
423e9f1cbe
Use Transformers helper get_text_config() instead of checking for text_config (#17105)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-25 08:47:35 -07:00
Woosuk Kwon
b411418ff0
[Chore] Remove Sampler from Model Code (#17084)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-24 02:49:33 -07:00
Chendi.Xue
56a735261c
[INTEL-HPU][v0] Port delayed sampling to upstream (#16949)
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
2025-04-22 20:14:11 -07:00
Han Zhang
d41faaf9df
Restore buffers when wake up from level 2 sleep (#16564) (#16889)
Signed-off-by: Han <zh950713@gmail.com>
2025-04-21 20:18:28 +08:00
Yang Fan
2c1bd848a6
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiong Wang <wangxiongts@163.com>
2025-04-18 23:14:36 -07:00
Yihua Cheng
3408e47159
[P/D][V1] KV Connector API V1 (#15960)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-17 13:22:40 -07:00
Tomasz Zielinski
34b2cf3b33
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
2025-04-11 07:38:36 -07:00
Jee Jee Li
f7030df3be
[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-11 15:32:37 +08:00
Benjamin Kitor
82eb61dd4c
[misc] use tqdm.auto where appropriate (#16290)
Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>
2025-04-09 21:54:54 -07:00
Li, Jiang
550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-02 07:46:47 -07:00
Eric Tang
ddb94c2605
[core] Add tags parameter to wake_up() (#15500)
Signed-off-by: Eric <erictang000@gmail.com>
2025-04-02 01:59:27 -07:00
Thien Tran
2edc87b161
[Bugfix] Fix cache block size calculation for CPU MLA (#15848)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-04-02 01:45:02 -07:00
yihong
2de4118243
fix: change GB to GiB in logging close #14979 (#15807)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-31 10:00:50 -07:00
Chengji Yao
e74ff409e0
[TPU] support disabling xla compilation cache (#15567)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-27 00:09:28 +00:00
Varun Sundar Rabindranath
6c663dfd5e
[misc] LoRA - Skip LoRA kernels when not required (#15152)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-26 11:33:45 +08:00
Thien Tran
4f044b1d67
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
liuzhenwei
5eeadc2642
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral (#12303)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
2025-03-24 09:48:40 -07:00
Russell Bryant
b877031d80
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 14:06:39 -07:00
Cyrus Leung
f6137adbcb
Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785)" (#14892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-16 09:13:46 -07:00
Kyle Sayers
d30aa7e9e6
[Bugfix] Limit profiling run sequence length by max_model_len (#14785)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-16 07:44:19 -07:00
Lucas Wilkinson
5952d8ab61
[Attention] Get rid of mla cache alignment (#14842)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-15 05:08:25 +00:00
Li, Jiang
a2ae496589
[CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00