xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-04 11:37:14 +08:00

Author	SHA1	Message	Date
Kuntai Du	38e599d6a8	[Doc] add documentation for disaggregated prefilling (#11197) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2024-12-15 13:31:16 -06:00
Cyrus Leung	96d673e0f8	[Bugfix] Fix error handling of unsupported sliding window (#11213) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-15 10:59:42 -07:00
Cyrus Leung	b10609e6a1	[Misc] Clean up multi-modal processor (#11207) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-15 06:30:28 +00:00
youkaichao	a1c02058ba	[torch.compile] allow tracking forward time (#11081) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-14 19:45:00 -08:00
Jee Jee Li	15859f2357	[[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201)	2024-12-15 03:03:06 +00:00
Sungjae Lee	886936837c	[Performance][Core] Optimize the performance of evictor v1 and v2 by applying a priority queue and lazy deletion (#7209)	2024-12-14 11:38:10 -08:00
Mark McLoughlin	6d917d0eeb	Enable mypy checking on V1 code (#11105) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2024-12-14 09:54:04 -08:00
Cyrus Leung	93abf23a64	[VLM] Fully dynamic prompt replacement in merged input processor (#11199) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 17:52:18 +00:00
Brad Hilton	9c3dadd1c9	[Frontend] Add `logits_processors` as an extra completion argument (#11150) Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>	2024-12-14 16:46:42 +00:00
Jee Jee Li	3cb5769883	[Misc] Minor improvements to the readability of PunicaWrapperBase (#11200) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-14 16:38:27 +00:00
Tyler Michael Smith	ea7bd68d10	[V1][Bugfix] Fix V1 TP trust-remote-code (#11182) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-14 08:21:23 +00:00
Russell Bryant	48259264a4	[Core] Update outlines and increase its threadpool size (#11140) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-14 07:46:18 +00:00
dhuangnm	24a3d12b82	update compressed-tensors to latest version (#11183) Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>	2024-12-14 03:22:44 +00:00
Cody Yu	9855aea21b	[Bugfix][V1] Re-compute an entire block when fully cache hit (#11186) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-13 17:08:23 -08:00
Tyler Michael Smith	4b5b8a6a3b	[V1][Bugfix] Fix EngineCoreProc profile (#11185) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-13 17:02:35 -08:00
Russell Bryant	4863e5fba5	[Core] V1: Use multiprocessing by default (#11074) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-13 16:27:32 -08:00
Jiaxin Shan	0d8451c3a4	[Distributed] Allow the placement group more time to wait for resources to be ready (#11138) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>	2024-12-13 20:17:37 +00:00
Jani Monoses	0a56bcc03d	[Bugfix][Hardware][CPU] Enable Gemma2 with SDPA on CPU backend (#11169)	2024-12-13 18:00:40 +00:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Alexander Matveev	238c0d93b4	[Misc] Add tokenizer_mode param to benchmark_serving.py (#11174) Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>	2024-12-13 16:19:10 +00:00
zhangjf	5b0ed8391d	[Bugfix] using len(tokenizer) instead of tokenizer.vocab_size in AllowedTokenIdsLogitsProcessor (#11156)	2024-12-13 15:56:19 +00:00
Sungjae Lee	c31d4a57a6	[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching (#8240)	2024-12-13 07:51:25 -08:00
Chenguang Li	d1fa714cb1	[Refactor]A simple device-related refactor (#11163) Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-12-13 13:39:00 +00:00
Roger Wang	969da7d70b	[V1][VLM] Fix edge case bug for InternVL2 (#11165) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-13 11:09:30 +00:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Li, Jiang	f93bf2b189	[Bugfix][CI][CPU] add missing datasets package to requirements-cpu.txt (#11159) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-12-13 08:50:35 +00:00
Jani Monoses	7cd7409142	PaliGemma 2 support (#11142)	2024-12-13 07:40:07 +00:00
youkaichao	be39e3cd18	[core] clean up cudagraph batchsize padding logic (#10996) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-13 06:57:50 +00:00
Cody Yu	34f1a806d5	[Bugfix][V1] Fix 'NoneType' object has no attribute 'hash_value' (#11157) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-13 06:30:06 +00:00
Gregory Shtrasberg	00c1bde5d8	[ROCm][AMD] Disable auto enabling chunked prefill on ROCm (#11146) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2024-12-13 05:31:26 +00:00
Dipika Sikka	3989a79824	[Bugfix] Update starcoder2 to remap k/v scale names for kv_cache quantization (#11148)	2024-12-13 05:07:20 +00:00
Pooya Davoodi	1efce68605	[Bugfix] Use runner_type instead of task in GritLM (#11144) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-13 04:09:53 +00:00
Luka Govedič	30870b4f66	[torch.compile] Dynamic fp8 + rms_norm fusion (#10906) Signed-off-by: luka <luka@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-13 03:19:23 +00:00
Cody Yu	78ed8f57d8	[Misc][V1] Fix type in v1 prefix caching (#11151)	2024-12-13 00:57:40 +00:00
shangmingc	db6c264a1e	[Bugfix] Fix value unpack error of simple connector for KVCache transfer. (#11058) Signed-off-by: ShangmingCai <csmthu@gmail.com>	2024-12-12 21:19:17 +00:00
Jeremy Arnold	9f3974a319	Fix logging of the vLLM Config (#11143)	2024-12-12 12:05:57 -08:00
Cody Yu	2c97eca1ff	[Misc] Validate grammar and fail early (#11119)	2024-12-12 18:34:26 +00:00
Jeff Cook	5d712571af	[Bugfix] Quick fix to make Pixtral-HF load correctly again after 39e227c7ae. (#11024)	2024-12-12 18:09:20 +00:00
Ramon Ziai	d4d5291cc2	fix(docs): typo in helm install instructions (#11141) Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>	2024-12-12 17:36:32 +00:00
Roger Wang	4816d20aa4	[V1] Fix torch profiling for offline inference (#11125) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-12 15:51:53 +00:00
Jiaxin Shan	85362f028c	[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-12 09:25:16 +00:00
youkaichao	62de37a38e	[core][distributed] initialization from StatelessProcessGroup (#10986) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-12 09:04:19 +00:00
Sanju C Sudhakaran	8195824206	[Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) (#10565) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>	2024-12-12 08:09:28 +00:00
Woosuk Kwon	f092153fbe	[V1] Use more persistent buffers to optimize input preparation overheads (#11111) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-11 23:14:20 -08:00
Pooya Davoodi	1da8f0e1dd	[Model] Add support for embedding model GritLM (#10816) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-12 06:39:16 +00:00
Russell Bryant	ccede2b264	[Core] cleanup zmq ipc sockets on exit (#11115) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-11 19:12:24 -08:00
Yuan Tang	24a36d6d5f	Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-12 02:39:21 +00:00
Simon Mo	8fb26dac61	[Docs] Add media kit (#11121)	2024-12-11 17:33:11 -08:00
Clayton	7439a8b5fc	[Bugfix] Multiple fixes to tool streaming with hermes and mistral (#10979) Signed-off-by: cedonley <clayton@donley.io>	2024-12-12 01:10:12 +00:00
Alexander Matveev	4e11683368	[V1] VLM preprocessor hashing (#11020) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Alexander Matveev <alexm@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-12 00:55:30 +00:00


3813 Commits