xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-25 23:07:25 +08:00

Author	SHA1	Message	Date
John Giorgi	82c49d3260	[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (#6909 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 22:15:58 -08:00
Michael Goin	74fa1d123c	[Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-31 03:43:54 +00:00
Matthias Vogler	a2a40bcd0d	[Model][LoRA]LoRA support added for MolmoForCausalLM (#11439 ) Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 17:33:06 -08:00
Kevin H. Luu	ccb1aabcca	[benchmark] Remove dependency for H100 benchmark step (#11572 )	2024-12-30 12:27:07 -08:00
whyiug	36e7670045	[Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (#11631 )	2024-12-30 18:51:04 +00:00
Robert Shaw	5886aa496e	[V1] [6/N] API Server: Better Shutdown (#11586 )	2024-12-30 15:51:02 +00:00
Cyrus Leung	8d9b6721e7	[VLM] Abstract out multi-modal data parsing in merged processor (#11620 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-30 15:01:35 +00:00
youkaichao	b12e87f942	[platforms] enable platform plugins (#11602 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 20:24:45 +08:00
Li, Jiang	5dbf854553	[CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (#11618 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-12-30 10:17:04 +00:00
Tyler Michael Smith	970d6d0776	[Build][Kernel] Update CUTLASS to v3.6.0 (#11607 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-30 17:22:13 +08:00
Liangfu Chen	628ec6c17b	[Docker] bump up neuron sdk v2.21 (#11593 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2024-12-30 13:46:14 +08:00
youkaichao	3682e33f9f	[v1] fix compilation cache (#11598 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-30 04:24:12 +00:00
Michael Goin	0aa38d16f5	Remove print statement in DeepseekScalingRotaryEmbedding (#11604 )	2024-12-29 20:16:46 +00:00
Kuntai Du	faef77c0d6	[Misc] KV cache transfer connector registry (#11481 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2024-12-29 16:08:09 +00:00
youkaichao	dba4d9dec6	[v1][bugfix] fix cudagraph with inplace buffer assignment (#11596 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-29 09:03:49 +00:00
Cyrus Leung	32b4c63f02	[Doc] Convert list tables to MyST (#11594 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-29 15:56:22 +08:00
Robert Shaw	4fb8e329fd	[V1] [5/N] API Server: unify `Detokenizer` and `EngineCore` input (#11545 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-12-28 20:51:57 +00:00
youkaichao	328841d002	[bugfix] interleaving sliding window for cohere2 model (#11583 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-28 16:55:42 +00:00
Cyrus Leung	d427e5cfda	[Doc] Minor documentation fixes (#11580 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-28 21:53:59 +08:00
Woosuk Kwon	42bb201fd6	[V1][Minor] Set pin_memory=False for token_ids_cpu tensor (#11581 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-28 13:33:12 +00:00
hj-wei	59d6bb4c86	[Hardware][AMD]: Replace HIPCC version with more precise ROCm version (#11515 ) Signed-off-by: hjwei <hjwei_xd@163.com>	2024-12-28 11:17:35 +00:00
Roger Wang	b7dcc003dc	[Model] Remove hardcoded image tokens ids from Pixtral (#11582 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-28 10:54:23 +00:00
Isotr0py	d34be24bb1	[Model] Support InternLM2 Reward models (#11571 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-28 06:14:10 +00:00
Rajveer Bachkaniwala	b5cbe8eeb3	[Bugfix] Last token measurement fix (#11376 ) Signed-off-by: rajveerb <46040700+rajveerb@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-12-28 11:34:46 +08:00
Robert Shaw	df04dffade	[V1] [4/N] API Server: ZMQ/MP Utilities (#11541 )	2024-12-28 01:45:08 +00:00
Chen Zhang	a60731247f	[Doc] Update mllama example based on official doc (#11567 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2024-12-28 00:31:10 +00:00
Selali	ac79799403	[Bugfix] Fix for ROCM compressed tensor support (#11561 )	2024-12-27 20:12:11 +00:00
Isotr0py	dde1fa18c9	[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-27 19:45:13 +00:00
Jee Jee Li	0240402c46	[Misc]Add BNB quantization for MolmoForCausalLM (#11551 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-27 18:48:24 +00:00
ErezSC42	55509c2114	[MODEL] LoRA support for Jamba model (#11209 ) Signed-off-by: Erez Schwartz <erezs@ai21.com>	2024-12-27 17:58:21 +00:00
Cyrus Leung	101418096f	[VLM] Support caching in merged multi-modal processor (#11396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 17:22:48 +00:00
Chen1022	5ce4627a7e	[Doc] Add xgrammar in doc (#11549 ) Signed-off-by: ccjincong <chenjincong11@gmail.com>	2024-12-27 13:05:10 +00:00
Cyrus Leung	7af553ea30	[Misc] Abstract the logic for reading and writing media content (#11527 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-27 19:21:23 +08:00
Jee Jee Li	2c9b8ea2b0	[Bugfix] Fix TeleChat2ForCausalLM weights mapper (#11546 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-27 10:39:15 +00:00
AlexHe99	d003f3ea39	Update deploying_with_k8s.md with AMD ROCm GPU example (#11465 ) Signed-off-by: Alex He <alehe@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-27 10:00:04 +00:00
Mengqing Cao	6c6f7fe8a8	[Platform] Move model arch check to platform (#11503 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2024-12-27 08:45:25 +00:00
Robert Shaw	2339d59f92	[BugFix] Fix quantization for all other methods (#11547 ) (tag: v0.6.6.post1)	2024-12-26 22:23:29 -08:00
Robert Shaw	1b875a0ef3	[V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly (#11534 )	2024-12-26 21:19:21 -08:00
youkaichao	eb881ed006	[misc] fix typing (#11540 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-27 11:05:08 +08:00
Robert Shaw	46d4359450	[CI] Fix broken CI (#11543 )	2024-12-26 18:49:16 -08:00
Woosuk Kwon	81b979f2a8	[V1] Fix yapf (#11538 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-27 09:47:10 +09:00
Woosuk Kwon	371d04d39b	[V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-27 09:32:38 +09:00
Robert Shaw	0c0c2015c5	Update openai_compatible_server.md (#11536 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-12-26 16:26:18 -08:00
Simon Mo	82d24f7aac	[Docs] Document Deepseek V3 support (#11535 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-12-26 16:21:56 -08:00
Simon Mo	f49777ba62	Deepseek v3 (#11502 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com> (tag: v0.6.6)	2024-12-26 16:09:44 -08:00
Robert Shaw	55fb97f7bd	[2/N] API Server: Avoid ulimit footgun (#11530 )	2024-12-26 23:43:05 +00:00
Michael Goin	2072924d14	[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-26 15:33:30 -08:00
Robert Shaw	720b10fdc6	[1/N] API Server (Remove Proxy) (#11529 )	2024-12-26 23:03:43 +00:00
Isotr0py	b85a977822	[Doc] Add video example to openai client for multimodal (#11521 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-26 17:31:29 +00:00
Cyrus Leung	eec906d811	[Misc] Add placeholder module (#11501 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-26 13:12:51 +00:00


3965 Commits