xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-22 00:55:01 +08:00

Author	SHA1	Message	Date
Jee Jee Li	105b8ce4c0	[Misc] Reduce LoRA-related static variable (#13166 )	2025-02-22 00:21:30 -08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295 )	2025-02-22 00:20:45 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299 )	2025-02-22 00:20:00 -08:00
Lu Fang	bb78fb318e	[v1] Support allowed_token_ids in v1 Sampler (#13210 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-22 14:13:05 +08:00
Keyun Tong	0ffdf8ce0c	[HTTP Server] Make model param optional in request (#13568 )	2025-02-21 21:55:50 -08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00
Kevin H. Luu	34ad27fe83	[ci] Fix metrics test model path (#13635 )	2025-02-20 22:12:10 -08:00
Gabriel Marinho	1c3c975766	[FEATURE] Enables /score endpoint for embedding models (#12846 )	2025-02-20 22:09:47 -08:00
Lingfan Yu	33170081f1	[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth (#13245 ) Signed-off-by: Lingfan Yu <lingfany@amazon.com>	2025-02-20 17:45:45 -08:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
Harry Mellor	992e5c3d34	Merge similar examples in `offline_inference` into single `basic` example (#12737 )	2025-02-20 04:53:51 -08:00
Kevin H. Luu	a64a84433d	[2/n][ci] S3: Use full model path (#13564 ) Signed-off-by: <>	2025-02-20 01:20:15 -08:00
Kevin H. Luu	aa1e62d0db	[ci] Fix spec decode test (#13600 )	2025-02-20 16:56:00 +08:00
youkaichao	ba81163997	[core] add sleep and wake up endpoint and v1 support (#12987 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: cennn <2523403608@qq.com> Co-authored-by: cennn <2523403608@qq.com>	2025-02-20 12:41:17 +08:00
Jee Jee Li	512368e34a	[Misc] Qwen2.5 VL support LoRA (#13261 )	2025-02-19 18:37:55 -08:00
Kevin H. Luu	473f51cfd9	[3/n][CI] Load Quantization test models with S3 (#13570 ) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-20 10:12:30 +08:00
Cyrus Leung	377d10bd14	[VLM][Bugfix] Pass processor kwargs properly on init (#13516 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-19 13:13:50 +00:00
Yannick Schnider	423330263b	[Feature] Pluggable platform-specific scheduler (#13161 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>	2025-02-19 17:16:38 +08:00
Nick Hill	caf7ff4456	[V1][Core] Generic mechanism for handling engine utility (#13060 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-19 17:09:22 +08:00
Lucia Fang	f525c0be8b	[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-19 17:06:23 +08:00
Alex Brooks	983a40a8bb	[Bugfix] Fix Positive Feature Layers in Llava Models (#13514 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>	2025-02-19 08:50:07 +00:00
Kevin H. Luu	d5d214ac7f	[1/n][CI] Load models in CI from S3 instead of HF (#13205 ) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-19 07:34:59 +00:00
Nick Hill	30172b4947	[V1] Optimize handling of sampling metadata and req_ids list (#13244 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-18 12:15:33 -08:00
Murali Andoorveedu	a4d577b379	[V1][Tests] Adding additional testing for multimodal models to V1 (#13308 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>	2025-02-18 09:53:14 -08:00
Liangfu Chen	3809458456	[Bugfix] Fix invalid rotary embedding unit test (#13431 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2025-02-18 11:52:03 +00:00
Michael Goin	b53d79983c	Add outlines fallback when JSON schema has enum (#13449 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-18 06:49:41 +00:00
Isotr0py	67ef8f666a	[Model] Enable quantization support for `transformers` backend (#12960 )	2025-02-17 19:52:47 -08:00
Woosuk Kwon	cd4a72a28d	[V1][Spec decode] Move drafter to model runner (#13363 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-17 15:40:12 -08:00
Woosuk Kwon	4c21ce9eba	[V1] Get input tokens from scheduler (#13339 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-17 11:01:07 -08:00
Tyler Michael Smith	1f69c4a892	[Model] Support Mamba2 (Codestral Mamba) (#9292 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>	2025-02-17 20:17:50 +08:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
Cyrus Leung	5d2965b7d7	[Bugfix] Fix 2 Node and Spec Decode tests (#13341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-16 22:20:22 +08:00
youkaichao	124776ebd5	[ci] skip failed tests for flashinfer (#13352 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-16 22:09:15 +08:00
wchen61	dc0f7ccf8b	[BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187 ) Signed-off-by: wchen61 <wchen61@foxmail.com>	2025-02-16 08:59:49 +00:00
Lily Liu	80f63a3966	[V1][Spec Decode] Ngram Spec Decode (#12193 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-15 18:05:11 -08:00
Cody Yu	9206b3d7ec	[V1][PP] Run engine busy loop with batch queue (#13064 )	2025-02-15 03:59:01 -08:00
Mark McLoughlin	2ad1bc7afe	[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288 )	2025-02-15 03:56:19 -08:00
Woosuk Kwon	e7eea5a520	[V1][CI] Fix failed v1-test because of min_p (#13316 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-14 17:29:51 -08:00
Aoyu	a12934d3ec	[V1][Core] min_p sampling support (#13191 ) Signed-off-by: Aoyu <aoyuzhan@amazon.com> Co-authored-by: Aoyu <aoyuzhan@amazon.com>	2025-02-14 15:50:05 -08:00
Joe Runde	3bcb8c75da	[Core] Reduce TTFT with concurrent partial prefills (#10235 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-14 15:36:07 -08:00
Michael Goin	5e5c8e091e	[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-14 12:53:42 -08:00
Lu Fang	6224a9f620	Support logit_bias in v1 Sampler (#13079 )	2025-02-14 04:34:59 -08:00
Alexander Matveev	45f90bcbba	[WIP] TPU V1 Support Refactored (#13049 )	2025-02-14 00:21:53 -08:00
Kero Liang	b0ccfc565a	[Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126 )	2025-02-13 22:39:20 -08:00
Varun Sundar Rabindranath	cbc40128eb	[V1] LoRA - Enable Serving Usecase (#12883 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-14 14:21:12 +08:00
Harry Mellor	f2b20fe491	Consolidate Llama model usage in tests (#13094 )	2025-02-13 22:18:03 -08:00
Tyler Michael Smith	09545c0a94	[Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on (#13250 )	2025-02-13 20:19:25 -08:00
Tyler Michael Smith	c1e37bf71b	[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-14 00:01:14 +00:00
Nicolò Lucchesi	d84cef76eb	[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909 )	2025-02-13 07:23:45 -08:00
Vaibhav Jain	37dfa60037	[Bugfix] Missing Content Type returns 500 Internal Server Error (#13193 )	2025-02-13 06:52:22 -08:00


2360 Commits