xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-29 20:57:18 +08:00

Author	SHA1	Message	Date
Nick Hill	30172b4947	[V1] Optimize handling of sampling metadata and req_ids list (#13244 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-18 12:15:33 -08:00
Murali Andoorveedu	a4d577b379	[V1][Tests] Adding additional testing for multimodal models to V1 (#13308 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>	2025-02-18 09:53:14 -08:00
Liangfu Chen	3809458456	[Bugfix] Fix invalid rotary embedding unit test (#13431 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2025-02-18 11:52:03 +00:00
Michael Goin	b53d79983c	Add outlines fallback when JSON schema has enum (#13449 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-18 06:49:41 +00:00
Isotr0py	67ef8f666a	[Model] Enable quantization support for `transformers` backend (#12960 )	2025-02-17 19:52:47 -08:00
Woosuk Kwon	cd4a72a28d	[V1][Spec decode] Move drafter to model runner (#13363 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-17 15:40:12 -08:00
Woosuk Kwon	4c21ce9eba	[V1] Get input tokens from scheduler (#13339 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-17 11:01:07 -08:00
Tyler Michael Smith	1f69c4a892	[Model] Support Mamba2 (Codestral Mamba) (#9292 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>	2025-02-17 20:17:50 +08:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
Cyrus Leung	5d2965b7d7	[Bugfix] Fix 2 Node and Spec Decode tests (#13341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-16 22:20:22 +08:00
youkaichao	124776ebd5	[ci] skip failed tests for flashinfer (#13352 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-16 22:09:15 +08:00
wchen61	dc0f7ccf8b	[BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187 ) Signed-off-by: wchen61 <wchen61@foxmail.com>	2025-02-16 08:59:49 +00:00
Lily Liu	80f63a3966	[V1][Spec Decode] Ngram Spec Decode (#12193 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-15 18:05:11 -08:00
Cody Yu	9206b3d7ec	[V1][PP] Run engine busy loop with batch queue (#13064 )	2025-02-15 03:59:01 -08:00
Mark McLoughlin	2ad1bc7afe	[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288 )	2025-02-15 03:56:19 -08:00
Woosuk Kwon	e7eea5a520	[V1][CI] Fix failed v1-test because of min_p (#13316 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-14 17:29:51 -08:00
Aoyu	a12934d3ec	[V1][Core] min_p sampling support (#13191 ) Signed-off-by: Aoyu <aoyuzhan@amazon.com> Co-authored-by: Aoyu <aoyuzhan@amazon.com>	2025-02-14 15:50:05 -08:00
Joe Runde	3bcb8c75da	[Core] Reduce TTFT with concurrent partial prefills (#10235 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-14 15:36:07 -08:00
Michael Goin	5e5c8e091e	[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-14 12:53:42 -08:00
Lu Fang	6224a9f620	Support logit_bias in v1 Sampler (#13079 )	2025-02-14 04:34:59 -08:00
Alexander Matveev	45f90bcbba	[WIP] TPU V1 Support Refactored (#13049 )	2025-02-14 00:21:53 -08:00
Kero Liang	b0ccfc565a	[Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126 )	2025-02-13 22:39:20 -08:00
Varun Sundar Rabindranath	cbc40128eb	[V1] LoRA - Enable Serving Usecase (#12883 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-14 14:21:12 +08:00
Harry Mellor	f2b20fe491	Consolidate Llama model usage in tests (#13094 )	2025-02-13 22:18:03 -08:00
Tyler Michael Smith	09545c0a94	[Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on (#13250 )	2025-02-13 20:19:25 -08:00
Tyler Michael Smith	c1e37bf71b	[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-14 00:01:14 +00:00
Nicolò Lucchesi	d84cef76eb	[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909 )	2025-02-13 07:23:45 -08:00
Vaibhav Jain	37dfa60037	[Bugfix] Missing Content Type returns 500 Internal Server Error (#13193 )	2025-02-13 06:52:22 -08:00
Cyrus Leung	1bc3b5e71b	[VLM] Separate text-only and vision variants of the same model architecture (#13157 )	2025-02-13 06:19:15 -08:00
Cyrus Leung	c9d3ecf016	[VLM] Merged multi-modal processor for Molmo (#12966 )	2025-02-13 04:34:00 -08:00
Rui Qiao	9605c1256e	[V1][core] Implement pipeline parallel on Ray (#12996 )	2025-02-13 08:02:46 +00:00
LikeSundayLikeRain	04f50ad9d1	[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case (#13097 )	2025-02-12 23:11:26 -08:00
Isotr0py	bc55d13070	[VLM] Implement merged multimodal processor for Mllama (#11427 )	2025-02-12 20:26:21 -08:00
Kaixi Hou	4fc5c23bb6	[NVIDIA] Support nvfp4 quantization (#12784 )	2025-02-12 19:51:51 -08:00
Michael Goin	14b7899d10	[CI] Fix failing FP8 cpu offload test (#13170 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-12 19:16:06 +00:00
Qubitium-ModelCloud	36a08630e8	[CORE] [QUANT] Support for GPTQModel's `dynamic` quantization per module override/control (#7086 )	2025-02-12 09:19:43 -08:00
Jee Jee Li	82cabf53a3	[Misc] Delete unused LoRA modules (#13151 )	2025-02-12 08:58:24 -08:00
Rafael Vasquez	314cfade02	[Frontend] Generate valid tool call IDs when using `tokenizer-mode=mistral` (#12332 )	2025-02-12 08:29:56 -08:00
Lingfan Yu	e92694b6fe	[Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921 ) Signed-off-by: Lingfan Yu <lingfany@amazon.com>	2025-02-11 21:12:37 -08:00
Christian Pinto	974dfd4971	[Model] IBM/NASA Prithvi Geospatial model (#12830 )	2025-02-11 20:34:30 -08:00
Keyun Tong	3ee696a63d	[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518 ) Signed-off-by: Keyun Tong <tongkeyun@gmail.com>	2025-02-12 12:25:58 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	6c4dbe23eb	[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-02-12 00:21:50 +08:00
Mark McLoughlin	75e6e14516	[V1][Metrics] Add several request timing histograms (#12644 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-02-11 10:14:00 -05:00
மனோஜ்குமார் பழனிச்சாமி	110f59a33e	[Bugfix] fix flaky test (#13089 ) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-02-11 14:41:20 +00:00
Cody Yu	41c5dd45b9	[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592 )	2025-02-11 08:27:25 +00:00
Ce Gao	fc6485d277	[Bugfix]: Reasoning output bug according to the chat template change (#13025 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-02-11 15:49:03 +08:00
Varun Sundar Rabindranath	78a141d768	[Misc] LoRA - Refactor Punica ops tests (#12970 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-11 07:26:03 +00:00
Florian Greinacher	cb080f32e3	[Bugfix] Support missing tool parameters in mistral tokenizer (#12884 ) Signed-off-by: Florian Greinacher <florian.greinacher@siemens.com>	2025-02-11 03:33:33 +00:00
Farzad Abdolhosseini	08b2d845d6	[Model] Ultravox Model: Support v0.5 Release (#12912 ) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>	2025-02-10 22:02:48 +00:00
மனோஜ்குமார் பழனிச்சாமி	2ae889052c	Fix seed parameter behavior in vLLM (#13007 ) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-02-10 23:26:50 +08:00

1788 Commits