xinyun/vllm - vllm - Silk Road New Cloud (丝路新云) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-05-26 11:37:06 +08:00

Author	SHA1	Message	Date
Lucas Wilkinson	f6bb18fd9a	[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-05 17:10:13 -08:00
Lu Fang	53ea6ad830	[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-05 21:41:18 +00:00
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997 ) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Robert Shaw	257e200a25	[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-03-05 14:18:55 +00:00
Benjamin Chislett	32985bed7c	[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-03-05 06:30:40 +00:00
Michael Goin	dae9ec464c	Temporarily disable test_awq_gemm_opcheck (#14251 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-05 06:10:35 +00:00
Tyler Michael Smith	72c62eae5f	[V1] EP/TP MoE + DP Attention (#13931 )	2025-03-04 21:27:26 -08:00
Congcong Chen	0a995d5434	[Model] New model support for Phi-4-multimodal-instruct (#14119 )	2025-03-04 20:57:01 -08:00
Nick Hill	5db6b2c961	[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-04 15:06:47 +00:00
Travis Johnson	c060b71408	[Model] Add support for GraniteMoeShared models (#13313 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-04 08:04:52 +08:00
Mark McLoughlin	ae122b1cbd	[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 19:04:45 +00:00
TJian	848a6438ae	[ROCm] Faster Custom Paged Attention kernels (#12348 )	2025-03-03 09:24:45 -08:00
Cody Yu	f35f8e2242	[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 (#13921 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-03 16:43:14 +08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Ce Gao	bf33700ecd	[v0][structured output] Support reasoning output (#12955 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-02 14:49:42 -05:00
Jee Jee Li	cc5e8f6db8	[Model] Add LoRA support for TransformersModel (#13770 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-02 09:17:34 +08:00
YajieWang	6a92ff93e1	[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931 )	2025-02-28 22:30:59 -08:00
Luka Govedič	bd56c983d6	[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-02-28 16:20:11 -07:00
Chen Zhang	28943d36ce	[v1] Move block pool operations to a separate class (#13973 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-28 20:53:31 +00:00
Chen Zhang	e7bd944e08	[v1] Cleanup the BlockTable in InputBatch (#13977 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-02-28 19:03:16 +00:00
Harry Mellor	4be4b26cb7	Fix entrypoint tests for embedding models (#14052 )	2025-02-28 08:56:44 -08:00
Cyrus Leung	f7bee5c815	[VLM][Bugfix] Enable specifying prompt target via index (#14038 )	2025-02-28 07:35:55 -08:00
Harry Mellor	76c89fcadd	Use smaller embedding model when not testing model specifically (#13891 )	2025-02-28 00:50:43 -08:00
Travis Johnson	73e0225ee9	[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-02-28 04:00:45 +00:00
Sage Moore	38acae6e97	[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-02-27 20:31:47 +00:00
Cyrus Leung	f1579b229d	[VLM] Generalized prompt updates for multi-modal processor (#13964 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-27 17:44:25 +00:00
Isotr0py	edf309ebbe	[VLM] Support multimodal inputs for Florence-2 models (#13320 )	2025-02-27 02:06:41 -08:00
Michael Goin	788f284b53	Fix test_block_fp8.py test for MoE (#13915 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-27 18:00:00 +08:00
Mark McLoughlin	cd711c48b2	[V1][Metrics] Handle preemptions (#13169 )	2025-02-26 20:04:59 -08:00
Rui Qiao	c9944acbf9	[misc] Rename Ray ADAG to Compiled Graph (#13928 )	2025-02-26 20:03:28 -08:00
Lucas Wilkinson	f95903909f	[Kernel] FlashMLA integration (#13747 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-02-27 10:35:08 +08:00
Wallas Henrique	4cb6fa0a9c	[Bugfix] Backend option to disable xgrammar any_whitespace (#12744 ) Signed-off-by: Wallas Santos <wallashss@ibm.com> Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 10:52:34 -08:00
Cyrus Leung	934bb99c71	[Bugfix] Update expected token counts for Ultravox tests (#13895 )	2025-02-26 04:56:50 -08:00
Joe Runde	3f808cc044	[Bugfix] Do not crash V0 engine on input errors (#13101 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 19:07:29 +08:00
Florian Greinacher	215bf150a6	[Bugfix] Handle None parameters in Mistral function calls. (#13786 )	2025-02-26 03:06:21 -08:00
Cyrus Leung	7b700ec8c8	[Bugfix] Add test example for Ultravox v0.5 (#13890 )	2025-02-26 02:31:43 -08:00
Roger Wang	7ca1da020f	[Misc] Fix input processing for Ultravox (#13871 )	2025-02-25 23:56:34 -08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Harry Mellor	145944cb94	Improve pipeline partitioning (#13839 )	2025-02-25 18:53:56 -08:00
Lily Liu	5629f26df7	[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729 )	2025-02-25 18:14:48 -08:00
Michael Goin	07c4353057	[Model] Support Grok1 (#13795 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-26 01:07:12 +00:00
Harry Mellor	34e3494e70	Fix failing `MyGemma2Embedding` test (#13820 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-25 12:33:03 -08:00
Liangfu Chen	f75aa72732	[Neuron] Add custom_ops for neuron backend (#13246 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: George Novack <gnovack@amazon.com> Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>	2025-02-25 11:47:49 -08:00
Jee Jee Li	37b6cb4985	[CI/Build] Fix V1 LoRA failure (#13767 )	2025-02-25 02:01:15 -08:00
Gregory Shtrasberg	aabeb2688f	[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593 )	2025-02-25 00:39:59 -08:00
Varun Sundar Rabindranath	03f48b3db6	[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705 )	2025-02-25 00:18:02 -08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00
afeldman-nm	befc402d34	[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980 ) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-24 08:29:41 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583 )	2025-02-24 07:33:20 -08:00
Kevin H. Luu	f90a375593	[ci] Add logic to change model to S3 path only when S3 CI env var is on (#13727 ) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>	2025-02-24 06:32:11 +00:00

1467 Commits