Lucas Wilkinson
f6bb18fd9a
[BugFix] MLA + V1, illegal memory access and accuracy issues ( #14253 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-05 17:10:13 -08:00
Lu Fang
53ea6ad830
[V1][Easy] Add empty allowed_token_ids in the v1 sampler test ( #14308 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-05 21:41:18 +00:00
Vincent
a4f1ee35d6
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 ( #13997 )
...
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-05 20:22:43 +00:00
Robert Shaw
257e200a25
[V1][Frontend] Add Testing For V1 Runtime Parameters ( #14159 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-03-05 14:18:55 +00:00
Benjamin Chislett
32985bed7c
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param ( #14066 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-05 06:30:40 +00:00
Michael Goin
dae9ec464c
Temporarily disable test_awq_gemm_opcheck ( #14251 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-05 06:10:35 +00:00
Tyler Michael Smith
72c62eae5f
[V1] EP/TP MoE + DP Attention ( #13931 )
2025-03-04 21:27:26 -08:00
Congcong Chen
0a995d5434
[Model] New model support for Phi-4-multimodal-instruct ( #14119 )
2025-03-04 20:57:01 -08:00
Nick Hill
5db6b2c961
[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs ( #13869 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-04 15:06:47 +00:00
Travis Johnson
c060b71408
[Model] Add support for GraniteMoeShared models ( #13313 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-04 08:04:52 +08:00
Mark McLoughlin
ae122b1cbd
[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics ( #14055 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 19:04:45 +00:00
TJian
848a6438ae
[ROCm] Faster Custom Paged Attention kernels ( #12348 )
2025-03-03 09:24:45 -08:00
Cody Yu
f35f8e2242
[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 ( #13921 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-03 16:43:14 +08:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
Ce Gao
bf33700ecd
[v0][structured output] Support reasoning output ( #12955 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-02 14:49:42 -05:00
Jee Jee Li
cc5e8f6db8
[Model] Add LoRA support for TransformersModel ( #13770 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-02 09:17:34 +08:00
YajieWang
6a92ff93e1
[Misc][Kernel]: Add GPTQAllSpark Quantization ( #12931 )
2025-02-28 22:30:59 -08:00
Luka Govedič
bd56c983d6
[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass ( #10902 )
...
Signed-off-by: luka <luka@neuralmagic.com>
2025-02-28 16:20:11 -07:00
Chen Zhang
28943d36ce
[v1] Move block pool operations to a separate class ( #13973 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-28 20:53:31 +00:00
Chen Zhang
e7bd944e08
[v1] Cleanup the BlockTable in InputBatch ( #13977 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-02-28 19:03:16 +00:00
Harry Mellor
4be4b26cb7
Fix entrypoint tests for embedding models ( #14052 )
2025-02-28 08:56:44 -08:00
Cyrus Leung
f7bee5c815
[VLM][Bugfix] Enable specifying prompt target via index ( #14038 )
2025-02-28 07:35:55 -08:00
Harry Mellor
76c89fcadd
Use smaller embedding model when not testing model specifically ( #13891 )
2025-02-28 00:50:43 -08:00
Travis Johnson
73e0225ee9
[Bugfix] Check that number of images matches number of <|image|> tokens with mllama ( #13911 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-02-28 04:00:45 +00:00
Sage Moore
38acae6e97
[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups ( #13970 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-02-27 20:31:47 +00:00
Cyrus Leung
f1579b229d
[VLM] Generalized prompt updates for multi-modal processor ( #13964 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-27 17:44:25 +00:00
Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models ( #13320 )
2025-02-27 02:06:41 -08:00
Michael Goin
788f284b53
Fix test_block_fp8.py test for MoE ( #13915 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-27 18:00:00 +08:00
Mark McLoughlin
cd711c48b2
[V1][Metrics] Handle preemptions ( #13169 )
2025-02-26 20:04:59 -08:00
Rui Qiao
c9944acbf9
[misc] Rename Ray ADAG to Compiled Graph ( #13928 )
2025-02-26 20:03:28 -08:00
Lucas Wilkinson
f95903909f
[Kernel] FlashMLA integration ( #13747 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00
Wallas Henrique
4cb6fa0a9c
[Bugfix] Backend option to disable xgrammar any_whitespace ( #12744 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 10:52:34 -08:00
Cyrus Leung
934bb99c71
[Bugfix] Update expected token counts for Ultravox tests ( #13895 )
2025-02-26 04:56:50 -08:00
Joe Runde
3f808cc044
[Bugfix] Do not crash V0 engine on input errors ( #13101 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 19:07:29 +08:00
Florian Greinacher
215bf150a6
[Bugfix] Handle None parameters in Mistral function calls. ( #13786 )
2025-02-26 03:06:21 -08:00
Cyrus Leung
7b700ec8c8
[Bugfix] Add test example for Ultravox v0.5 ( #13890 )
2025-02-26 02:31:43 -08:00
Roger Wang
7ca1da020f
[Misc] Fix input processing for Ultravox ( #13871 )
2025-02-25 23:56:34 -08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00
Harry Mellor
145944cb94
Improve pipeline partitioning ( #13839 )
2025-02-25 18:53:56 -08:00
Lily Liu
5629f26df7
[V1][Spec Decode] Change Spec Decode Rejection Sampling API ( #13729 )
2025-02-25 18:14:48 -08:00
Michael Goin
07c4353057
[Model] Support Grok1 ( #13795 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-26 01:07:12 +00:00
Harry Mellor
34e3494e70
Fix failing MyGemma2Embedding test ( #13820 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-25 12:33:03 -08:00
Liangfu Chen
f75aa72732
[Neuron] Add custom_ops for neuron backend ( #13246 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: George Novack <gnovack@amazon.com>
Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>
2025-02-25 11:47:49 -08:00
Jee Jee Li
37b6cb4985
[CI/Build] Fix V1 LoRA failure ( #13767 )
2025-02-25 02:01:15 -08:00
Gregory Shtrasberg
aabeb2688f
[ROCm][Quantization][Kernel] Using HIP FP8 header ( #12593 )
2025-02-25 00:39:59 -08:00
Varun Sundar Rabindranath
03f48b3db6
[Core] LoRA V1 - Add add/pin/list/remove_lora functions ( #13705 )
2025-02-25 00:18:02 -08:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions ( #13555 )
2025-02-24 17:13:52 -08:00
afeldman-nm
befc402d34
[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) ( #10980 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-24 08:29:41 -08:00
Jongseok Park
781096e385
Expert Parallelism (EP) Support for DeepSeek V2 ( #12583 )
2025-02-24 07:33:20 -08:00
Kevin H. Luu
f90a375593
[ci] Add logic to change model to S3 path only when S3 CI env var is on ( #13727 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>
2025-02-24 06:32:11 +00:00