31 Commits

Author SHA1 Message Date
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 13:58:59 -08:00
Strahinja Stamenkovic
814843e021
Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307)
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>
2025-11-19 03:12:31 +00:00
Luciano Martins
c2612371ad
[Model] Add Gemma3 GGUF multimodal support (#27772)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:56:29 -08:00
xuebwang-amd
5a1271d83a
[Quantization] fix attention quantization of gpt_oss model (#27334)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
2025-11-11 12:06:00 -05:00
Zhewen Li
83fd49b1fc
[CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-29 06:27:30 +00:00
dongbo910220
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils (#27207)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 10:44:21 -04:00
wangxiyuan
f6027b2855
[1/N][Platform] Cleanup useless function (#26982)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-22 09:04:57 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Woosuk Kwon
72dd1595b4
[CI] Skip tests failing on main (#25326)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 19:57:46 -07:00
Woosuk Kwon
52c2a8d4ad
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:56:30 -07:00
Cyrus Leung
bef180f009
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 17:50:58 +00:00
Cyrus Leung
3d9a1d2de5
[V1] Support LLM.apply_model (#18465)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 07:14:35 +00:00
Woosuk Kwon
14006840ea
[V0 Deprecation] Remove V0 FlashInfer attention backend (#22776)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 19:54:16 -07:00
Michael Goin
4fc722eca4
[Kernel/Quant] Remove AQLM (#22943)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-16 19:38:21 +00:00
Roger Wang
067c34a155
docs: remove deprecated disable-log-requests flag (#22113)
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-08-02 00:19:48 -07:00
Cyrus Leung
86ae693f20
[Deprecation][2/N] Replace --task with --runner and --convert (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
Ning Xie
d97841078b
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-21 12:18:33 +01:00
Jee Jee Li
8020e98c9f
[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-11 08:01:13 +00:00
Isotr0py
77f77a951e
[Misc] Clean up mark to fork process in BNB tests (#20692)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-10 13:59:40 +00:00
Cyrus Leung
9fb52e523a
[V1] Support any head size for FlexAttention backend (#20467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-06 09:54:36 -07:00
Michael Goin
1e473b3010
[CI] Disable failing GGUF model test (#19454)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-11 05:12:38 +00:00
Isotr0py
d2f0e7e615
[CI/Build] Improve Llama GGUF test robustness (#19287)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-07 17:23:28 +08:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Isotr0py
1f1b1bc03b
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-27 04:40:28 +00:00
Michael Goin
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-20 09:08:37 -07:00
hissu-hyvarinen
f6518b2b48
[ROCm] Skip tests for quantizations incompatible with ROCm (#17905)
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
2025-05-12 18:39:28 -06:00
Bowen Bao
db593aa67f
[Quantization] Quark MXFP4 format loading (#16943) 2025-05-07 15:05:05 -04:00
Cyrus Leung
d7543862bd
[Misc] Rename assets for testing (#17575)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-02 03:29:25 -07:00
Cyrus Leung
48e925fab5
[Misc] Clean up test docstrings and names (#17521)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-01 05:19:32 -07:00
Cyrus Leung
afb4429b4f
[CI/Build] Reorganize models tests (#17459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-30 23:03:08 -07:00