Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Thomas Parnell
778f554157
[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching ( #26222 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-06 10:40:30 +08:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Stan Wozniak
ea507c3a93
[V1] [Hybrid] Mamba2 Automatic Prefix Caching ( #25752 )
...
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-04 06:34:22 +02:00
Thomas Parnell
fea3e476aa
[Kernel] Chunk-aligned mamba2 ( #24683 )
2025-09-29 23:18:25 +02:00
Chih-Chieh Yang
2b6b1d7809
[Model] Mamba2 varlen refactor ( #21467 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
2025-09-26 11:31:14 +00:00
Michael Goin
7361ab379f
Remove redundant mutates_args and dispatch_key for direct_register_custom_op ( #25512 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-23 22:48:40 +00:00
Thomas Parnell
a903669e10
[V1] Remove V0 code paths for Hybrid models ( #25400 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-23 08:26:13 -07:00
Chih-Chieh Yang
73cfb3c5ee
[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 ( #24331 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-09-16 14:53:43 +00:00
tomeras91
27fcfe7bcf
[Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 ( #24593 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 10:51:01 +00:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files ( #24726 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
tomeras91
08abfa78ec
[Bugfix] fix modelopt exclude_modules name mapping ( #24178 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-10 10:20:46 -07:00
Ayush Satyam
5c4b6e66fe
[Attention] Unify mamba and attention backend selection ( #23171 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
2025-08-25 09:09:36 +00:00
Thomas Parnell
75531a6c13
[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) ( #22928 )
...
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-08-15 12:57:06 +00:00
Asaf Joseph Gardin
3d232dbd19
[Mamba] - refactor: Renamed mamba_attn to mamba2_attn ( #22818 )
...
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-08-15 06:38:05 +00:00
Chen Zhang
fceafaf582
[Bugfix][mamba] Fix type annotation of Mamba2Metadata ( #22787 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-13 06:07:09 -07:00
Asaf Joseph Gardin
46a13949d5
[v1] - Mamba1 Attention Metadata ( #21249 )
...
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-08-06 17:03:42 -07:00
Chih-Chieh Yang
b690e34824
[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead ( #21075 )
...
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-08-02 01:59:34 -07:00
Asaf Joseph Gardin
a6c050286a
[v1][mamba] Added mamba_type into MambaSpec ( #21715 )
...
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-07-28 08:15:55 +00:00
Chih-Chieh Yang
eab2f3980c
[Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel ( #20839 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
2025-07-25 06:49:36 -07:00
Thomas Parnell
881e3cbe3b
[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers ( #21194 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-19 19:27:21 +00:00
Tuan, Hoang-Trong
f29fd8a7f8
[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 ( #20838 )
...
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
2025-07-15 16:08:26 -04:00
Thomas Parnell
3534c39a20
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli ( #20840 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 04:04:35 -07:00
nopperl
5d09152ff1
[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine ( #20660 )
...
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
2025-07-11 05:53:31 +00:00
Tuan, Hoang-Trong
47043eb678
[Kernel] Triton implementation of causal-conv1d for Mamba-based models ( #18218 )
...
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-09 12:53:55 -07:00
Chen Zhang
a89209b78d
[v1] Support mamba2 ( #19327 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-18 20:34:15 +00:00
Ning Xie
2f1c19b245
[CI] change spell checker from codespell to typos ( #18711 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-11 19:57:10 -07:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Dhia Eddine Rhaiem
20bd6f4d2e
[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) ( #18500 )
...
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
2025-05-21 19:23:59 -07:00
Dhia Eddine Rhaiem
eca18691d2
[MODEL] FalconH1 ( #18406 )
...
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
2025-05-21 04:59:06 -07:00
omahs
a9944aabfa
fix: typos ( #18151 )
...
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
2025-05-15 02:16:15 -07:00
Harry Mellor
6223dd8114
Update deprecated type hinting in model_executor/layers ( #18056 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 04:17:23 -07:00
Chih-Chieh Yang
18dd5e01f2
[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels ( #17146 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-05-06 17:59:30 -07:00
Chih-Chieh Yang
daefed052c
[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B ( #15423 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-04-10 19:07:07 +00:00
Chih-Chieh Yang
296f927f24
[Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies ( #14857 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-03-20 19:21:08 -07:00
Yu Chin Fabian Lim
06dd08256f
Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. ( #14617 )
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-03-21 00:44:37 +00:00
yury-tokpanov
452e8fd968
[MODEL] Add support for Zamba2 models ( #13185 )
...
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-18 08:56:21 -07:00
Tyler Michael Smith
ccf02fcbae
Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… ( #14848 )
2025-03-14 20:45:42 -07:00
Chih-Chieh Yang
fe66b34728
[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies ( #14778 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-03-14 16:36:18 -04:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions ( #13555 )
2025-02-24 17:13:52 -08:00
Cyrus Leung
7f6bae561c
[CI/Build] Fix pre-commit errors ( #13696 )
2025-02-22 00:31:26 -08:00
Yu Chin Fabian Lim
fca20841c2
Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size ( #13660 )
2025-02-22 00:19:10 -08:00
Yu Chin Fabian Lim
aff404571b
Add Bamba Model ( #10909 )
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-06 15:22:42 -08:00