liangel-02
1d642872a2
[torchao] fix safetensors for sharding ( #28169 )
...
Signed-off-by: Angel Li <liangel@meta.com>
2025-11-19 16:39:45 -08:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 ( #28826 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Alex
f6aa122698
[CI Sprint] Quantization CI Cleanup ( #24130 )
...
Signed-off-by: Alex Yun <alexyun04@gmail.com>
2025-11-18 09:21:48 -05:00
Hank_
4d5943bda6
[quantization][config] enable override existing quant_config ( #28510 )
...
Signed-off-by: Hank <hcc.mayday@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-14 01:24:10 +00:00
Andreas Karatzas
9f0247cfa4
VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>
2025-11-11 18:34:36 -08:00
xuebwang-amd
05576df85c
[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model ( #24239 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-11 12:05:22 -05:00
Vadim Gimpelson
f948ab6945
[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests ( #28170 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-06 01:22:13 +00:00
Jerry Zhang
03c4c4aa9d
Support using Int4PreshuffledTensor after loading ( #26066 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-11-04 06:00:57 -05:00
Varun Sundar Rabindranath
4022a9d279
[BugFix][Performance] Restore flashinfer autotuning for all scenarios ( #27904 )
2025-11-04 15:56:21 +08:00
Huamin Li
1994de99ea
[CI Failure] Fix test_kv_cache_model_load_and_run ( #27717 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-30 12:27:53 +00:00
Xiangyu Li
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm ( #26092 )
2025-10-23 23:26:13 -04:00
wangxiyuan
f6027b2855
[1/N][Platform] Cleanup useless function ( #26982 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-22 09:04:57 +00:00
Varun Sundar Rabindranath
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 ( #26729 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-21 01:51:14 -04:00
Michael Goin
01c977e96d
[CI] Prune Quantization Tests and skip compilation ( #27038 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-16 17:26:35 -04:00
Michael Goin
f8a0acbdbe
[CI] Enable Blackwell Llama4 MoE tests ( #26731 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-15 21:02:57 -06:00
Michael Goin
7e0ef4084a
[CI Failure] Fix torchao dep failure for Quantization Test ( #26824 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-14 16:41:43 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Roberto L. Castro
96ad65b7fe
[Transform] [Quantization] Add QuTLASS support to vLLM ( #24440 )
...
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-10 09:43:40 -07:00
Jerry Zhang
a83ff278d6
[torchao] Add support for ModuleFqnToConfig using regex ( #26001 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-10-09 08:32:32 +00:00
elvischenv
5e49c3e777
Bump Flashinfer to v0.4.0 ( #26326 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-10-08 23:58:44 -07:00
liangel-02
b32260ab85
[torchao] safetensors integration ( #25969 )
...
Signed-off-by: Angel Li <liangel@meta.com>
2025-10-07 20:12:35 -06:00
fxmarty-amd
a38c1bfe09
[ci] Rename test_mxfp4_moe.py to test_ocp_mx_moe.py ( #26364 )
...
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
2025-10-07 09:52:24 -07:00
fxmarty-amd
41f1cf38f2
[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 ( #21166 )
2025-10-07 09:35:26 -04:00
Michael Goin
20db99cc69
[CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe ( #26188 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-06 13:50:11 -04:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
ElizaWszola
502640c3f9
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class ( #25696 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-10-02 19:35:13 +00:00
Michael Goin
3b279a84be
[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests ( #26040 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-02 09:07:19 -07:00
Jerry Zhang
c31246800c
Support RL online quantization with torchao ( #23014 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-10-01 16:39:29 -07:00
Michael Goin
f708bd4904
[CI] Add E2E Blackwell Quantized MoE Test ( #25723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-26 12:23:00 -07:00
Tyler Michael Smith
1260180c67
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… ( #25607 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-25 08:05:21 +00:00
ElizaWszola
63400259d0
[Performance] Move apply_w8a8_block_fp8_linear to an op class ( #24666 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-23 12:03:10 -07:00
Woosuk Kwon
72dd1595b4
[CI] Skip tests failing on main ( #25326 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 19:57:46 -07:00
Cyrus Leung
3d9a1d2de5
[V1] Support LLM.apply_model ( #18465 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 07:14:35 +00:00
haoyangli-amd
ca2d1925ef
[Rocm] [quantization] Fix quark ptpc moe and add test case ( #24649 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Co-authored-by: Haoyang Li <haoyang.li@amd.com>
2025-09-16 22:15:13 -07:00
Jerry Zhang
2048c4e379
[torchao] Support quantization configs using module swap ( #21982 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-09-10 23:53:24 -07:00
co63oc
1bd007f234
fix some typos ( #24071 )
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
Kyle Sayers
22feac8e95
[Transform] [Quantization] Add transforms to compressed tensors ( #22486 )
2025-08-28 02:43:48 -04:00
czhu-cohere
2c2b140ae8
[quantization] use channel scales for w4a8 + misc fixes ( #23570 )
...
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-08-26 18:23:23 -07:00
Cyrus Leung
8896eb72eb
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed ( #18800 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-22 10:56:57 +08:00
Michael Goin
0cdbf5e61c
[Kernel/Quant] Remove the original marlin format and qqq ( #23204 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 15:13:36 -04:00
Harry Mellor
796bae07c5
Update transformers to v4.55 ( #21931 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 22:56:14 -07:00
Wentao Ye
4771df7b2b
[Feature] Non-contiguous Support for FP8 Quantization ( #21961 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-05 02:36:43 -07:00
Shinichi Hemmi
c7ffe93d9c
[Model] Support TP/PP/mamba2 kernel for PLaMo2 ( #19674 )
...
Signed-off-by: Shinichi Hemmi <shemmi@preferred.jp>
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
Co-authored-by: Sixue Wang <cecilwang@preferred.jp>
2025-07-28 05:00:47 +00:00
Cyrus Leung
86ae693f20
[Deprecation][2/N] Replace --task with --runner and --convert ( #21470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
Wentao Ye
bda9d0535f
[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor ( #21631 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-27 05:25:21 -07:00
Alex Kogan
7ae75fa6d0
[Feature] Add support for MoE models in the calibration-free RTN-based quantization ( #20766 )
...
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
2025-07-25 18:09:34 -07:00
Zhiyu
6b46c4b653
Add Nvidia ModelOpt config adaptation ( #19815 )
...
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-07-21 10:02:58 -04:00
Ning Xie
d97841078b
[Misc] unify variable for LLM instance ( #20996 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-21 12:18:33 +01:00
Isotr0py
77f77a951e
[Misc] Clean up mark to fork process in BNB tests ( #20692 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-10 13:59:40 +00:00
fxmarty-amd
332d4cb17b
[Feature][Quantization] MXFP4 support for MOE models ( #17888 )
...
Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
2025-07-09 13:19:02 -07:00