Jee Jee Li
8e6c7e873f
[Bugfix] Fix MoE BNB version ( #22260 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-05 19:56:22 -07:00
Benji Beck
05fae02175
Migrate KimiVLImagePixelInputs to TensorSchema ( #21769 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-08-05 02:36:18 -07:00
wang.yuqi
586f286789
[Model] Pooling model activation supports per request control by PoolingParams ( #20538 )
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-05 00:37:00 -07:00
ZiTian.Zhao
4b3e4474d7
Optimize configuration access with LRU cache in custom ops ( #22204 )
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
2025-08-04 21:43:24 -07:00
Wentao Ye
d7b28f3415
[Log] DeepGEMM Update Log for Unaligned Problem Size ( #22208 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-04 19:13:19 -07:00
Yuxuan Zhang
6fa41e0c32
self.gate dtype update for GLM-4.5 ( #22203 )
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-08-04 19:12:38 -07:00
TJian
6ad6b8e115
[FEAT] Refactor ROPE into module ( #22192 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-08-04 19:12:16 -07:00
Po-Han Huang (NVIDIA)
bdcb42e45d
[NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading ( #22073 )
2025-08-04 21:02:55 -04:00
Raghav Ravishankar
a5fff3bd49
Fix Arcee model weight loading: Add custom load_weights ( #21725 )
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
2025-08-04 04:09:56 -07:00
Weixiao Huang
c1b4eb048a
[feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading ( #21164 )
Signed-off-by: huangweixiao <huangweixiao@msh.team>
2025-08-04 15:43:06 +08:00
Jee Jee Li
a7b8788d2c
[Misc] Modify the organization of GLM series ( #22171 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-03 23:51:20 -07:00
Chenxi Yang
e5949e5ae0
Remove index_put from MM embeddings merging ( #22105 )
Co-authored-by: Chenxi Yang <cxyang@meta.com>
2025-08-03 22:15:14 -07:00
Yuxuan Zhang
d3c18c9cb0
fuse fp32 for GLM-4.5 e_score_correction_bias ( #22143 )
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-08-03 09:04:54 -07:00
Li, Jiang
b5dfb94fa0
[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation ( #22145 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-03 05:34:04 -07:00
Isotr0py
3dddbf1f25
[Misc] Add tensor schema test coverage for multimodal models ( #21754 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-03 00:52:14 -07:00
jiahanc
337eb23bcc
[Fix] Fix llama4 modelopt weight loading error ( #22107 )
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-03 00:50:34 -07:00
Yan Ma
73e1b9b1d4
[xpu]support moe models on XPU platform ( #21643 )
Signed-off-by: yan <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-08-02 07:49:08 -07:00
Chih-Chieh Yang
b690e34824
[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead ( #21075 )
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-08-02 01:59:34 -07:00
Yuxuan Zhang
25373b6c6c
for glm-4.1V update ( #22000 )
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-08-02 01:46:57 -07:00
Chih-Chieh Yang
c64861d63c
[Bugfix] Mamba2 remove bugged initial state condition in chunk scan ( #22034 )
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-08-01 23:55:57 -07:00
vllmellm
d3a6f2120b
[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. ( #22069 )
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
2025-08-01 23:53:18 -07:00
Dipika Sikka
9f9c38c392
[Speculators][Speculative Decoding] Add Qwen Eagle3 Support ( #21835 )
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
2025-08-01 19:43:37 -07:00
Varun Sundar Rabindranath
a65f46be5e
[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path ( #21955 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-08-01 19:42:03 -07:00
vllmellm
ee2eb6ecd8
[Model] Qwen2.5 VL SiLU-and-Mul ( #22066 )
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
2025-08-01 19:34:37 -07:00
JartX
3654847db5
feat: Add Support GPTQ Quantization MOE on ROCM vllm serve ( #21733 )
2025-08-01 21:12:19 -04:00
Harry Mellor
38c8bce8b6
Enable headless models for pooling in the Transformers backend ( #21767 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-01 10:31:29 -07:00
Varun Sundar Rabindranath
ac45c44d98
[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch ( #21837 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-08-01 10:14:38 -07:00
Isotr0py
3f8e952179
[Bugfix] Fix glm4.1v video inference issue ( #22067 )
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-01 09:33:30 -07:00
Dipika Sikka
dfbc1f8880
[Speculative Decoding] Add speculators config support ( #21345 )
2025-08-01 08:25:18 -04:00
Harry Mellor
87c94bc879
Revert "Update sampling_metadata.py ( #21937 )" ( #22088 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-01 05:24:46 -07:00
Jee Jee Li
28b18cc741
[Quantization] Enable BNB support for InternS1 ( #21953 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-01 11:09:54 +00:00
Aviad Rossmann
53d7c39271
Update sampling_metadata.py ( #21937 )
Signed-off-by: Aviad Rossmann <aviadr@neureality.ai>
2025-07-31 23:23:18 -07:00
Kyle Sayers
0f46a780d4
[Model] [Quantization] Support quantization for Gemma3n ( #21974 )
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-31 22:45:15 -07:00
Cyrus Leung
82de9b9d46
[Misc] Automatically resolve HF processor init kwargs ( #22005 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-31 22:44:10 -07:00
Wentao Ye
c3e0e9337e
[Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 ( #21639 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-31 15:26:11 -07:00
Benjamin Chislett
2dff2e21d9
[Bugfix] Fix MTP weight loading ( #21941 )
2025-07-31 16:33:53 -04:00
zhiweiz
9e0726e5bf
[Meta] Official Eagle mm support, first enablement on llama4 ( #20788 )
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-07-31 10:35:07 -07:00
Song
9484641616
[Model] Add step3 vl ( #21998 )
Signed-off-by: oliveryuan <yuansong@step.ai>
Co-authored-by: oliveryuan <yuansong@step.ai>
2025-07-31 23:19:06 +08:00
amirkl94
207b750e19
[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend ( #21458 )
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-31 06:00:01 -07:00
wang.yuqi
2836dd73f1
[Model][CI] Let more pooling models support v1 ( #21747 )
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-31 01:51:15 -07:00
Jee Jee Li
0f7919fca0
[Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels ( #21818 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-30 20:41:12 -07:00
Sanchit Gandhi
ec02e536df
[Bugfix] Relax lang pin for voxtral ( #21833 )
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-30 20:38:52 -07:00
Cyrus Leung
004203e953
[CI/Build] Fix registry tests ( #21934 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-30 09:10:41 -07:00
Yong Hoon Shin
ad510309ee
Override attention metadata for fast prefill in some KV sharing setups ( #21590 )
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-30 08:54:15 -07:00
Isotr0py
6e599eebe8
[Bugfix] Fix OOM tests in initialization test ( #21921 )
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-30 07:35:47 -07:00
Po-Han Huang (NVIDIA)
ff08e51940
[NVIDIA] Fix Llama4 Scout FP4 functionality issues ( #21499 )
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-07-30 07:33:40 -07:00
aladerran
d979dd6beb
[Feature][EPLB] Add eplb support for Qwen3 ( #20815 )
Signed-off-by: aladerran <aladerran@gmail.com>
2025-07-30 06:27:57 -07:00
Jee Jee Li
fc91da5499
[Model] Remove DSV2 unused code ( #21903 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-30 00:55:03 -07:00
Cyrus Leung
2ca5f82c2a
[Misc] Remove redundant config definitions ( #21891 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-29 23:54:18 -07:00
Areeb Syed
fdde18229e
[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization ( #21808 )
Signed-off-by: sydarb <areebsyed237@gmail.com>
2025-07-30 11:35:21 +08:00