vllmellm
e47d55b80f
force kernels for tests
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-07 12:13:40 +00:00
vllmellm
cfb476fe53
minor fixes
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-07 07:48:30 +00:00
vllmellm
56a05cd818
add minimal documentation for torch scaled mm base class
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-07 07:30:57 +00:00
vllmellm
7fb465744c
implement apply func in base FP8ScaledMMLinearKernel class
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-07 07:17:41 +00:00
vllmellm
aaa0d55587
format
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-04 14:40:30 +00:00
vllmellm
abf597e542
fix quant key selection for ct; remove register_parameter calls; format
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-04 14:12:14 +00:00
vllmellm
fb72ec8218
add missing kernels for cuda dispatch
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-04 12:23:38 +00:00
vllmellm
f10171cb3d
correct minimum capability req for channelwise torch
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-04 12:22:49 +00:00
vllmellm
a76f7bb90c
rename flash_infer.py to flashinfer.py
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-04 12:13:02 +00:00
vllmellm
f5e6cd9695
prefer QuantKey over ScaledMMLinearQuantStrategy
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-04 12:11:13 +00:00
vllmellm
a8010c7b1c
flash_infer missing out dtype bug fix
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-03 08:02:45 +00:00
vllmellm
7794009661
add missing arg
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-03 07:09:52 +00:00
vllmellm
b13c4bb25c
remove FP8LinearOps
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-01 16:30:32 +00:00
vllmellm
52ff537459
update modelopt path
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-01 16:28:18 +00:00
vllmellm
dd5a70ec71
update unit tests to use ScaledMMLinearKernels
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-01 16:28:03 +00:00
vllmellm
4ce0ba2df4
format
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-01 10:01:13 +00:00
vllmellm
8e8218ebac
Merge remote-tracking branch 'origin/main' into refactor-fp8-linear
2025-11-01 09:59:43 +00:00
vllmellm
d92c23b446
fix types; reduce boilerplate for int8
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-01 09:59:00 +00:00
TJian
e2347dbf58
[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration (#27895)
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-01 13:45:23 +08:00
Cyrus Leung
879a06579e
[CI/Build] Bump transformers version (#27528)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-31 22:11:07 -07:00
Yan Ma
7e2729b57e
[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525)
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yejing Lai <yejing.lai@intel.com>
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-01 04:45:02 +00:00
Jee Jee Li
3a5de7d2d6
[Bugfix] Fix KDA output (#27905)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-01 11:54:36 +08:00
Jee Jee Li
bc4486d609
[Kernel] Enable FusedMoEModularKernel support bias (#27754)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-01 02:05:12 +00:00
Shu Wang
fc16f1c477
Flashinfer_CUTLASS_MOE fuses quantization for TP (#27223)
...
Signed-off-by: Shu Wang. <shuw@nvidia.com>
2025-10-31 17:54:29 +00:00
ZiTian Zhao
bc306fe5e9
fix incorrect type annotation in KimiMLP (#27885)
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
2025-10-31 17:38:02 +00:00
Isotr0py
7e06c40e63
[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V (#27860)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-31 17:04:51 +00:00
vllmellm
e845035f4c
bug fix
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 16:38:26 +00:00
vllmellm
5fbe76bc0a
format; update fbgemm path
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 15:08:19 +00:00
vllmellm
1f65cd56e5
revert input scale upper bounds
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 15:06:51 +00:00
vllmellm
7d361487f7
update ptpc path; bug fixes
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 14:52:51 +00:00
vllmellm
dd001064c0
reduce kernel init boilerplate
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 14:21:49 +00:00
vllmellm
423e2a625e
reduce logging boilerplate; update fp8 path
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 14:07:09 +00:00
Jiangyun Zhu
3857eb8725
[Perf] Decouple torch op from GDA to leverage torch.compile (#27871)
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-31 21:35:52 +08:00
toncao
e5ef4dfc11
[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants (#27834)
...
Signed-off-by: toncao <cpatonn@gmail.com>
Co-authored-by: toncao <cpatonn@gmail.com>
2025-10-31 17:36:37 +08:00
vllmellm
38825fce0f
Merge branch 'main' into refactor-fp8-linear
2025-10-31 15:58:25 +08:00
Paul Zhang
e7acb20076
[Feature] Batch invariant torch.compile (#27660)
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-30 13:11:29 -07:00
Tyler Michael Smith
ab98f6556f
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) (#27811)
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-10-30 11:52:18 -07:00
Roger Meier
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices (#23642)
...
Signed-off-by: Roger Meier <r.meier@siemens.com>
2025-10-30 17:36:56 +00:00
Mengqing Cao
1004205795
[MTP] Refactor mtp predictor to avoid d2h operation (#27643)
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-10-30 17:27:39 +00:00
Varun Sundar Rabindranath
e5e076cad7
[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP (#27762)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-30 08:24:31 -07:00
Li, Jiang
eebf00cb0c
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend (#27800)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-10-30 15:12:05 +00:00
Fan Yin
9956aae4ea
[Model][Ouro] Support Ouro Model (#27794)
...
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-30 22:34:41 +08:00
vllmellm
c089ea5753
update quark fp8 path; format
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-30 14:24:19 +00:00
Zhiyuan Li
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM (#27809)
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
2025-10-30 21:02:27 +08:00
vllmellm
c05027f67a
clean up; fix quark path
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-30 12:27:04 +00:00
wang.yuqi
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-30 12:13:05 +00:00
vllmellm
e54e572085
fix int8 path
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-30 08:04:24 +00:00
Zhewen Li
e806178d2a
[BugFix][VL] Fix FA selection on Qwen2.5-VL (#27790)
...
Signed-off-by: zhewenli <zhewenli@meta.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-30 07:54:44 +00:00
Bram Wasti
ded8ada86a
Add more dims for batch invariant shims (#27489)
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-30 05:28:45 +00:00
Benjamin Bartels
17d055f527
[Feat] Adds runai distributed streamer (#27230)
...
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: omer-dayan <omdayan@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-29 21:09:10 -07:00