vllmellm
|
fb72ec8218
|
add missing kernels for cuda dispatch
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-04 12:23:38 +00:00 |
|
vllmellm
|
f10171cb3d
|
correct minimum capability req for channelwise torch
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-04 12:22:49 +00:00 |
|
vllmellm
|
a76f7bb90c
|
rename flash_infer.py to flashinfer.py
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-04 12:13:02 +00:00 |
|
vllmellm
|
f5e6cd9695
|
prefer QuantKey over ScaledMMLinearQuantStrategy
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-04 12:11:13 +00:00 |
|
vllmellm
|
a8010c7b1c
|
flash_infer missing out dtype bug fix
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-03 08:02:45 +00:00 |
|
vllmellm
|
7794009661
|
add missing arg
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-03 07:09:52 +00:00 |
|
vllmellm
|
b13c4bb25c
|
remove FP8LinearOps
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-01 16:30:32 +00:00 |
|
vllmellm
|
52ff537459
|
update modelopt path
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-01 16:28:18 +00:00 |
|
vllmellm
|
dd5a70ec71
|
update unit tests to use ScaledMMLinearKernels
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-01 16:28:03 +00:00 |
|
vllmellm
|
4ce0ba2df4
|
format
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-01 10:01:13 +00:00 |
|
vllmellm
|
8e8218ebac
|
Merge remote-tracking branch 'origin/main' into refactor-fp8-linear
|
2025-11-01 09:59:43 +00:00 |
|
vllmellm
|
d92c23b446
|
fix types; reduce boilerplate for int8
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-01 09:59:00 +00:00 |
|
Jee Jee Li
|
3a5de7d2d6
|
[Bugfix] Fix KDA output (#27905)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-01 11:54:36 +08:00 |
|
Jee Jee Li
|
bc4486d609
|
[Kernel] Enable FusedMoEModularKernel support bias (#27754)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-01 02:05:12 +00:00 |
|
Shu Wang
|
fc16f1c477
|
Flashinfer_CUTLASS_MOE fuses quantization for TP (#27223)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
|
2025-10-31 17:54:29 +00:00 |
|
vllmellm
|
e845035f4c
|
bug fix
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-31 16:38:26 +00:00 |
|
vllmellm
|
5fbe76bc0a
|
format; update fbgemm path
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-31 15:08:19 +00:00 |
|
vllmellm
|
1f65cd56e5
|
revert input scale upper bounds
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-31 15:06:51 +00:00 |
|
vllmellm
|
7d361487f7
|
update ptpc path; bug fixes
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-31 14:52:51 +00:00 |
|
vllmellm
|
dd001064c0
|
reduce kernel init boilerplate
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-31 14:21:49 +00:00 |
|
vllmellm
|
423e2a625e
|
reduce logging boilerplate; update fp8 path
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-31 14:07:09 +00:00 |
|
Jiangyun Zhu
|
3857eb8725
|
[Perf] Decouple torch op from GDA to leverage torch.compile (#27871)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-31 21:35:52 +08:00 |
|
vllmellm
|
38825fce0f
|
Merge branch 'main' into refactor-fp8-linear
|
2025-10-31 15:58:25 +08:00 |
|
Paul Zhang
|
e7acb20076
|
[Feature] Batch invariant torch.compile (#27660)
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-30 13:11:29 -07:00 |
|
Roger Meier
|
2918c1b49c
|
[Model] Use the same fused_moe configs for all H200 devices (#23642)
Signed-off-by: Roger Meier <r.meier@siemens.com>
|
2025-10-30 17:36:56 +00:00 |
|
Li, Jiang
|
eebf00cb0c
|
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend (#27800)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-10-30 15:12:05 +00:00 |
|
vllmellm
|
c089ea5753
|
update quark fp8 path; format
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-30 14:24:19 +00:00 |
|
Zhiyuan Li
|
4e68cc9b6a
|
[Model] Introduce Kimi Linear to vLLM (#27809)
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
|
2025-10-30 21:02:27 +08:00 |
|
vllmellm
|
c05027f67a
|
clean up; fix quark path
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-30 12:27:04 +00:00 |
|
wang.yuqi
|
4464723f22
|
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-30 12:13:05 +00:00 |
|
vllmellm
|
e54e572085
|
fix int8 path
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-30 08:04:24 +00:00 |
|
Bram Wasti
|
ded8ada86a
|
Add more dims for batch invariant shims (#27489)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-30 05:28:45 +00:00 |
|
Yan Ma
|
b798e39f93
|
[XPU][bugfix] fix rope for llama4 and deepseek (#25145)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-10-30 09:43:13 +08:00 |
|
Wentao Ye
|
b5d90f7400
|
[Bug] Fix DBO IMA issue for DeepEPHT (#27666)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-29 16:28:27 -04:00 |
|
Wentao Ye
|
fcb1d570bb
|
[Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug (#27682)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-29 14:50:39 -04:00 |
|
Roger Young
|
d6704dd099
|
Fix MiniMax-M2 rmsnorm precision and remove useless code (#27627)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
|
2025-10-29 21:01:05 +08:00 |
|
Zhewen Li
|
8b62495076
|
[Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl (#27605)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-10-29 00:00:15 -07:00 |
|
Wentao Ye
|
6afc28a9ba
|
[Test] Batch Invariant: Unit test using parameterized backend (#27478)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-28 13:51:35 -07:00 |
|
vllmellm
|
974e6820ce
|
first try
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-28 16:26:51 +00:00 |
|
Zhiyuan Li
|
e88bdd60d9
|
[FLA] Introduce Kimi Delta Attention(KDA) to VLLM (#27654)
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
|
2025-10-28 22:56:28 +08:00 |
|
Wentao Ye
|
0484b64248
|
[Bug] Fix shape issue for eplb expert weights (#27589)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-28 20:44:05 +08:00 |
|
Matthew Bonanni
|
44b5ce956d
|
[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-28 12:00:56 +00:00 |
|
Li, Jiang
|
d34f5fe939
|
[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legacy platforms (#27526)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-27 23:25:44 -07:00 |
|
Eric Yue
|
bdb01a38fe
|
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
|
2025-10-27 22:58:06 -07:00 |
|
Varun Sundar Rabindranath
|
5d3be3ba4c
|
[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-27 07:32:50 -07:00 |
|
Cyrus Leung
|
7c2bdb83dc
|
[Misc] Clean up utils (#27552)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-27 09:05:40 +00:00 |
|
Danielle Robinson
|
9932ed6a83
|
[Kernel] Adding split_K implementation for fused_moe_lora (#27291)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-27 02:05:24 -07:00 |
|
Yeshwanth N
|
71b1c8b667
|
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
|
2025-10-26 04:03:32 -07:00 |
|
Varun Sundar Rabindranath
|
269c4db0a4
|
[Misc][DP] Guard mxfp4 implementation selection (#27484)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-24 23:29:24 +00:00 |
|
Wentao Ye
|
52efc34ebf
|
[Log] Optimize Startup Log (#26740)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-24 19:27:04 -04:00 |
|