Huy Do
|
67c14906aa
|
Update PyTorch to 2.8.0 (#20358)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-29 18:57:35 +08:00 |
|
Flora Feng
|
69f46359dd
|
[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-08-29 18:36:57 +08:00 |
|
wang.yuqi
|
d9e00dbd1f
|
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-29 03:12:32 -07:00 |
|
Li, Jiang
|
ad39106b16
|
[CPU] Enable data parallel for CPU backend (#23903)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-29 02:19:58 -07:00 |
|
Maximilien de Bayser
|
2554b27baa
|
[V0 Deprecation] Remove pooling model support in V0 (#23434)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-29 00:04:02 -07:00 |
|
Harry Mellor
|
934bebf192
|
Better errors for Transformers backend missing features (#23759)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-29 07:01:40 +00:00 |
|
Jiangyun Zhu
|
885ca6d31d
|
[Misc] Fix warnings for mistral model (#23552)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-08-29 06:58:48 +00:00 |
|
Chenheli Hua
|
2d0afcc9dc
|
[mrope][Qwen2-VL] Fix edge case where getting index of image/video token can potentially throw in default vl mrope implementation. (#23895)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-08-28 23:29:13 -07:00 |
|
Jee Jee Li
|
b4f9e9631c
|
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-28 23:28:35 -07:00 |
|
Raghavan
|
05d839c19e
|
Fix(async): Add support for truncate_prompt_tokens in AsyncLLM (#23800)
|
2025-08-28 22:55:06 -07:00 |
|
wangxiyuan
|
6597d7a456
|
[Platform] import activation_quant_fusion for CUDA only (#23882)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-08-28 22:54:16 -07:00 |
|
Jinghui Zhang
|
5264015d74
|
[BugFix][AMD][Deepseek] fix a dtype mismatch error for deepseek running on AMD (#23864)
Signed-off-by: Jinghui Zhang <jinghuizhang0804@gmail.com>
|
2025-08-28 22:54:12 -07:00 |
|
Isotr0py
|
98ac0cb32d
|
[Bugfix] Use ReplicatedLinear for SequenceClassification head (#23836)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-29 04:41:20 +00:00 |
|
Russell Bryant
|
c8b3b299c9
|
[tests] Improve speed and reliability of test_transcription_api_correctness (#23854)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-29 04:25:33 +00:00 |
|
Charlie Fu
|
006477e60b
|
[ROCm][Fix] Fix rocm build caused by #23791 (#23847)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-08-28 19:52:27 -07:00 |
|
Lukas Geiger
|
de533ab2a1
|
[Models] Improve iteration over layers (#19497)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-08-29 09:26:34 +08:00 |
|
Chaojun Zhang
|
235c9db8a7
|
[XPU] support data parallel for MoE models on XPU (#22887)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-08-29 09:23:04 +08:00 |
|
Woosuk Kwon
|
b668055a11
|
[V0 Deprecation] Remove V0 Samplers test (#23862)
|
2025-08-28 18:05:52 -07:00 |
|
Wentao Ye
|
d3d2aad5a2
|
[Log] Use Debug Once for DeepGEMM E8M0 When not Enabled (#23858)
|
2025-08-28 22:18:10 +00:00 |
|
Yong Hoon Shin
|
cb293f6a79
|
[V1] Enable prefill optimization for Gemma3n (#22628)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-28 14:54:30 -07:00 |
|
Woosuk Kwon
|
7ffbf27239
|
[BugFix][FlashInfer] Fix potential race condition for paged_kv_indptr_cpu (#23737)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-28 14:22:46 -07:00 |
|
Simon Mo
|
27e88cee74
|
chore: build release image by default (#23852)
Signed-off-by: Codex <codex@openai.com>
|
2025-08-28 13:17:15 -07:00 |
|
elvischenv
|
16a45b3a28
|
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
|
2025-08-28 19:36:50 +00:00 |
|
Jingkai He
|
57d4ede520
|
[bugfix] [spec-decoding] fix data race in sample_recovered_tokens_kernel (vLLM v1) (#23829)
Signed-off-by: He-Jingkai <he-jingkai@outlook.com>
|
2025-08-28 19:05:20 +00:00 |
|
Divakar Verma
|
04d1dd7f4a
|
[ROCm][Aiter] Add triton fp8 bmm kernel for mla (#23264)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
|
2025-08-28 18:18:08 +00:00 |
|
Benji Beck
|
f32a5bc505
|
Migrate Llama4ImagePatchInputs to TensorSchema (#22021)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-28 17:29:37 +00:00 |
|
Jean Schmidt
|
8805ad9fa9
|
Add scale_config.yml file for Meta autoscalers for GH Actions (#23840)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
|
2025-08-28 09:31:20 -07:00 |
|
Jean Schmidt
|
0583578f42
|
[ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
|
2025-08-28 08:59:19 -07:00 |
|
Angela Yi
|
db74d60490
|
[Bugfix] Add fake mode around passes (#23349)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-08-28 11:25:56 -04:00 |
|
Po-Han Huang (NVIDIA)
|
95089607fa
|
[Model][gpt-oss] Support DP+EP for GPT-OSS with FlashInfer trtllm-gen MoE (#23819)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-08-28 06:56:20 -07:00 |
|
Thomas Parnell
|
1f096f9b95
|
[CI] Fix linting error on main (#23835)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-28 06:52:01 -07:00 |
|
YUQI.CHENG
|
66548f6603
|
[Bugfix] Fix benchmark_moe.py for blockwise fp8. (#23823)
Signed-off-by: crischeng <420985011@qq.com>
Co-authored-by: cris <grace@guisenbindeMacBook-Pro.local>
|
2025-08-28 21:44:09 +08:00 |
|
Didier Durand
|
d3da2eea54
|
[Doc]: fix typos in Python scripts (#23828)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-28 05:37:38 -07:00 |
|
Jiangyun Zhu
|
bfab219648
|
[Model] [gpt-oss] fix gpt-oss pp support (#23815)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-08-28 05:36:55 -07:00 |
|
Woosuk Kwon
|
a3432f18fd
|
[BugFix][Spec Decode] Use float64 for uniform_probs (#23803)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-28 12:26:45 +00:00 |
|
Li, Jiang
|
67cee40da0
|
[CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-28 11:57:05 +00:00 |
|
Didier Durand
|
d99c3a4f7b
|
[Doc]: fix typos in .md files (including those of #23751) (#23825)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-28 04:38:19 -07:00 |
|
JartX
|
3462c1c522
|
[FIXBUG] Add return_success parameter to moe_wna16_weight_loader function (#22797)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-28 09:03:22 +00:00 |
|
Isotr0py
|
c5d004aaaf
|
[Model] Add PP support and VLM backbone compatability for GPT-OSS (#23680)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-28 16:03:28 +08:00 |
|
wang.yuqi
|
11a7fafaa8
|
[New Model]: Support GteNewModelForSequenceClassification (#23524)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-28 15:36:42 +08:00 |
|
yzds
|
186aced5ff
|
[Kernel] cuda kernels for upcoming decode context parallel feature (#23791)
Co-authored-by: hongchao <hongchao@msh.team>
|
2025-08-28 15:29:11 +08:00 |
|
rongfu.leng
|
daa1273b14
|
[Bugfix] when set offline model running error (#23711)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-28 07:27:45 +00:00 |
|
Jiangyun Zhu
|
c07a73317d
|
[CI] enable idefics3 and fuyu-8b test in multimodal test (#23790)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-08-28 14:51:24 +08:00 |
|
Kyle Sayers
|
22feac8e95
|
[Transform] [Quantization] Add transforms to compressed tensors (#22486)
|
2025-08-28 02:43:48 -04:00 |
|
Jinheng
|
c8851a4723
|
Add deprecation warning for lora_extra_vocab_size (#23635)
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
|
2025-08-27 22:34:29 -07:00 |
|
Alex
|
f48a9af892
|
[CI] make all multi-gpu weight loading tests run nightly (#23792)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
|
2025-08-27 21:27:36 -07:00 |
|
Jan Kessler
|
a11adafdca
|
Gracefully handle edge cases in harmony utils (#23155)
Signed-off-by: Jan Kessler <jakessle@uni-mainz.de>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-27 20:14:00 -07:00 |
|
Michael Goin
|
a781e84ec2
|
[Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-28 11:12:53 +08:00 |
|
Shrey Gupta
|
1b7b161a09
|
[Feature] models: pass layer prefix to replace_linear_class for per-layer quantization routing. Addresses #23239 (#23556)
Signed-off-by: Shrey Gupta <shreyg1303@gmail.com>
|
2025-08-27 20:12:44 -07:00 |
|
Benji Beck
|
a69693e38f
|
Migrate Qwen inputs to TensorSchema (#23473)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-28 10:43:26 +08:00 |
|