Cyrus Leung
9452863088
Revert "Revert #28875 (#29159)" (#29179)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 04:27:43 -08:00
Bhagyashri
2b1b3dfa4b
Update Dockerfile to use gcc-toolset-14 and fix test case failures on power (ppc64le) (#28957)
Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com>
2025-11-21 12:24:09 +00:00
Russell Bryant
cca2d2cdbe
[Core] Align whisper closer to other multimodal models (#27292)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-11-21 12:01:54 +00:00
Cyrus Leung
aab0102a26
[V0 deprecation] Remove more V0 references (#29088)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:56:59 +00:00
WeiQing Chen
b34129bf8e
[Misc] remove useless v1 env (#29164)
Signed-off-by: David Chen <530634352@qq.com>
2025-11-21 01:41:20 -08:00
Cyrus Leung
4d7231e774
Revert #28875 (#29159)
2025-11-21 01:40:17 -08:00
Huamin Li
8ac3a41487
[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 23:53:30 -08:00
Canlin Guo
7d6da483b0
[Minor][Clean] Remove the legacy assertion in video (#29150)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-11-20 23:52:34 -08:00
Chenheli Hua
e4c3182c68
[Small] Capture AttributeError when checking ray dependency. (#29024)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-11-20 22:54:10 -08:00
Alex Brooks
b4734b9550
[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-11-21 13:32:30 +08:00
Jialin Ouyang
30b9c67743
Revert "[Redo] #26368 (#28771)" (#29121)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-20 21:27:45 -08:00
Matthew Bonanni
11857a00b0
[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-20 20:24:43 -08:00
Boyuan Feng
8c25f9cfb6
[BugFix] skip combo kernel on cpu (#29129)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-21 11:50:59 +08:00
Cyrus Leung
56e96b37e4
[V0 Deprecation] Remove best_of (#29090)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:40:40 +08:00
Qidong Su
698024ecce
[Doc] update installation guide regarding aarch64+cuda pytorch build (#28875)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-20 19:40:25 -08:00
jeremyteboul
0730414999
[Core] Add audio_embeds support to chat completions (#29059)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-11-21 11:39:47 +08:00
zhrrr
a982f5b5ea
[kernel][perf] support uncontiguous input for rms_norm kernel (#28103)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-20 19:39:09 -08:00
Cyrus Leung
0e741c12e3
[Bugfix] Fix Plamo3 rope handling (#29092)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-21 11:38:35 +08:00
Wentao Ye
56669c1f29
[CI] Fix mypy for vllm/v1/worker (#29037)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 11:36:07 +08:00
Hongxia Yang
3f5f36da3f
[ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving (#29127)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
2025-11-21 03:30:07 +00:00
Wentao Ye
e1eefa4c40
[Bug] Fix torch warning of tf32 usage (#29112)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 01:54:59 +00:00
Xiao Li
ed6ae1e36a
[AITER] [ROCm] Fix crash when loading llama4 model with old aiter version installed, fallback to forward_native implementation (#29124)
Signed-off-by: Xiao Li <ilx@meta.com>
2025-11-20 17:54:35 -08:00
Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab (#28545)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-21 09:46:43 +08:00
Wentao Ye
df44df0143
[Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement (#28879)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 18:41:49 -07:00
Michael Goin
87cbbdff63
Update model references for OLMo3 (#29099)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-21 09:16:52 +08:00
Michael Goin
986ab5db63
[CI Bugfix] Fix Kernels DeepGEMM Test (H100) (#29106)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-20 16:42:33 -08:00
Rob Mulla
dd39f91edb
[Doc] cleanup TPU documentation and remove outdated examples (#29048)
Signed-off-by: Rob Mulla <rob.mulla@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-21 00:05:59 +00:00
rasmith
c7a29d2c8d
[CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:44:37 +00:00
rasmith
8237ab8a2b
[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:35:14 +00:00
Driss Guessous
3fd74189db
Fixes bench (#29058)
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-11-20 21:21:54 +00:00
rasmith
5e5a7eb16f
[CI/Build] Make test_attention_selector.py run tests on correct platform (#29064)
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-20 20:45:56 +00:00
rasmith
3d84ef9054
[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 20:39:49 +00:00
Software Developer
4d01b64284
[Bugfix] - Add Trace Headers to Beam Search Path (#29100)
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
2025-11-20 20:00:33 +00:00
Kevin H. Luu
114b0e2500
[chore] Update annotate release scripts (#29077)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-20 10:22:40 -08:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
Pan Li
e5bfcb6a88
[BugFix][PD]: make example proxy usable with P2pNcclConnector (#26628)
Signed-off-by: PAN <1162953505@qq.com>
2025-11-20 17:38:31 +00:00
Alexei-V-Ivanov-AMD
22924383e1
Updating the mirror of test-amd.yaml as of 2025-11-18 (#29016)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-11-20 12:07:06 -05:00
rookie
56f45eddaf
[Frontend] Optimize beam search loop by sorting and then splicing (#19347)
Signed-off-by: zhangguozhu <zhangguozhu@360.cn>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: zhangguozhu <zhangguozhu@360.cn>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-20 09:02:30 -08:00
TJian
82b05b15e6
[BugFix] [FEAT] Enable fastsafetensors for ROCm platform (#28225)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-20 16:34:11 +00:00
Fanli Lin
a2e9ebe9e2
[BugFix] Fix flash_attn import in siglip2navit.py (#29082)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-11-20 12:14:29 +00:00
Zhewen Li
93c8672ceb
[Bugfix] Fix spec decode memory regression after #28549 (#28819)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-20 19:05:50 +08:00
Samit
371b1d4c61
[RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037)
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: samithuang <285365963@qq.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-20 03:01:03 -08:00
Shinichi Hemmi
c9e093116c
[MODEL] Implement plamo3 (#28834)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-20 03:00:19 -08:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
Pleaplusone
06c20c9904
[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 02:54:01 -08:00
Anna Shors
6eb745d9bd
Add truncate arg to yarn to match openai implementation of gpt-oss (#28244)
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-11-20 18:53:50 +08:00
cjackal
66483a9d00
[Chore] Update xgrammar version from 0.1.25 to 0.1.27 (#28221)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-11-20 02:53:09 -08:00
Jinzhen Lin
edfe867208
[Misc] don't cache CUTLASS_REVISION var in CMakeLists.txt (#28518)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-11-20 02:52:53 -08:00
Dezhan
dc45efc8ef
[BugFix] Fix Llama4 Pipeline Parallelism Assert Error (#28577)
Co-authored-by: Dezhan Tu <dztu@meta.com>
2025-11-20 02:52:36 -08:00
Vensen
fb8851f254
[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu (#28760)
Signed-off-by: vensen <vensenmu@gmail.com>
Signed-off-by: Vensenmu <vensenmu@gmail.com>
2025-11-20 02:52:02 -08:00