7993 Commits

Author SHA1 Message Date
Chengji Yao
f1b286b2fb
[TPU] Update ptxla nightly version to 20250724 (#21555)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-25 17:09:00 -07:00
Rui Qiao
c7742d6113
[Bugfix] Always set RAY_ADDRESS for Ray actor before spawn (#21540)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-25 17:08:30 -07:00
Rui Qiao
cea96a0156
[Bugfix] Fix sync_and_slice_intermediate_tensors (#21537)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-25 17:07:58 -07:00
Yong Hoon Shin
2eddd437ba
Add interleaved RoPE test for Llama4 (Maverick) (#21478)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-25 17:07:26 -07:00
Wentao Ye
75d29cf4e1
[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-25 17:07:07 -07:00
Daniel Han
41d3082c41
Add Unsloth to RLHF.md (#21636) 2025-07-25 17:06:48 -07:00
QiliangCui
7cfea0df39
[TPU][Test] Rollback PR-21550. (#21619)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-25 13:22:01 -07:00
Wenhua Cheng
5ac3168ee3
[Docs] add auto-round quantization readme (#21600)
Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-25 08:52:42 -07:00
Kebe
396ee94180
[CI] Unifying Dockerfiles for ARM and X86 Builds (#21343)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-07-25 07:33:56 -07:00
mgazz
e189b50f53
Add support for Prithvi in Online serving mode (#21518)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-25 07:01:27 -07:00
czhu-cohere
136d750f5f
[Kernel] Improve machete memory bound perf (#21556)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-07-25 06:53:21 -07:00
who who who
b3caeb82e7
[ROCm][AITER] Enable fp8 kv cache on rocm aiter backend. (#20295)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: amd-ruitang3 <Rui.Tang2@amd.com>
Co-authored-by: amd-ruitang3 <Rui.Tang2@amd.com>
2025-07-25 06:50:21 -07:00
Chih-Chieh Yang
eab2f3980c
[Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel (#20839)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
2025-07-25 06:49:36 -07:00
kourosh hakhamaneshi
9fe98d4250
[Frontend] Add request_id to the Request object so they can be controlled better via external load balancers (#21009)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-25 06:49:11 -07:00
bigshanedogg
29c6fbe58c
[MODEL] New model support for naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B (#20931)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
2025-07-25 06:05:42 -07:00
xyxinyang
c72f049cb4
[Model] Fix Ernie4.5MoE e_score_correction_bias parameter (#21586)
Signed-off-by: zhouchong <zhouchong03@baidu.com>
Co-authored-by: zhouchong <zhouchong03@baidu.com>
2025-07-25 06:02:53 -07:00
Mengqing Cao
f3a683b7c9
[Bugfix][Logprobs] Fix logprobs op to support more backend (#21591)
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-07-25 05:53:07 -07:00
Cyrus Leung
46d81d6951
[V1] Get supported tasks from model runner instead of model config (#21585)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-25 05:36:45 -07:00
Jee Jee Li
5c3f2628d5
[Quantization] Enable BNB support for more MoE models (#21370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-25 03:57:34 -07:00
Kebe
7311f74468
[Bugfix] GGUF: fix AttributeError: 'PosixPath' object has no attribute 'startswith' (#21579)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-07-25 03:42:23 -07:00
Xu Wenqing
8ed01e32f7
Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct (#21598)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-07-25 02:36:55 -07:00
Nick Hill
e38e96a3c0
[Tests] Harden DP tests (#21508)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-25 02:27:24 -07:00
Chengji Yao
40d86ee412
[TPU][Bugfix] fix OOM issue in CI test (#21550)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-24 23:01:53 -07:00
Yang Chen
85d051f026
[Misc] Removed undefined cmake variables MOE_PERMUTE_ARCHS (#21262)
Signed-off-by: Yang Chen <yangche@fb.com>
2025-07-24 22:54:23 -07:00
Ignacio Sica
5140f54b89
[CI/Build] fix cpu_extension for apple silicon (#21195)
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
2025-07-24 22:53:59 -07:00
Chengji Yao
947edd099e
[Misc][Tools] make max-model-len a parameter in auto_tune script (#21321)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-24 22:46:43 -07:00
hfan
fde60ee775
[Model] Fix a check for None but the return value was empty list in Gemma3 MM vision_embeddings (#21479)
Signed-off-by: Hongmin Fan <fanhongmin@google.com>
2025-07-25 13:46:06 +08:00
Jason Gu
b38bc652ac
[Model] Support tensor parallel for timm ViT in Deepseek_vl2 (#21494)
Signed-off-by: wzqd <1057337859@qq.com>
2025-07-24 22:45:16 -07:00
Ning Xie
adaf2c6d4f
[Bugfix] fix modelscope snapshot_download serialization (#21536)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-24 22:44:38 -07:00
Li, Jiang
42343f1f89
[CI] Update CODEOWNERS for CPU and Intel GPU (#21582)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-24 21:58:03 -07:00
Benji Beck
965bc71b04
Integrate TensorSchema with shape validation for Phi3VImagePixelInputs (#21232)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-24 21:43:52 -07:00
Zhou Fang
807a328bb6
[Docs] Add requirements/common.txt to run unit tests (#21572)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
2025-07-24 20:51:15 -07:00
QiliangCui
e0be2c4d09
[TPU][Test] Temporarily suspend this MoE model in test_basic.py. (#21560)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-24 20:44:50 -07:00
Nick Hill
9c8b2c2a8a
[DP] Support api-server-count > 0 in hybrid DP LB mode (#21510)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-24 20:18:16 -07:00
Varun Sundar Rabindranath
2212cd6cfb
[Bugfix] DeepGemm utils : Fix hardcoded type-cast (#21517)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-24 20:17:29 -07:00
Burkhard Ringlein
ce3a9b1378
[Kernel] adding fused_moe configs for upcoming granite4 (#21332)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-24 20:16:59 -07:00
Yuxuan Zhang
2ce90e5b01
Fix GLM-4 PP Missing Layer When using with PP. (#21531)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-07-24 20:07:38 -07:00
Wentao Ye
633f6e804b
[Bug] Fix DeepGemm Init Error (#21554)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-24 20:07:22 -07:00
Harry Mellor
b57296bb9a
[Docs] Fix site_url for RunLLM (#21564)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 20:05:58 -07:00
Cyrus Leung
34ddcf9ff4
[Frontend] run-batch supports V1 (#21541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-24 20:05:55 -07:00
Woosuk Kwon
fe56180c7f
[MoE] More balanced expert sharding (#21497)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-07-24 15:56:08 -07:00
QiliangCui
07d80d7b0e
[TPU][TEST] HF_HUB_DISABLE_XET=1 the test 3. (#21539)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-24 15:33:04 -07:00
weiliang
2dd72d23d9
update flashinfer to v0.2.9rc1 (#21485)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
2025-07-24 14:06:11 -07:00
Simon Mo
a6c7fb8cff
[Docs] Add Expert Parallelism Initial Documentation (#21373)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 12:36:06 -07:00
Ricardo Decal
a7272c23d0
[Docs][minor] Fix broken gh-file link in distributed serving docs (#21543)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-24 10:36:56 -07:00
Juncheng Gu
6066284914
[P/D] Support CPU Transfer in NixlConnector (#18293)
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
2025-07-24 17:58:42 +01:00
Rui Qiao
1e9ea8e69d
[P/D] Move FakeNixlWrapper to test dir (#21328)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-24 08:53:45 -07:00
Chaojun Zhang
d9f9a3fd96
[XPU] Conditionally import CUDA-specific passes to avoid import errors on xpu platform (#21036)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-07-24 23:23:36 +08:00
Shu Wang
1b25f1fe75
Update flashinfer CUTLASS MoE Kernel (#21408)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
2025-07-24 08:13:31 -07:00
Wentao Ye
e8cb0d0495
[Bug] Fix Compressed Tensor NVFP4 cutlass_fp4_group_mm illegal memory access (#21465)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-24 08:13:24 -07:00