7855 Commits

Author SHA1 Message Date
Robert Shaw
896b0a271e updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 15:56:59 +00:00
Robert Shaw
fd0650f258 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 15:56:50 +00:00
Robert Shaw
cad9670547 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 15:23:30 +00:00
Robert Shaw
3956d8ccad updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 15:22:43 +00:00
Robert Shaw
9a2e26d049 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 14:44:54 +00:00
Robert Shaw
d39cf9380d updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 14:42:48 +00:00
Robert Shaw
e08e1e99ee cleanup prometheus logging
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 14:41:55 +00:00
Robert Shaw
de91a3cd6a convert to use only one prometheus stat logger per async llm
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 14:38:45 +00:00
Robert Shaw
a69edca369 convert to use only one prometheus stat logger per async llm
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 13:52:50 +00:00
Robert Shaw
1e5303a801 stash
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 13:37:34 +00:00
Robert Shaw
6569facd3b stash
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 13:34:38 +00:00
Robert Shaw
471fa4ae68 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 03:54:43 +00:00
Robert Shaw
dbc51d6e98 nits
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 03:48:11 +00:00
Robert Shaw
b9c0f658ca nits
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 03:47:45 +00:00
Robert Shaw
1ced153eec updatedd
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 03:47:23 +00:00
Robert Shaw
2a68433a82 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 03:45:48 +00:00
Robert Shaw
4438796b48 fix lb issues
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 03:44:38 +00:00
Robert Shaw
d2d54e9c72 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:46:54 +00:00
Robert Shaw
e1843b7e6c updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:30:23 +00:00
Robert Shaw
b142571366 cleanup
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:24:49 +00:00
Robert Shaw
2aa497571d updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:23:34 +00:00
Robert Shaw
14db6606f2 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:21:51 +00:00
Robert Shaw
4f5d3eabc8 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:21:19 +00:00
Robert Shaw
14cf3c4786 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:20:54 +00:00
Robert Shaw
2fd05875d4 updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:20:36 +00:00
Robert Shaw
48cf09be0b updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:19:52 +00:00
Robert Shaw
59a958362f updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:19:30 +00:00
Robert Shaw
aefeeed64d updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:18:01 +00:00
Robert Shaw
b90d33163c updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-20 02:15:19 +00:00
Robert Shaw
14f13ed690 added debug logging
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-19 16:27:38 +00:00
kourosh hakhamaneshi
9f414a12ad
[BugFix] Make PD work with Ray (#21072)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-19 08:46:50 -07:00
Jiayi Yan
6a971ed692
[Docs] Update the link to the 'Prometheus/Grafana' example (#21225)
2025-07-19 06:58:07 -07:00
Sungjae Lee
da6579bf41
[CI/CD][bugfix]fix: error argument to loads has incompatible type (#21223)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
2025-07-19 05:16:48 -07:00
Rabi Mishra
c81259d33a
Fix/remove some broken model executor tests (#21224)
Signed-off-by: Rabi Mishra <ramishra@redhat.com>
2025-07-19 12:15:07 +00:00
Li, Jiang
e3a0e43d7f
[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-19 05:13:55 -07:00
22quinn
b3d82108e7
[Bugfix][Frontend] Fix openai CLI arg middleware (#21220)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-19 02:40:38 -07:00
Kaixi Hou
6d0734c562
[NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency (#20645)
Signed-off-by: kaixih <kaixih@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-19 02:33:01 -07:00
shixianc
7d94577138
Add torch golden impl for moe_align_block_size kernel test (#20653)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-07-19 02:32:36 -07:00
Lucas Wilkinson
59f935300c
[BugFix] Fix potential cuda-graph IMA (#21196)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-19 02:18:47 -07:00
Isotr0py
18e519ec86
[Bugfix] Fix ndarray video color from VideoAsset (#21064)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-19 02:17:16 -07:00
Jee Jee Li
1eaff27815
[V0 deprecation] Remove long context LoRA (#21169)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-19 02:15:41 -07:00
Huy Do
cf8cc32674
Fix a couple of Voxtral tests (#21218)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-07-19 09:13:41 +00:00
Chenyaaang
3a2cb2649d
[Misc][Tools][Benchmark] Add readme file for auto_tune script (#20779)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-07-19 09:06:59 +00:00
김종곤
3e04107d97
[Model] EXAONE 4.0 model support (#21060)
Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
Signed-off-by: woongsik <rlawhdrhs27@gmail.com>
2025-07-19 14:25:44 +08:00
Wentao Ye
37bd8d6e4c
[Bug] DeepGemm: Fix TypeError: per_block_cast_to_fp8() missing 1 required positional argument: 'use_ue8m0' for SM100 (#21187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-18 23:25:22 -07:00
Lucas Wilkinson
468e2400fe
[BugFix][CPU] Fix TorchSDPABackendImpl doesn't have use_irope (#21200)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-18 23:18:48 -07:00
Varun Sundar Rabindranath
dcc6cfb991
[Kernel][Performance] Tweak MoE Batched silu_mul_fp8_quant_deep_gemm kernel (#21193)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-18 23:09:51 -07:00
Woosuk Kwon
dd572c0ab3
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-18 21:47:50 -07:00
Varun Sundar Rabindranath
9ffe905a41
[Bugfix][Model] Fix LoRA for Mistral-Small-3.1-24B-Instruct-2503 (#21183)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-07-18 21:15:03 -07:00
Lucia Fang
9a9fda1423
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
2025-07-18 20:48:38 -07:00