817 Commits

Author SHA1 Message Date
Harry Mellor
8781cd6b88
Add Eagle and Eagle3 support to Transformers modeling backend (#30340)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 17:02:10 +00:00
Martin Hickey
f4417f8449
[KVConnector] Add KV events to KV Connectors (#28309)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2025-12-11 15:30:29 +01:00
Wentao Ye
d6464f2679
[Chore] Fix torch precision warning (#30428)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 04:05:56 +00:00
shivampr
8580919ac3
[Bugfix] fix confusing OOM errors during v1 init (#28051)
Signed-off-by: Shivam <shivamprasad91@gmail.com>
Signed-off-by: shivampr <shivampr.dev@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-12-10 23:17:41 +00:00
Will Eaton
a9e4106f28
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Andreas Karatzas
ed7af3178a
[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group (#29358)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-12-10 05:33:13 +00:00
Micah Williamson
7d80c73d42
[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-10 02:35:49 +00:00
Lucas Wilkinson
abe93bce59
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-12-09 17:18:10 -08:00
Benjamin Chislett
e858bfe051
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-09 13:29:33 -05:00
Wentao Ye
83319b44c2
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-09 10:40:37 -05:00
Lucas Wilkinson
56037dfa2f
[BugFix] Fix assert batch_descriptor.num_tokens == num_tokens_padded (#30173)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-09 10:36:12 -05:00
quanliu
5dcd593baf
[Feature] Batch-Invariant Support for FA2 and LoRA (#30018)
Signed-off-by: quanliu <18646313696@163.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-09 10:01:38 -05:00
Hubert de La Jonquiere
c72ea10723
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056) 2025-12-09 18:54:08 +08:00
Micah Williamson
aeb82b1930
[CI] Fix Flaky test_eagle_max_len Test (#30306)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-09 07:33:34 +00:00
Lucas Wilkinson
aed846917f
[Attention] Make split_decodes_and_prefills(..., require_uniform=True) support padding (#29644)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-12-09 07:24:01 +00:00
Or Ozeri
4c6fd25880
kv_transfer: Rename the shared storage connectors (#30201)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-08 20:46:09 -08:00
Wentao Ye
d9417096d1
[Feature] Batch invariant: Enable TRITON_MLA without prefix-caching (#29125)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-08 19:31:57 -05:00
Victor Ziliang Peng
f1599ca55d
feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189)
Signed-off-by: Ziliang Peng <ziliang@character.ai>
2025-12-09 00:08:48 +00:00
Ming Yang
60d17251c9
[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-09 00:01:08 +00:00
Charlie Fu
6af70e11a0
[ROCm][CI] Fix test_max_len.py for Rocm (#29916)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
2025-12-08 16:58:30 -05:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Cyrus Leung
671427efbf
[Model] Move multimodal_cpu_fields definition to field config (#30181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 13:40:02 +00:00
rasmith
b12f4a9830
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-05 20:57:38 -08:00
Samuel Shen
7e31c3a3f6
[CI]: Remove unnecessary imports from test_lmache_integration (#30157)
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
2025-12-06 12:53:34 +08:00
Divakar Verma
962d703818
[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-05 19:57:26 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Mark McLoughlin
dff0a2b394
[NIXL] Add remote_request_id to kv_transfer_params (#29665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-05 09:43:48 -08:00
Nick Hill
dc264bcea1
[BugFix] Eagerly abort cancelled final-step requests (#29987)
Currently, when requests are cancelled while executing their final
step, "completion" is handled based on normal stop processing (e.g.
length or stop token), so the abort has no effect. This is typically
not a problem, but when a kv connector is involved it thinks the
request completed successfully rather than being aborted.

This is problematic for disaggregated prefill which will free kv
cache blocks if the request was aborted but not if it completed
successfully—since the cancelled request will never be sent to
the decode side, kv cache blocks remain pinned until the fall-back
timeout expires. The problem is exacerbated when many requests
are cancelled and/or there are large prefills whose forward pass
takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts
immediately prior to processing model output each step; we process
only aborts, not new requests, since it's preferable for latency to
process model outputs before new incoming requests.

Fixes #26400.

Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-05 17:28:32 +00:00
Nicolò Lucchesi
78c44fd722
[NIXL] Small cleanup of unused variables (#29618)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-05 18:17:36 +01:00
Mark McLoughlin
949a6a19d2
[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-05 15:52:45 +01:00
Alec S
65ee97288a
[BugFix] Adding env variable to disable async grammar compilation (#29996)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-12-05 00:49:37 -08:00
rasmith
feecba09af
[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-05 08:42:25 +00:00
Lucas Wilkinson
c8ab988b15
[BugFix] Fix DBO assert assert B_block_table == B_q (#29933)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-04 14:48:54 -05:00
Doug Smith
5b4b42c0b6
Mark DBO test as flaky on b200 for Distributed B200 test (#29913)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-12-04 10:38:03 -05:00
rasmith
f2f4cea6cc
[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-04 09:30:22 +00:00
Mark McLoughlin
899e2ef558
[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-04 16:22:03 +08:00
Charlie Fu
9aa33a74b0
[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
2025-12-04 07:52:28 +00:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing (#29970)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
Micah Williamson
d1f7392c5f
[ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 01:17:07 +08:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-12-03 16:06:57 +00:00
rasmith
5aa9b09040
[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-03 22:56:35 +08:00
Micah Williamson
c014de1ec7
[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI (#29808)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-02 22:54:36 +00:00
Chauncey
0a9caca9f5
[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-02 22:42:28 +00:00
Divakar Verma
afb1e5b380
[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-02 20:46:10 +00:00
Harry Mellor
951445a52d
Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 12:16:37 +00:00
杰兮
48d15a32aa
[CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-12-02 00:02:12 -08:00
Cyrus Leung
653591d5e7
[Chore] Move tokenizer initialization methods (#29793)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-02 13:33:37 +08:00
Divakar Verma
e2fbfc955e
[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-02 05:27:46 +00:00
Divakar Verma
a690fb5bd6
[CI][ROCm] Fix test_correctness_sliding_window (#29243)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-02 04:53:27 +00:00