5 Commits

Author SHA1 Message Date
Jonas M. Kübler
f98d4d38c0 [Bug] fix import and unit test (#25558)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Benjamin Chislett
177c37e960 [Spec Decode] Enable FlashInfer Spec Decoding (#25196)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Lucas Wilkinson
8c5c35c027 [Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Sage Moore
567939953b
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Sage Moore
0edaf752d7
[Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-08-01 19:47:53 -07:00