11 Commits

Author SHA1 Message Date
Matthew Bonanni
1d93f11675
[Attention][CUDAGraph] Remove CG padding from attention backends (#29352)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-02 13:48:08 -05:00
Harry Mellor
a742134cc5
Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 16:10:28 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Paul Pak
5f42fc53b6
[backends][short_conv] CUDA graph piecewise edits (#24215)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-10-03 12:59:48 +00:00
Benjamin Chislett
c30b405b8f
[Spec Decode] Enable FlashInfer Spec Decoding (#25196)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
2025-09-23 22:29:58 -04:00
Thomas Parnell
a903669e10
[V1] Remove V0 code paths for Hybrid models (#25400)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-23 08:26:13 -07:00
Vadim Gimpelson
072d7e53e5
[PERF] Add conv1d metadata to GDN attn (#25105)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-09-18 14:27:49 +00:00
Russell Bryant
37e8182bfe
[v1] Add Whisper model support (encoder-decoder) (#21088)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
2025-09-10 13:53:35 -07:00
Lucas Wilkinson
402759d472
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-04 02:47:59 -07:00
Paul Pak
2e2000f352
[Model] Add LFM2 architecture (#22845)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-08-21 09:35:07 +02:00