8 Commits

Author SHA1 Message Date
Yong Hoon Shin
cb293f6a79
[V1] Enable prefill optimization for Gemma3n (#22628)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-28 14:54:30 -07:00
Harry Mellor
c49848396d
Refactor sliding window configuration to Transformers best practice (#21927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 20:50:48 -07:00
Nicolò Lucchesi
5a16fa614c
[Model] Gemma3n MM (#20495)
Signed-off-by: ShriKode <shrikode@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: ShriKode <shrikode@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-09 09:56:25 -07:00
Kyle Sayers
0f46a780d4
[Model] [Quantization] Support quantization for Gemma3n (#21974)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-31 22:45:15 -07:00
Yong Hoon Shin
ad510309ee
Override attention metadata for fast prefill in some KV sharing setups (#21590)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-30 08:54:15 -07:00
Areeb Syed
fdde18229e
[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization (#21808)
Signed-off-by: sydarb <areebsyed237@gmail.com>
2025-07-30 11:35:21 +08:00
Yong Hoon Shin
9266d98048
[BugFix] Fix interleaved sliding window not set for Gemma3n (#21863)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-29 16:34:19 -07:00
Robert Shaw
d1c956dc0f
Gemma3n (Text-only) (#20134)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-06-27 07:16:26 +00:00