[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111)

Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Huamin Li, 2025-11-20 23:53:30 -08:00 (committed by GitHub)
parent 7d6da483b0
commit 8ac3a41487


@@ -166,10 +166,12 @@ class Gemma3Attention(nn.Module):
         else:
             # Transformers v4 rope config.
             # Global attention. Use the values in config.json.
-            rope_parameters = config.rope_parameters.copy()
+            rope_parameters = config.rope_parameters
             # Local attention. Override the values in config.json.
             if self.is_sliding:
-                rope_parameters["rope_theta"] = config.rope_local_base_freq
+                rope_parameters = dict(
+                    rope_type="default", rope_theta=config.rope_local_base_freq
+                )
         self.rotary_emb = get_rope(
             self.head_dim,
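
For context, here is a minimal standalone sketch of why the old code was wrong for sliding-attention layers: copying the global rope_parameters dict and overriding only rope_theta kept whatever rope_type/scaling the global config carried, whereas the fix builds a fresh "default" RoPE config. The helper build_rope_parameters and the concrete config values (rope_type, scaling factor, theta) below are illustrative assumptions, not vLLM's API; only rope_local_base_freq and rope_type="default" come from the diff above.

    # Sketch only (not vLLM code). Assumed global config values for illustration.
    config = {
        # Global attention: scaled long-context RoPE (values assumed).
        "rope_parameters": {
            "rope_type": "linear",
            "factor": 8.0,
            "rope_theta": 1_000_000,
        },
        # Local (sliding) attention: plain RoPE with a smaller base frequency.
        "rope_local_base_freq": 10_000,
    }

    def build_rope_parameters(config: dict, is_sliding: bool) -> dict:
        """Hypothetical helper mirroring the fixed logic in Gemma3Attention."""
        if not is_sliding:
            # Global attention: use the values from config.json unchanged.
            return config["rope_parameters"]
        # Sliding attention: start from a clean "default" RoPE config so the
        # global rope_type / scaling factors are not inherited by accident.
        return dict(rope_type="default", rope_theta=config["rope_local_base_freq"])

    # Old (buggy) behavior: the copy kept the global rope_type and scaling,
    # so sliding layers silently ran with scaled RoPE.
    buggy = config["rope_parameters"].copy()
    buggy["rope_theta"] = config["rope_local_base_freq"]
    assert buggy["rope_type"] == "linear"  # scaling wrongly carried over

    # Fixed behavior: sliding layers get default RoPE at the local base freq.
    fixed = build_rope_parameters(config, is_sliding=True)
    assert fixed == {"rope_type": "default", "rope_theta": 10_000}

A side effect of the fix is that the global branch no longer needs .copy(): the sliding branch rebinds rope_parameters to a new dict instead of mutating the one read from the config.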