fix: preserve original tokenizer class name for HuggingFace compatibility

Address review feedback: set CachedTokenizer.__name__ to the original
tokenizer's class name instead of the 'Cached'-prefixed name.

This lets transformers processors that validate the tokenizer type by
checking __name__ accept the cached tokenizer; with the prefixed name,
that check failed.

Fixes #31080

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: yurekami <yurekami@users.noreply.github.com>
yurekami 2025-12-25 03:24:05 +09:00
parent 09dc7c690c
commit 4845b90d9b

@@ -58,7 +58,9 @@ def get_cached_tokenizer(tokenizer: HfTokenizer) -> HfTokenizer:
         def __reduce__(self):
             return get_cached_tokenizer, (tokenizer,)

-    CachedTokenizer.__name__ = f"Cached{tokenizer.__class__.__name__}"
+    # Preserve original class name for HuggingFace compatibility.
+    # Some processors validate tokenizer type by checking __name__.
+    CachedTokenizer.__name__ = tokenizer.__class__.__name__

     cached_tokenizer.__class__ = CachedTokenizer
     return cached_tokenizer
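
The pattern in this diff can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical BaseTokenizer stand-in rather than a real HuggingFace tokenizer, and name_based_check is only an illustration of the kind of __name__ comparison the commit message describes, not the actual transformers validation code.

```python
import copy


class BaseTokenizer:
    """Hypothetical stand-in for a HuggingFace tokenizer class."""

    def __init__(self, vocab):
        self.vocab = dict(vocab)

    def get_vocab(self):
        return dict(self.vocab)


def get_cached_tokenizer_sketch(tokenizer):
    """Return a shallow copy whose class caches get_vocab() results.

    Simplified illustration only; the real function does more.
    """
    cached_tokenizer = copy.copy(tokenizer)
    cached_vocab = tokenizer.get_vocab()
    original_cls = tokenizer.__class__

    class CachedTokenizer(original_cls):
        def get_vocab(self):
            # Serve the precomputed vocabulary instead of rebuilding it.
            return cached_vocab

    # Preserve the original class name so name-based checks still pass.
    # Note: __qualname__ is left untouched here and still refers to
    # the local CachedTokenizer class.
    CachedTokenizer.__name__ = original_cls.__name__

    cached_tokenizer.__class__ = CachedTokenizer
    return cached_tokenizer


def name_based_check(tokenizer, expected_name):
    """Hypothetical validation that compares the class __name__ only."""
    return type(tokenizer).__name__ == expected_name


tok = get_cached_tokenizer_sketch(BaseTokenizer({"hello": 0}))
print(name_based_check(tok, "BaseTokenizer"))
print(isinstance(tok, BaseTokenizer))
```

In this sketch the wrapped object is still an instance of a distinct subclass, so an identity check such as `type(tok) is BaseTokenizer` would not pass; only the reported name matches the original.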