[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566)

Signed-off-by: Isotr0py <2037008807@qq.com>
Author: Isotr0py, 2024-12-28 03:45:13 +08:00 (committed by GitHub)
parent 0240402c46
commit dde1fa18c9


@@ -1001,8 +1001,11 @@ class BitsAndBytesModelLoader(BaseModelLoader):
                     for sub_name in sub_modules:
                         self.target_modules.append(
                             name.replace(last_name, sub_name))
-                else:
-                    self.target_modules.append(name)
+                # Add original module name even if the module has stacked map,
+                # in case model has a mixture of disk-merged and disk-splitted
+                # weights with same last name.
+                self.target_modules.append(name)
         assert (self.target_modules
                 ), "vllm currently does not support BNB quantization for"
         f" {type(model).__name__}"