[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP (#17916)

Signed-off-by: Jin Huang <jinhun@amazon.com>
Co-authored-by: Jin Huang <jinhun@amazon.com>
Jin Huang 2025-05-13 03:10:07 -04:00 committed by GitHub
parent f0d610a8ae
commit 8dd0671bac


@@ -1107,7 +1107,7 @@ class GPUModelRunner(LoRAModelRunnerMixin):
         else:
             mm_embeds = []
-        if self.is_multimodal_model:
+        if self.is_multimodal_model and get_pp_group().is_first_rank:
             # NOTE(woosuk): To unify token ids and soft tokens (vision
             # embeddings), we always use embeddings (rather than token ids)
             # as input to the multimodal model, even when the input is text.
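The fix gates multimodal embedding extraction on the first pipeline-parallel rank: only the first stage consumes raw multimodal inputs, while later stages receive hidden states from the previous stage and must not attempt to embed vision inputs again. A minimal sketch of that idea, using a hypothetical `PPGroup` stand-in for vLLM's `get_pp_group()` helper and a placeholder `gather_mm_embeds` function (names are illustrative, not vLLM's actual API surface):

```python
from dataclasses import dataclass


@dataclass
class PPGroup:
    """Stand-in for a pipeline-parallel process group."""
    rank: int
    world_size: int

    @property
    def is_first_rank(self) -> bool:
        # Only rank 0 of the pipeline owns the embedding layer
        # and the multimodal encoder in a PP setup.
        return self.rank == 0


def gather_mm_embeds(is_multimodal_model: bool, pp_group: PPGroup, mm_inputs):
    """Return multimodal embeddings only on the first PP rank.

    Later pipeline stages take hidden states from the previous stage
    as input, so computing (or expecting) input embeddings there is
    both wasted work and, pre-fix, a source of errors.
    """
    if is_multimodal_model and pp_group.is_first_rank:
        # Placeholder for running the vision encoder on the inputs.
        return [f"embed({x})" for x in mm_inputs]
    return []
```

With this guard, a two-stage pipeline computes embeddings on rank 0 and skips them on rank 1, matching the behavior the one-line change in the diff restores.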