`TP_SIZE=1 DP_SIZE=2 pytest -v -s tests/v1/distributed/test_eagle_dp.py` fails on A100 for me before this PR.

Here's what I think is happening:

- the test checks that the tokens produced by a model with eagle are identical to the tokens produced by a model without eagle
- the model with eagle uses a draft model to produce draft tokens
- the target model takes all of the draft tokens and then does a forward pass to decide how many of them to accept or reject. The target model is using a batch_size > 1.
- the model without eagle just generates the tokens one by one, that is, it has batch_size = 1.
- for these two models to be *consistent*, we need batch invariance.

So I turned on batch invariance (which also required selecting an attention backend).

Signed-off-by: Richard Zou <zou3519@gmail.com>
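
A rough illustration of why batch size can change the output, in plain Python rather than vLLM code. It assumes, as a simplification, that the only difference between the batch_size > 1 path and the batch_size = 1 path is the order in which a kernel adds up the same partial products. The `argmax` helper and the logit values are hypothetical and chosen only to make the effect visible:

```python
# Floating-point addition is not associative, so two kernels that reduce the
# same numbers in a different order can produce slightly different logits.
# Assumption: the two groupings below stand in for a "batched" and a
# "single-request" reduction order; real kernels are far more involved.

def argmax(logits):
    """Return the index of the largest logit; ties go to the lowest index."""
    best = 0
    for i in range(1, len(logits)):
        if logits[i] > logits[best]:
            best = i
    return best

# Same three terms, two different reduction orders.
logit_batched = (0.1 + 0.2) + 0.3   # one grouping of the partial sums
logit_single = 0.1 + (0.2 + 0.3)    # another grouping of the same terms

# A competing token whose logit is very close to the one above.
competitor = 0.6

token_batched = argmax([competitor, logit_batched])
token_single = argmax([competitor, logit_single])

print("logits differ:", logit_batched != logit_single)
print("greedy tokens:", token_batched, token_single)
```

With greedy sampling, one flipped argmax like this is enough to make the two token sequences diverge, which is what the test would then report as a mismatch. Batch-invariant kernels avoid this by keeping the reduction order the same regardless of batch size.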