`TP_SIZE=1 DP_SIZE=2 pytest -v -s tests/v1/distributed/test_eagle_dp.py` fails
on A100 for me before this PR.
Here's what I think is happening:
- the test checks that the tokens produced by a model with eagle are
identical to those produced by a model without eagle
- the model with eagle uses a draft model to produce draft tokens
- the target model takes all of the draft tokens and then does a forward
pass over them to decide how many to accept or reject, so the target
model runs with batch_size > 1.
- the model without eagle just generates the tokens one-by-one, that is,
it has batch_size = 1.
- for these two models to be *consistent*, we need batch invariance, so
I turned on batch invariance (which also required selecting an
attention backend)
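
A minimal sketch of why batch size could change the generated tokens,
assuming the mismatch comes from floating-point reduction order (this is
an assumption about the mechanism, not something measured in this PR).
It is plain Python, not vLLM code; `greedy_token` and the logit values
are hypothetical and only illustrate a near-tie flipping:

```python
# Floating-point addition is not associative, so two kernels that reduce
# the same values in a different order can differ in the last bits.
a, b, c = 0.1, 0.2, 0.3

order_a = (a + b) + c  # stands in for the reduction order at one batch size
order_b = a + (b + c)  # stands in for the reduction order at another batch size

orders_differ = order_a != order_b


def greedy_token(logit0: float, logit1: float) -> int:
    """Hypothetical greedy sampler over two candidate tokens."""
    return 0 if logit0 > logit1 else 1


# Hypothetical near-tie: token 1 has a fixed logit of 0.6, and token 0's
# logit is the reduced sum, so the last-bit difference decides the winner.
token_order_a = greedy_token(order_a, 0.6)
token_order_b = greedy_token(order_b, 0.6)

print(orders_differ, token_order_a, token_order_b)
```

Once a single token differs, every later token is conditioned on a
different prefix, so the two outputs can diverge entirely. Batch
invariance removes this source of difference by making the result
independent of batch size.
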
Signed-off-by: Richard Zou <zou3519@gmail.com>