xinyun/vllm
mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-27 11:41:50 +08:00
vllm/tests/spec_decode/e2e
Latest commit: b67ae00cdb by shangmingc, [Misc] Add quantization config support for speculative model. (#7343), 2024-08-15 19:34:28 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | … | |
| conftest.py | [Misc] Deprecation Warning when setting --engine-use-ray (#7424) | 2024-08-14 09:44:27 -07:00 |
| test_compatibility.py | … | |
| test_integration_dist_tp2.py | [Model] RowParallelLinear: pass bias to quant_method.apply (#6327) | 2024-07-19 07:15:22 -06:00 |
| test_integration_dist_tp4.py | [BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369) | 2024-07-19 06:01:09 -07:00 |
| test_integration.py | [Misc] Add quantization config support for speculative model. (#7343) | 2024-08-15 19:34:28 -07:00 |
| test_logprobs.py | [Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485) | 2024-07-20 23:58:58 -07:00 |
| test_medusa_correctness.py | [Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978) | 2024-07-09 18:34:02 -07:00 |
| test_mlp_correctness.py | [Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary (#7218) | 2024-08-08 22:08:46 -07:00 |
| test_multistep_correctness.py | [Misc] Log spec decode metrics (#6454) | 2024-07-16 20:37:10 +00:00 |
| test_ngram_correctness.py | … | |
| test_seed.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
Powered by Gitea Version: 1.23.1