xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-03-20 18:23:36 +08:00
vllm / examples / offline_inference
Latest commit: c32a18cbe7 by Eldar Kurtić: Attempt to fix GPU OOM in a spec-decoding test (#29419), 2025-11-25 14:23:36 -05:00
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
basic    …
disaggregated-prefill-v1    …
kv_load_failure_recovery    …
logits_processor    …
openai_batch    …
pooling    …
qwen2_5_omni    …
qwen3_omni    [Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721)    2025-11-24 19:24:37 +00:00
async_llm_streaming.py    …
audio_language.py    Add TP CLI argument to multimodal inference examples (#29301)    2025-11-25 06:03:20 +00:00
automatic_prefix_caching.py    …
batch_llm_inference.py    …
chat_with_tools.py    …
context_extension.py    …
data_parallel.py    …
disaggregated_prefill.py    …
encoder_decoder_multimodal.py    …
llm_engine_example.py    …
load_sharded_state.py    …
lora_with_quantization_inference.py    …
metrics.py    …
mistral-small.py    …
mlpspeculator.py    …
multilora_inference.py    …
prefix_caching.py    …
prompt_embed_inference.py    …
qwen_1m.py    …
reproducibility.py    …
rlhf_colocate.py    …
rlhf_online_quant.py    …
rlhf_utils.py    …
rlhf.py    …
save_sharded_state.py    …
simple_profiling.py    …
skip_loading_weights_in_engine_init.py    …
spec_decode.py    Attempt to fix GPU OOM in a spec-decoding test (#29419)    2025-11-25 14:23:36 -05:00
structured_outputs.py    …
torchrun_dp_example.py    …
torchrun_example.py    …
vision_language_multi_image.py    Add TP CLI argument to multimodal inference examples (#29301)    2025-11-25 06:03:20 +00:00
vision_language_pooling.py    …
vision_language.py    Add TP CLI argument to multimodal inference examples (#29301)    2025-11-25 06:03:20 +00:00