vllm/offline_inference at c42fe0b63a29d3ec157089c9784643000dde4aec - vllm

basic

[Frontend] Add LLM.reward specific to reward models (#21720)

2025-07-29 20:56:03 -07:00

disaggregated-prefill-v1

[Docs] Switch to better markdown linting pre-commit hook (#21851)

2025-07-29 19:45:08 -07:00

openai_batch

[Docs] Switch to better markdown linting pre-commit hook (#21851)

2025-07-29 19:45:08 -07:00

profiling_tpu

[Misc] small update (#20462)

2025-07-03 20:33:44 -07:00

qwen2_5_omni

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

async_llm_streaming.py

[Example] Add async_llm_streaming.py example for AsyncLLM streaming in python (#21763)

2025-07-30 18:39:46 -06:00

audio_language.py

[Model] Gemma3n MM (#20495)

2025-08-09 09:56:25 -07:00

automatic_prefix_caching.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

batch_llm_inference.py

[Docs] Improve docstring for ray data llm example (#20597)

2025-07-07 20:06:26 -07:00

chat_with_tools.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

context_extension.py

[Misc] refactor context extension (#19246)

2025-06-07 05:13:21 +00:00

convert_model_to_seq_cls.py

[Model][Last/4] Automatic conversion of CrossEncoding model (#19675)

2025-07-07 14:46:04 +00:00

data_parallel.py

fix ci issue distributed 4 gpu test (#20204)

2025-06-27 22:50:00 -07:00

disaggregated_prefill.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

embed_jina_embeddings_v3.py

[Deprecation][2/N] Replace --task with --runner and --convert (#21470)

2025-07-27 19:42:40 -07:00

embed_matryoshka_fy.py

[Deprecation][2/N] Replace --task with --runner and --convert (#21470)

2025-07-27 19:42:40 -07:00

encoder_decoder_multimodal.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

encoder_decoder.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

llm_engine_example.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

load_sharded_state.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

lora_with_quantization_inference.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

metrics.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

mistral-small.py

[Frontend] Use engine argument to control MM cache size (#22441)

2025-08-07 09:47:10 -07:00

mlpspeculator.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

multilora_inference.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

neuron_eagle.py

[bugfix] fix syntax warning caused by backslash (#21251)

2025-07-20 17:12:10 +00:00

neuron_int8_quantization.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

neuron_multimodal.py

[Misc] refactor neuron_multimodal and profiling (#19397)

2025-06-10 06:12:42 +00:00

neuron_speculation.py

[Misc] Remove deprecated args in v0.10 (#21349)

2025-07-22 05:26:39 -07:00

neuron.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

prefix_caching.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

prithvi_geospatial_mae.py

Support encoder-only models without KV-Cache (#21270)

2025-07-26 21:09:52 +08:00

profiling.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

prompt_embed_inference.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

qwen3_reranker.py

[Deprecation][2/N] Replace --task with --runner and --convert (#21470)

2025-07-27 19:42:40 -07:00

qwen_1m.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

reproducibility.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

rlhf_colocate.py

[Docs] Improve docs for RLHF co-location example (#20599)

2025-07-09 08:06:43 -07:00

rlhf_utils.py

[RLHF] Fix torch.dtype not serializable in example (#22158)

2025-08-04 02:43:33 +00:00

rlhf.py

[RLHF] Fix torch.dtype not serializable in example (#22158)

2025-08-04 02:43:33 +00:00

save_sharded_state.py

[Bugfix] fix max-file-size type from str to int (#21675)

2025-07-28 00:06:52 -07:00

simple_profiling.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

skip_loading_weights_in_engine_init.py

[Doc] Add inplace weights loading example (#19640)

2025-07-17 21:12:23 -07:00

spec_decode.py

[Meta] Official Eagle mm support, first enablement on llama4 (#20788)

2025-07-31 10:35:07 -07:00

structured_outputs.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

torchrun_example.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

tpu.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

vision_language_multi_image.py

[New Model] Support Command-A-Vision (#22660)

2025-08-12 01:39:54 -07:00

vision_language_pooling.py

[Deprecation][2/N] Replace --task with --runner and --convert (#21470)

2025-07-27 19:42:40 -07:00

vision_language.py

[New Model] Support Command-A-Vision (#22660)

2025-08-12 01:39:54 -07:00