vllm/offline_inference at 05a83dc6ee84be55fef73d5fa6a77fb56d2dd80f - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-10 09:54:31 +08:00

[New Model] BAGEL support (AR only) (#28439)

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

2025-12-15 14:58:23 +08:00

basic

[ROCM][CI] Fix AMD Examples Test Group (#30276)

2025-12-11 18:03:54 -05:00

disaggregated-prefill-v1

kv_transfer: Rename the shared storage connectors (#30201)

2025-12-08 20:46:09 -08:00

kv_load_failure_recovery

kv_transfer: Rename the shared storage connectors (#30201)

2025-12-08 20:46:09 -08:00

logits_processor

[Bugfix] Validate custom logits processor xargs for online serving (#27560)

2025-11-05 16:53:33 +00:00

openai_batch

[Doc] ruff format remaining Python examples (#26795)

2025-10-15 01:25:49 -07:00

qwen2_5_omni

[Deprecation] Remove deprecated task, seed and MM settings (#30397)

2025-12-10 19:59:39 -08:00

qwen3_omni

[Deprecation] Remove deprecated task, seed and MM settings (#30397)

2025-12-10 19:59:39 -08:00

async_llm_streaming.py

…

audio_language.py

Add AudioFlamingo3 model support (#30539)

2025-12-14 02:14:55 -08:00

automatic_prefix_caching.py

…

batch_llm_inference.py

…

chat_with_tools.py

[Doc]: fix typos in Python comments (#24417)

2025-09-08 00:22:16 -07:00

context_extension.py

Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)

2025-11-19 09:06:36 -08:00

data_parallel.py

[CI/Build] Use spawn subprocess for ROCm (#30272)

2025-12-12 03:33:17 +00:00

disaggregated_prefill.py

Remove deprecated PyNcclConnector (#24151)

2025-09-03 22:49:16 +00:00

encoder_decoder_multimodal.py

[Deprecation] Remove deprecated task, seed and MM settings (#30397)

2025-12-10 19:59:39 -08:00

llm_engine_example.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

llm_engine_reset_kv.py

[Core] Support reseting all running requests' KV while calling reset_prefix_cache (#28827)

2025-12-02 02:25:05 +00:00

load_sharded_state.py

[Misc] fix typo and add detailed log (#28178)

2025-11-09 05:33:46 +00:00

lora_with_quantization_inference.py

fix LoRA-related examples (#29956)

2025-12-04 11:48:30 +08:00

metrics.py

…

mistral-small.py

…

mlpspeculator.py

[[V0 deprecation]]Remove VLLM_USE_V1 env (#28204)

2025-11-11 18:22:16 -07:00

multilora_inference.py

fix LoRA-related examples (#29956)

2025-12-04 11:48:30 +08:00

prefix_caching.py

…

prompt_embed_inference.py

…

qwen_1m.py

Remove V0 attention backends (#25351)

2025-09-21 16:03:28 -07:00

reproducibility.py

[Doc] Update more docs with respect to V1 (#29188)

2025-11-23 10:58:48 +08:00

rlhf_colocate.py

[RL] fast weight update with zmq + ipc handles (#24295)

2025-09-09 16:57:46 +08:00

rlhf_online_quant.py

Move online quantization to model.load_weights (#26327)

2025-11-18 16:52:41 -08:00

rlhf_utils.py

[V0 deprecation] Remove more V0 references (#29088)

2025-11-21 11:56:59 +00:00

rlhf.py

Move online quantization to model.load_weights (#26327)

2025-11-18 16:52:41 -08:00

save_sharded_state.py

[V0 deprecation] Remove more V0 references (#29088)

2025-11-21 11:56:59 +00:00

simple_profiling.py

[Cleanup] Refactor profiling env vars into a CLI config (#29912)

2025-12-09 13:29:33 -05:00

skip_loading_weights_in_engine_init.py

…

spec_decode.py

Attempt to fix GPU OOM in a spec-decoding test (#29419)

2025-11-25 14:23:36 -05:00

structured_outputs.py

[Chore] Cleanup guided namespace, move to structured outputs config (#22772)

2025-09-18 09:20:27 +00:00

torchrun_dp_example.py

[CI/Build] Test torchrun with 8 cards (#27548)

2025-10-29 10:26:06 -07:00

torchrun_example.py

…

vision_language_multi_image.py

[Deprecation] Remove deprecated task, seed and MM settings (#30397)

2025-12-10 19:59:39 -08:00

vision_language.py

[New Model] BAGEL support (AR only) (#28439)

2025-12-15 14:58:23 +08:00