vllm/offline_inference at 418d2f8bfb5593bce89641d79849900f7294b859 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 01:47:16 +08:00

[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)

Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

2025-05-14 12:31:46 -07:00

basic

Fix and simplify deprecated=True CLI kwarg (#17781)

2025-05-07 16:51:06 +01:00

disaggregated-prefill-v1

Construct KVTransferConfig properly from Python instead of using JSON blobs without CLI (#17994)

2025-05-12 11:25:33 -07:00

openai

[CI/Build] Auto-fix Markdown files (#12941)

2025-02-08 04:25:15 -08:00

profiling_tpu

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

qwen2_5_omni

[Misc] Rename assets for testing (#17575)

2025-05-02 03:29:25 -07:00

audio_language.py

[Model] Add Granite Speech Support (#16246)

2025-04-28 10:05:00 +00:00

batch_llm_inference.py

[Ray] Improve documentation on batch inference (#16609)

2025-04-16 22:19:26 -07:00

chat_with_tools.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

data_parallel.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

disaggregated_prefill.py

Construct KVTransferConfig properly from Python instead of using JSON blobs without CLI (#17994)

2025-05-12 11:25:33 -07:00

eagle.py

[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)

2025-05-14 12:31:46 -07:00

embed_jina_embeddings_v3.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

embed_matryoshka_fy.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

encoder_decoder_multimodal.py

[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734)

2025-04-17 04:17:39 +00:00

encoder_decoder.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

llm_engine_example.py

[Misc] refactor examples series (#16708)

2025-04-16 10:16:36 +00:00

load_sharded_state.py

[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367)

2025-04-03 07:32:10 +00:00

lora_with_quantization_inference.py

[Misc] Remove qlora_adapter_name_or_path (#17699)

2025-05-06 23:10:37 -07:00

mistral-small.py

[VLM] Clean up models (#16873)

2025-04-19 12:13:06 +00:00

mlpspeculator.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

multilora_inference.py

[Misc] format and refactor some examples (#16252)

2025-04-08 10:42:32 +00:00

neuron_eagle.py

Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)

2025-05-07 00:07:30 -07:00

neuron_int8_quantization.py

[Misc] format and refactor some examples (#16252)

2025-04-08 10:42:32 +00:00

neuron_speculation.py

Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)

2025-05-07 00:07:30 -07:00

neuron.py

[Misc] format and refactor some examples (#16252)

2025-04-08 10:42:32 +00:00

prefix_caching.py

[Misc] format and refactor some examples (#16252)

2025-04-08 10:42:32 +00:00

prithvi_geospatial_mae.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

profiling.py

Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117)

2025-05-13 23:19:14 -07:00

qwen_1m.py

Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)

2025-05-12 19:52:47 -07:00

reproducibility.py

[Doc] Fix a typo in the file name (#17836)

2025-05-08 18:04:18 +08:00

rlhf_colocate.py

[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)

2025-03-07 00:32:46 +08:00

rlhf_utils.py

[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)

2025-03-07 00:32:46 +08:00

rlhf.py

[Misc] format and refactor some examples (#16252)

2025-04-08 10:42:32 +00:00

save_sharded_state.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

simple_profiling.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

structured_outputs.py

[Misc] refactor Structured Outputs example (#16322)

2025-04-09 23:32:42 +00:00

torchrun_example.py

[Misc] format and refactor some examples (#16252)

2025-04-08 10:42:32 +00:00

tpu.py

[TPU] Increase block size and reset block shapes (#16458)

2025-05-06 13:55:04 -04:00

vision_language_embedding.py

[Misc] refactor argument parsing in examples (#16635)

2025-04-15 08:05:30 +00:00

vision_language_multi_image.py

[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)

2025-05-11 17:56:30 -07:00

vision_language.py

[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)

2025-05-11 17:56:30 -07:00