xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-03-20 18:23:36 +08:00
vllm / examples / offline_inference
Latest commit: c32a18cbe7 by Eldar Kurtić: Attempt to fix GPU OOM in a spec-decoding test (#29419), 2025-11-25 14:23:36 -05:00
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
basic    …
disaggregated-prefill-v1    …
kv_load_failure_recovery    …
logits_processor    …
openai_batch    …
pooling    …
qwen2_5_omni    …
qwen3_omni    [Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721)    2025-11-24 19:24:37 +00:00
async_llm_streaming.py    …
audio_language.py    Add TP CLI argument to multimodal inference examples (#29301)    2025-11-25 06:03:20 +00:00
automatic_prefix_caching.py    …
batch_llm_inference.py    …
chat_with_tools.py    …
context_extension.py    …
data_parallel.py    …
disaggregated_prefill.py    …
encoder_decoder_multimodal.py    …
llm_engine_example.py    …
load_sharded_state.py    …
lora_with_quantization_inference.py    …
metrics.py    …
mistral-small.py    …
mlpspeculator.py    …
multilora_inference.py    …
prefix_caching.py    …
prompt_embed_inference.py    …
qwen_1m.py    …
reproducibility.py    …
rlhf_colocate.py    …
rlhf_online_quant.py    …
rlhf_utils.py    …
rlhf.py    …
save_sharded_state.py    …
simple_profiling.py    …
skip_loading_weights_in_engine_init.py    …
spec_decode.py    Attempt to fix GPU OOM in a spec-decoding test (#29419)    2025-11-25 14:23:36 -05:00
structured_outputs.py    …
torchrun_dp_example.py    …
torchrun_example.py    …
vision_language_multi_image.py    Add TP CLI argument to multimodal inference examples (#29301)    2025-11-25 06:03:20 +00:00
vision_language_pooling.py    …
vision_language.py    Add TP CLI argument to multimodal inference examples (#29301)    2025-11-25 06:03:20 +00:00