7081 Commits

Author SHA1 Message Date
Lukas Geiger
d73a9457a5
[Core] Improve Tensor serialisation (#18774)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-28 09:46:21 +08:00
Luka Govedič
a3896c7f02
[Build] Fixes for CMake install (#18570) 2025-05-27 20:49:24 -04:00
cascade
51e98e4ffd
[Bugfix] Disable prefix caching by default for benchmark (#18771)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-28 08:18:09 +08:00
Michael Goin
e56f44d9ec
Support datasets in vllm bench serve and sync with benchmark_[serving,datasets].py (#18566) 2025-05-27 19:59:48 -04:00
Satyajith Chilappagari
e0cbad4e30
[Neuron] Support quantization on neuron (#18283)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-27 22:10:33 +00:00
Carol Zheng
b48d5cca16
[CI/Build] [TPU] Fix TPU CI exit code (#18282)
Signed-off-by: Carol Zheng <cazheng@google.com>
2025-05-27 14:54:59 -07:00
Michael Goin
5873877241
[Bugfix] Mistral tool calling when content is list (#18729)
Signed-off-by: mgoin <mgoin64@gmail.com>
v0.9.0
2025-05-27 09:05:37 -07:00
Cyrus Leung
696259ca01
[Core] Automatically cast multi-modal input dtype (#18756)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 23:45:48 +08:00
chunxiaozheng
6b6d496114
optimize get_kv_cache_torch_dtype (#18531)
Signed-off-by: idellzheng <idellzheng@tencent.com>
2025-05-27 13:08:44 +00:00
cascade
aaa4ac1c95
Disable prefix cache by default for benchmark (#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-27 20:06:34 +08:00
Mark McLoughlin
06a0338015
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-27 09:37:06 +00:00
Cyrus Leung
4318c0559d
[CI/Build] Remove imports of built-in re (#18750)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 09:19:18 +00:00
Hyogeun Oh (오효근)
a68e293cb9
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-27 01:44:20 -07:00
Shawn Huang
6881107948
[BUG FIX] minicpm (#18739)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
2025-05-27 01:04:49 -07:00
Kebe
e0f0ff87b8
[Build] fix cpu build missing libtbbmalloc.so (#18744)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-05-27 01:03:56 -07:00
maobaolong
c24b1572ac
Minor fix about MooncakeStoreConnector (#18721)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
2025-05-27 08:02:28 +00:00
Calvin Chen
4693a3438c
[Doc] cleanup deprecated flag for doc (#18715)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-27 07:12:02 +00:00
Łukasz Durejko
bbd9a84dc5
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (#18752)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
2025-05-27 00:10:26 -07:00
almersawi
a547aeb828
feat(rocm-support): support mamba2 on rocm (#18565)
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
2025-05-27 00:07:53 -07:00
Reid
fc6d0c290f
[Misc] improve docs (#18734)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-27 07:07:01 +00:00
Cyrus Leung
753944fa9b
[Doc] Update reproducibility doc and example (#18741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 07:03:13 +00:00
Cyrus Leung
25a817f202
[Doc] Update OOT model docs (#18742)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 06:30:31 +00:00
vllmellm
d260f799a9
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-26 23:14:07 -07:00
Lukas Geiger
b50602d5f0
[Model][Gemma3] Cast image pixel values already on CPU (#18732)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-27 05:42:54 +00:00
Isotr0py
1f1b1bc03b
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-27 04:40:28 +00:00
Reid
1f88dbd2bb
[Misc] improve web section group title display (#18684)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-27 04:35:16 +00:00
Lukas Geiger
0eebd74842
[Model][Gemma3] Simplify image input validation (#18710)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-27 11:13:37 +08:00
Harry Mellor
27bebcd897
Convert examples to ruff-format (#18400)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-26 16:57:54 +00:00
Lukas Geiger
e7523c2e03
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (#18608) 2025-05-26 11:49:36 -04:00
Cyrus Leung
a869baca73
[Bugfix] Fix Llama GGUF initialization (#18717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:49:22 -07:00
Cyrus Leung
82e2339b06
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00
Cyrus Leung
9553fdb41e
[Doc] Improve API docs (#18713)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:33:34 -07:00
dylan
243eb9199f
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (#18701) 2025-05-26 07:10:56 -07:00
Reid
0665e29998
[Misc] add AutoGen integration (#18712)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-26 13:56:18 +00:00
Łukasz Durejko
e76be06550
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (#18709)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
2025-05-26 05:26:07 -07:00
Isotr0py
0877750029
[CI/Build] Split pooling and generation extended language models tests in CI (#18705)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-26 04:00:08 -07:00
Naveassaf
6d68030f1c
[Model] Add support for YARN in NemotronNAS models (#18427)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-05-26 10:31:49 +00:00
Ning Xie
5a2c76cbe1
[CI] fix dump_input for str type (#18697)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-26 18:23:35 +08:00
Cyrus Leung
38b13dfe78
[CI/Build] Replace math.isclose with pytest.approx (#18703)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 02:05:17 -07:00
Cyrus Leung
61a45e7a72
[Bugfix] Fix Mistral-format models with sliding window (#18693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 01:44:04 -07:00
Cyrus Leung
65523a0995
[Doc] Fix issue template format (#18699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 00:45:39 -07:00
Cyrus Leung
4b7740a105
[GH] Add issue template for reporting CI failures (#18696)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 00:42:04 -07:00
Ning Xie
4ea62c0ea0
[CI] add missing argument (#18694)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-26 00:22:04 -07:00
Maximilien de Bayser
561b77a0d6
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode (#6357)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
2025-05-26 14:52:25 +08:00
CYJiang
abd4030d94
refactor: simplify request handler, use positive condition check for handler assignment (#18690)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-05-26 06:32:28 +00:00
AlexZhao
8820821b59
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (#18644)
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
2025-05-26 13:51:27 +08:00
Cyrus Leung
fba0642704
[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage (#18683)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-05-25 20:27:50 -07:00
Lukas Geiger
6071e989df
[Core][Multimodal] Convert PIL Image to array without data copy when hashing (#18682)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-25 17:33:35 +00:00
Cyrus Leung
57fd13a707
[Bugfix] Fix profiling dummy data for Pixtral (#18677)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-25 14:05:30 +00:00
Reid
3a886bd58c
[Misc] small improve (#18680)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-25 06:05:38 -07:00