Lukas Geiger
|
d73a9457a5
|
[Core] Improve Tensor serialisation (#18774)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-28 09:46:21 +08:00 |
|
Luka Govedič
|
a3896c7f02
|
[Build] Fixes for CMake install (#18570)
|
2025-05-27 20:49:24 -04:00 |
|
cascade
|
51e98e4ffd
|
[Bugfix] Disable prefix caching by default for benchmark (#18771)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-28 08:18:09 +08:00 |
|
Michael Goin
|
e56f44d9ec
|
Support datasets in vllm bench serve and sync with benchmark_[serving,datasets].py (#18566)
|
2025-05-27 19:59:48 -04:00 |
|
Satyajith Chilappagari
|
e0cbad4e30
|
[Neuron] Support quantization on neuron (#18283)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
|
2025-05-27 22:10:33 +00:00 |
|
Carol Zheng
|
b48d5cca16
|
[CI/Build] [TPU] Fix TPU CI exit code (#18282)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-27 14:54:59 -07:00 |
|
Michael Goin
|
5873877241
|
[Bugfix] Mistral tool calling when content is list (#18729)
Signed-off-by: mgoin <mgoin64@gmail.com>
Tag: v0.9.0
|
2025-05-27 09:05:37 -07:00 |
|
Cyrus Leung
|
696259ca01
|
[Core] Automatically cast multi-modal input dtype (#18756)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 23:45:48 +08:00 |
|
chunxiaozheng
|
6b6d496114
|
optimize get_kv_cache_torch_dtype (#18531)
Signed-off-by: idellzheng <idellzheng@tencent.com>
|
2025-05-27 13:08:44 +00:00 |
|
cascade
|
aaa4ac1c95
|
Disable prefix cache by default for benchmark (#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-27 20:06:34 +08:00 |
|
Mark McLoughlin
|
06a0338015
|
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-27 09:37:06 +00:00 |
|
Cyrus Leung
|
4318c0559d
|
[CI/Build] Remove imports of built-in re (#18750)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 09:19:18 +00:00 |
|
Hyogeun Oh (오효근)
|
a68e293cb9
|
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-27 01:44:20 -07:00 |
|
Shawn Huang
|
6881107948
|
[BUG FIX] minicpm (#18739)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
|
2025-05-27 01:04:49 -07:00 |
|
Kebe
|
e0f0ff87b8
|
[Build] fix cpu build missing libtbbmalloc.so (#18744)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-05-27 01:03:56 -07:00 |
|
maobaolong
|
c24b1572ac
|
Minor fix about MooncakeStoreConnector (#18721)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
|
2025-05-27 08:02:28 +00:00 |
|
Calvin Chen
|
4693a3438c
|
[Doc] cleanup deprecated flag for doc (#18715)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-05-27 07:12:02 +00:00 |
|
Łukasz Durejko
|
bbd9a84dc5
|
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (#18752)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
|
2025-05-27 00:10:26 -07:00 |
|
almersawi
|
a547aeb828
|
feat(rocm-support): support mamba2 on rocm (#18565)
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
|
2025-05-27 00:07:53 -07:00 |
|
Reid
|
fc6d0c290f
|
[Misc] improve docs (#18734)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-27 07:07:01 +00:00 |
|
Cyrus Leung
|
753944fa9b
|
[Doc] Update reproducibility doc and example (#18741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 07:03:13 +00:00 |
|
Cyrus Leung
|
25a817f202
|
[Doc] Update OOT model docs (#18742)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-27 06:30:31 +00:00 |
|
vllmellm
|
d260f799a9
|
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-26 23:14:07 -07:00 |
|
Lukas Geiger
|
b50602d5f0
|
[Model][Gemma3] Cast image pixel values already on CPU (#18732)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-27 05:42:54 +00:00 |
|
Isotr0py
|
1f1b1bc03b
|
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-27 04:40:28 +00:00 |
|
Reid
|
1f88dbd2bb
|
[Misc] improve web section group title display (#18684)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-27 04:35:16 +00:00 |
|
Lukas Geiger
|
0eebd74842
|
[Model][Gemma3] Simplify image input validation (#18710)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-27 11:13:37 +08:00 |
|
Harry Mellor
|
27bebcd897
|
Convert examples to ruff-format (#18400)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-26 16:57:54 +00:00 |
|
Lukas Geiger
|
e7523c2e03
|
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (#18608)
|
2025-05-26 11:49:36 -04:00 |
|
Cyrus Leung
|
a869baca73
|
[Bugfix] Fix Llama GGUF initialization (#18717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:49:22 -07:00 |
|
Cyrus Leung
|
82e2339b06
|
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:38:04 -07:00 |
|
Cyrus Leung
|
9553fdb41e
|
[Doc] Improve API docs (#18713)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:33:34 -07:00 |
|
dylan
|
243eb9199f
|
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (#18701)
|
2025-05-26 07:10:56 -07:00 |
|
Reid
|
0665e29998
|
[Misc] add AutoGen integration (#18712)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-26 13:56:18 +00:00 |
|
Łukasz Durejko
|
e76be06550
|
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (#18709)
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
|
2025-05-26 05:26:07 -07:00 |
|
Isotr0py
|
0877750029
|
[CI/Build] Split pooling and generation extended language models tests in CI (#18705)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-26 04:00:08 -07:00 |
|
Naveassaf
|
6d68030f1c
|
[Model] Add support for YARN in NemotronNAS models (#18427)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
|
2025-05-26 10:31:49 +00:00 |
|
Ning Xie
|
5a2c76cbe1
|
[CI] fix dump_input for str type (#18697)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-26 18:23:35 +08:00 |
|
Cyrus Leung
|
38b13dfe78
|
[CI/Build] Replace math.isclose with pytest.approx (#18703)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 02:05:17 -07:00 |
|
Cyrus Leung
|
61a45e7a72
|
[Bugfix] Fix Mistral-format models with sliding window (#18693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 01:44:04 -07:00 |
|
Cyrus Leung
|
65523a0995
|
[Doc] Fix issue template format (#18699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 00:45:39 -07:00 |
|
Cyrus Leung
|
4b7740a105
|
[GH] Add issue template for reporting CI failures (#18696)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 00:42:04 -07:00 |
|
Ning Xie
|
4ea62c0ea0
|
[CI] add missing argument (#18694)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-26 00:22:04 -07:00 |
|
Maximilien de Bayser
|
561b77a0d6
|
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode (#6357)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
|
2025-05-26 14:52:25 +08:00 |
|
CYJiang
|
abd4030d94
|
refactor: simplify request handler, use positive condition check for handler assignment (#18690)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-05-26 06:32:28 +00:00 |
|
AlexZhao
|
8820821b59
|
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (#18644)
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
|
2025-05-26 13:51:27 +08:00 |
|
Cyrus Leung
|
fba0642704
|
[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage (#18683)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-05-25 20:27:50 -07:00 |
|
Lukas Geiger
|
6071e989df
|
[Core][Multimodal] Convert PIL Image to array without data copy when hashing (#18682)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-25 17:33:35 +00:00 |
|
Cyrus Leung
|
57fd13a707
|
[Bugfix] Fix profiling dummy data for Pixtral (#18677)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-25 14:05:30 +00:00 |
|
Reid
|
3a886bd58c
|
[Misc] small improve (#18680)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-25 06:05:38 -07:00 |
|