Xiangyu Li
|
5cc6bddb6e
|
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)
|
2025-10-23 23:26:13 -04:00 |
|
Harry Mellor
|
1f9460c4c1
|
Fix pooling adapters for Transformers backend (#27338)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-23 20:23:55 -07:00 |
|
xiao-llm
|
70022ffc00
|
Granite 4.0 quark quantization support (#26944)
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com>
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>
|
2025-10-24 02:14:03 +00:00 |
|
Akash kaothalkar
|
f417746ad7
|
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc (#27422)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
|
2025-10-23 21:21:36 +00:00 |
|
Yu Jiaqi
|
0552cfb195
|
[Model] Siglip Embedding Support (#27324)
Signed-off-by: piood <2477084691@qq.com>
|
2025-10-23 20:19:48 +00:00 |
|
Kebe
|
51dd14ac2b
|
[Bugfix][DP] Fix creating too many DP Placement Groups (#26880)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-10-23 20:16:51 +00:00 |
|
Matthew Bonanni
|
dbfbf9f324
|
[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 (#27368)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-23 15:58:15 -04:00 |
|
Jonathan Chen
|
ca76486a16
|
[Chore] Separate out vllm.utils.platform_utils.py (#27374)
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
|
2025-10-23 19:08:06 +00:00 |
|
Varun Sundar Rabindranath
|
a9f55dc588
|
[Misc] Add triton_kernels dependency (#27370)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-23 12:04:14 -07:00 |
|
Isotr0py
|
81d5bb765a
|
[Bugfix] Fix AWQ marlin layer skipping (#27416)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-23 18:30:28 +00:00 |
|
Gregory Shtrasberg
|
0825197bee
|
[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek (#27373)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-10-23 17:43:53 +00:00 |
|
Alexander Matveev
|
9ef3d5b875
|
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer (#27220)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-10-24 00:03:14 +08:00 |
|
Alexei-V-Ivanov-AMD
|
295c7f0267
|
Mirroring the test definitions (2025-10-22) (#27362)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-10-24 00:02:26 +08:00 |
|
wang.yuqi
|
3fa2c12185
|
[Frontend][4/N] Improve all pooling task | Add plugin pooling task (#26973)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-10-23 14:46:18 +00:00 |
|
Cyrus Leung
|
fe2016de2d
|
[CI/Build] Remove unnecessary flags from test registry (#27353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-23 14:42:40 +00:00 |
|
Ilya Markov
|
237cf6d32a
|
[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) (#26709)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-10-23 20:58:39 +08:00 |
|
Navya Srivastava
|
faee3ccdc2
|
[Feature] Pydantic validation for speculative.py (#27156)
Signed-off-by: Navya Srivastava <navya.srivastava1707@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-23 12:19:33 +00:00 |
|
Bradley D
|
570c3e1cd4
|
[Bugfix] Honor --mm_encoder_attn_backend when used (#27124)
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-23 20:09:52 +08:00 |
|
Harry Mellor
|
3a4255c7c4
|
Run mypy on the lowest supported Python version instead of system Python (#27048)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-23 05:07:44 -07:00 |
|
tomeras91
|
61089465a6
|
[Model] Add MoE support for NemotronH (#25863)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-10-23 10:27:23 +00:00 |
|
Tova Movshovitz
|
88afa11010
|
[Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245)
Signed-off-by: tovam <tovam@pliops.com>
|
2025-10-23 12:21:08 +02:00 |
|
Chauncey
|
d00ce29d89
|
[CI] Reorganize entrypoints tests (#27403)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-23 10:10:06 +00:00 |
|
Louie Tsai
|
3b7bdf983b
|
add SLA information into comparison graph for vLLM Benchmark Suite (#25525)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: louie-tsai <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-23 08:04:59 +00:00 |
|
Zhewen Li
|
50b788a17a
|
[CI/Build] Fix AMD CI: test_cpu_gpu.py (#27388)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-10-23 07:55:00 +00:00 |
|
Lucia Fang
|
fc059c7061
|
[Bugfix] Fix args settings for guided decoding args (#27375)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-10-23 07:34:06 +00:00 |
|
Cyrus Leung
|
bfb240cc49
|
[CI/Build] Fix Prithvi plugin test (#27393)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-23 07:30:44 +00:00 |
|
Jonathan Chen
|
e255d92990
|
[Chore] Remove duplicate has_ functions in vllm.utils (#27372)
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
|
2025-10-23 06:11:59 +00:00 |
|
wang.yuqi
|
3729ed00ba
|
[Model] Add num_cached_tokens for PoolingRequestOutput (#27378)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-23 14:03:42 +08:00 |
|
Giancarlo Delfin
|
6644796bf4
|
[V1][spec decode] return logprobs for spec decoding (#26060)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-10-22 22:59:59 -07:00 |
|
Andrew Sansom
|
ff93cc8c84
|
[CORE] Support Prefix Caching with Prompt Embeds (#27219)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-10-22 22:18:07 -07:00 |
|
PiteXChen
|
243ed7d32e
|
[Bugfix][Core] running queue index leakage exception (#26754)
Signed-off-by: CLFutureX <chenyongqyl@163.com>
|
2025-10-22 21:40:12 -07:00 |
|
fangpings
|
7e0941055f
|
[Bugfix] Fix incorrect kv cache metrics in grafana.json (#27133)
Signed-off-by: Fangping Shi <fangping_shi@apple.com>
Co-authored-by: Fangping Shi <fangping_shi@apple.com>
|
2025-10-22 20:58:36 -07:00 |
|
Cyrus Leung
|
6738e4a093
|
[Bugfix] Fix SLA tuner initialization (#27355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-22 20:43:04 -07:00 |
|
Isotr0py
|
2566dca2a9
|
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support (#27361)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-22 17:15:38 -07:00 |
|
Matthew Bonanni
|
b4fda58a2d
|
[MLA] Bump FlashMLA (#27354)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-22 15:48:37 -07:00 |
|
dongbo910220
|
a0003b56b0
|
[Chore] Separate out system utilities from vllm.utils (#27201)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 20:25:25 +00:00 |
|
Daisy-Ma-coder
|
5beacce2ea
|
[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (#27128)
Signed-off-by: qqma <qqma@amazon.com>
Co-authored-by: qqma <qqma@amazon.com>
|
2025-10-22 19:36:39 +00:00 |
|
rongfu.leng
|
8669c69afa
|
[Feature] publisher default set zmq in kv_event config (#26915)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 19:19:33 +00:00 |
|
Sage
|
1651003c35
|
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2025-10-22 18:13:03 +00:00 |
|
William Song
|
1cb8c6c5fe
|
[Doc] Fix numbering sequence in prefix caching (#27357)
Signed-off-by: William Song <jinwook@umich.edu>
|
2025-10-22 17:35:47 +00:00 |
|
Luciano Martins
|
e05a6754a8
|
[Model] Revert PR #26715: Restore custom PaliGemma and Gemma3-MM impl… (#27309)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
|
2025-10-22 10:05:34 -07:00 |
|
Isotr0py
|
084a9dae80
|
[Bugfix] Disable FlexAttention direct block mask building for encoder-only models (#27344)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-22 16:39:08 +00:00 |
|
RED
|
c9461e05a4
|
Support Anthropic API /v1/messages Endpoint (#22627)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-22 09:13:18 -07:00 |
|
Nicolò Lucchesi
|
4dfdb821c8
|
[P/D] Dynamic kv_output_aggregator collect size (#26734)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-22 18:07:58 +02:00 |
|
Russell Bryant
|
58fab50d82
|
[Frontend] Require flag for loading text and image embeds (#27204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-22 15:52:02 +00:00 |
|
Isotr0py
|
db6f28d898
|
[Bugfix] Fix HF format InternVL large variants video processing (#27330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-22 08:39:23 -07:00 |
|
Cyrus Leung
|
14e2f1231e
|
[Bugfix] Make get_mrope_input_positions instance methods (#27342)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-22 08:38:34 -07:00 |
|
Chendi.Xue
|
7c4767f1eb
|
[NIXL] use Host buffer to support TP_ratio > 1 for XPU (#27140)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-10-22 15:28:13 +00:00 |
|
Jee Jee Li
|
9771e0b432
|
[Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA (#27351)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-22 08:19:12 -07:00 |
|
Reinforce-II
|
980de31ca0
|
[bugfix] remove unused parameters to reduce unnecessary vram usage (#26789)
Signed-off-by: Reinforce-II <fate@eastal.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-22 08:16:09 -07:00 |
|