Isotr0py
045b396d09
[Bugfix][CI/Build] Fix failing Mteb CI ( #26638 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-12 02:42:42 -07:00
wang.yuqi
76852017ea
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank ( #25867 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-12 09:29:08 +00:00
Vadim Gimpelson
82e64c7a20
[PERF] [Qwen3-next] Speed up gated RMSNorm ( #26207 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-12 08:27:50 +00:00
wang.yuqi
4ca204055e
Add @noooop to codeowner for pooling models ( #26652 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-12 14:04:44 +08:00
Haisheng Chen
c5c8f5ea59
[EPLB] Support ernie4.5-moe ( #22100 )
...
Signed-off-by: Haisheng Chen <langzs335@outlook.com>
Signed-off-by: Haisheng Chen <60504847+HsChen-sys@users.noreply.github.com>
Signed-off-by: Haisheng Chen <hac048@ucsd.edu>
Co-authored-by: Haisheng Chen <langzs335@outlook.com>
2025-10-12 10:40:47 +08:00
Angela Yi
01653a917b
[compile] Fix inductor partition config ( #26645 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 21:03:14 +00:00
Huamin Li
0cd103e7cb
CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding ( #26509 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-11 20:50:57 +00:00
Cyrus Leung
5be7ca1b99
[Benchmark] Support Infinity API ( #26641 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-12 01:45:32 +08:00
Jee Jee Li
f0a30a067b
[Bugfix] Fix qwen-moe packed_modules_mapping ( #26634 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-11 15:21:33 +00:00
JJJYmmm
9d6cff3ede
[Bugfix][Qwen3VL] fix deepstack in qwen3vl ( #26626 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
2025-10-11 05:58:33 -07:00
Angela Yi
a25f2adee9
[compile] Add patched_fused_scaled_matmul_reduce_scatter ( #26604 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 05:44:43 -07:00
Chauncey
d0bed837ac
[Refactor]Reduce duplicate code in serving_chat ( #26627 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 12:04:49 +00:00
muzian666
f7ee69868a
[CPU] fix the issue when the node is '-' cause json decode error. ( #26562 )
...
Signed-off-by: muzian666 <andylee_2001@163.com>
Co-authored-by: qingan.li <qingan.li@wizpresso.com>
2025-10-11 12:04:04 +00:00
Rahul Tuli
d2a71530c1
Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE ( #26485 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-11 10:14:41 +00:00
ihb2032
086609de64
fix(nix): Allow local oneDNN path to fix vLLM CPU build failure ( #26401 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-10-11 09:12:16 +00:00
dsinghvi
727144bed1
[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py ( #24172 )
...
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>
2025-10-11 07:21:04 +00:00
sangho.lee
55392bc879
[Bugfix][Multi Modal] Fix incorrect Molmo image processing ( #26563 )
...
Signed-off-by: sanghol <sanghol@allenai.org>
2025-10-10 22:28:23 -07:00
Roger Wang
ddaff2938e
[MM] Move Qwen3Omni MRoPE impl to model file ( #26608 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 22:17:24 -07:00
liuzhenwei
27ed39a347
[XPU] Upgrade NIXL to remove CUDA dependency ( #26570 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2025-10-11 05:15:23 +00:00
Nishidha Panpaliya
8f8474fbe3
[CI/Build] Fix ppc64le CPU build and tests ( #22443 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-10-11 13:04:42 +08:00
Chauncey
be067861c6
[Frontend] Improve the performance of is_reasoning_end ( #25735 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 10:43:39 +08:00
Nick Hill
5bc26c438d
[BugFix] Make penalties and bad_words work with async scheduling ( #26467 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 23:27:04 +00:00
Zhengxu Chen
eef921f45e
AOT Compilation for torch.compile (Bundled) ( #24274 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-10-10 19:02:11 -04:00
Bram Wasti
e317414ce1
Cache the environment variable check for batch invariance ( #26510 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-10-10 22:47:34 +00:00
Nick Hill
949cb0170d
[BugFix] Fix async scheduling + request preemption ( #26385 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 20:29:57 +00:00
Vadim Gimpelson
e94cfd51da
[BUG] Qwen3-next MTP. Fix attn metadata build bug ( #26564 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-10-10 14:59:03 -04:00
Harry Mellor
7c12763b24
Fix some typing issues found by mypy==1.18.2 ( #26596 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 18:21:25 +00:00
Will Eaton
3b780a4bbb
Update CUDA architecture list in build pipeline for 12.9.1 wheels ( #26592 )
...
Signed-off-by: Will Eaton <wseaton@users.noreply.github.com>
2025-10-10 11:15:27 -07:00
Harry Mellor
30f78af147
Update pre-commit hook versions ( #26591 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 17:03:44 +00:00
Xiong Wang
19a9b169bf
Add Qwen3-Omni moe thinker ( #25550 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-10 17:00:56 +00:00
Roberto L. Castro
96ad65b7fe
[Transform] [Quantization] Add QuTLASS support to vLLM ( #24440 )
...
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-10 09:43:40 -07:00
Shane A
8d2b8c0ff2
[Model] Add FlexOlmo model implementation ( #24923 )
...
Signed-off-by: Shane A <shanea@allenai.org>
2025-10-10 09:43:15 -07:00
Lukas Geiger
b2155ed317
[Model][Qwen3VL] Compute cu_seqlens on CPU to remove ( #26496 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 09:42:17 -07:00
Chauncey
910abdbd08
[Bugfix] fixed top_logprobs: -1 does not appear to work as intended ( #26470 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 00:41:17 +08:00
baonudesifeizhai
cddce79fda
[torch.compile] Make inductor partition rules respect splitting_ops #25691 ( #25845 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-10 16:35:28 +00:00
Mark McLoughlin
e519281920
[Metrics] Add test for multi-modal cache stats logging ( #26588 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 16:00:50 +00:00
Elvir Crnčević
7b03584de8
Silu v2 ( #25074 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: elvircrn <elvircrn@gmail.com>
Signed-off-by: Elvir Crnčević <elvircrn@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
2025-10-10 15:19:53 +00:00
Sage Moore
ae9d0e7da5
[Bugfix] Make DP padding optional in coordinate_batch_across_dp ( #26375 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-10-10 10:53:33 -04:00
Daniel Cámpora
0e67102d93
Added test_top_k_per_row to test-pipeline.yaml. ( #26569 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-10 10:48:33 -04:00
Jason Li
f4ba2061cf
[BugFix][torch.compile] Fix fused_scaled_matmul_reduce_scatter signature for PyTorch 2.8 ( #26038 )
...
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: <>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-10 07:42:13 -07:00
Chauncey
1e6848a65d
[CI] fix test_run_batch.py::test_completions - AssertionError ( #26578 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 22:16:28 +08:00
Andy Lo
67661375fa
[BugFix] Fix noop elimination edge case ( #26394 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-10-10 13:33:04 +00:00
Lucas Kabela
213b64452a
[Bugfix] Convert untraceable GroupShape to list for AMD impl ( #26535 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-10-10 13:32:29 +00:00
Mark McLoughlin
784c231151
[NIXL] Ignore abort on already-finished request ( #25067 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 12:21:56 +02:00
Chen Zhang
606b00e80f
[bugfix][DCP] fix block_size of hash in DCP prefix caching ( #26296 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-10 03:02:49 -07:00
Chauncey
720d3cd0f0
[CI] fix ruff format ( #26579 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 03:02:12 -07:00
Ashwin Phadke
ab196edefb
Remove LoRA bias support ( #25807 )
...
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-10 09:50:33 +00:00
Luis Tomas Bolivar
3ee202ea1e
[GPT-OSS] Add support for arrays at tool message content ( #25593 )
...
Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com>
2025-10-10 09:00:45 +00:00
Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset ( #26285 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-10 01:45:55 -07:00
Chen Zhang
6f0f570c43
[deepseek] kernel block size for UniformTypeKVCacheSpecs ( #26559 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-10 16:40:41 +08:00