9434 Commits

Author SHA1 Message Date
Harry Mellor
abc7989adc
[Docs] Remove Neuron install doc as backend no longer exists (#24396)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-13 00:15:03 -07:00
Hyogeun Oh (오효근)
9a8966bcc2
[Docs] Fix warnings in mkdocs build (continued) (#24791)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-13 00:13:44 -07:00
Woosuk Kwon
5febdc8750
[Chore] Remove unused batched RoPE op & kernel (#24789)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-13 00:08:20 -07:00
Jee Jee Li
99bfef841f
[Bugfix] Fix GPUModelRunner has no attribute lora_manager (#24762)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 23:55:14 -07:00
Shane A
89e08d6d18
[Model] Add Olmo3 model implementation (#24534)
Signed-off-by: Shane A <shanea@allenai.org>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-13 03:26:21 +00:00
Chenheli Hua
7f2ea7074e
[Frontend][Multimodal] Allow skipping media data when UUIDs are provided. (#23950)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-13 02:16:06 +00:00
Nick Hill
4fdd6f5cbf
[Core] Support async scheduling with uniproc executor (#24219)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Co-authored-by: Ronald1995 <ronaldautomobile@163.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-12 16:34:28 -07:00
Tao He
8226dd56bf
[Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) (#24667)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-09-12 22:31:32 +00:00
Matthew Bonanni
5fe643fc26
Add FLASHINFER_MLA to backend selector test (#24753)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 22:30:07 +00:00
Matthew Bonanni
7ba32aa60b
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 15:45:53 -06:00
Alexandre Marques
c89ed8de43
Invert pattern order to make sure that out_proj layers are identified (#24781)
Signed-off-by: Alexandre Marques <almarque@redhat.com>
2025-09-12 14:45:29 -07:00
Wentao Ye
3beadc2f25
[Compilation Bug] Fix Inductor Graph Output with Shape Issue (#24772)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-12 21:23:05 +00:00
Clayton Coleman
bc636f21a6
[Benchmark] Allow arbitrary headers to be passed to benchmarked endpoints (#23937)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
2025-09-12 13:57:53 -07:00
Zhewen Li
017354c0ef
[CI] Trigger BC Linter when labels are added/removed (#24767) 2025-09-12 11:44:36 -07:00
Cyrus Leung
010acc6e1e
[Bugfix] Fix incompatibility between #20452 and #24548 (#24754)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-12 11:17:29 -07:00
afeldman-nm
c8c42597ab
[CI] Speed up model unit tests in CI (#24253)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-09-12 10:36:50 -07:00
Michael Goin
9d2a44606d
[UX] Remove AsyncLLM torch profiler disabled log (#24609)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-12 10:08:44 -07:00
Samit
f17c075884
[Model] Switch to Fused RMSNorm in GLM-4.1V model (#24733)
Signed-off-by: SamitHuang <285365963@qq.com>
2025-09-12 09:12:23 -07:00
Lukas Geiger
b0d1213ac3
[Models] Prevent CUDA sync in Qwen2.5-VL (#24741)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-12 16:03:55 +00:00
Lukas Geiger
57f94e88ea
[Models] Optimise and simplify _validate_and_reshape_mm_tensor (#24742)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-12 15:37:37 +00:00
Kebe
684b6870e1
[Bugfix][Frontend] Fix --enable-log-outputs does not match the documentation (#24626)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-09-12 08:01:24 -07:00
dongluw
a5b84f1cbf
[Core] Shared memory based object store for Multimodal data caching and IPC (#20452)
Signed-off-by: donglu <donglu@cohere.com>
2025-09-12 07:54:17 -07:00
Elvir Crnčević
9f04d9d55f
[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739)
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-12 07:54:04 -07:00
Yan Ma
4d7c1d531b
[Bugfix] Fix MRoPE dispatch on XPU (#24724)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-09-12 21:43:56 +08:00
Hyogeun Oh (오효근)
41f17bf290
[Docs] Fix warnings in mkdocs build (continued) (#24740)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-12 06:43:15 -07:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files (#24726)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
Flora Feng
0377802c20
[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-12 21:42:23 +08:00
Wenlong Wang
72fc8aa412
[Multi Modal] Add FA3 in VIT (#24347)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-12 21:27:24 +08:00
youkaichao
fdb09c77d6
[sleep mode] save memory for on-the-fly quantization (#24731)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-12 11:25:19 +00:00
Ignacio Sica
7a1c4025f1
[Kernel] [CPU] refactor cpu_attn.py:_run_sdpa_forward for better memory access (#24701)
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
2025-09-12 19:23:07 +08:00
Jee Jee Li
60a0951924
[Bugfix] Fix BNB name match (#24735)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 11:12:01 +00:00
Chen Zhang
64d90c3e4f
[Misc][gpt-oss] Add gpt-oss label to PRs that mention harmony or related to builtin tool call (#24717)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-12 18:57:07 +08:00
Li, Jiang
59d5d2c736
[CI/Build] Skip prompt embeddings tests on V1-only CPU backend (#24721)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-12 18:51:01 +08:00
wang.yuqi
d21a36f5f9
[CI] Add ci_envs for convenient local testing (#24630)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-12 08:52:25 +00:00
Chen Zhang
561a0baee0
[CI] Fix flaky test v1/worker/test_gpu_model_runner.py::test_kv_cache_stride_order (#24640)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-12 07:49:09 +00:00
Nick Hill
f592b3174b
[BugFix] Fix Qwen3-Next PP (#24709)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-11 23:35:04 -07:00
Li, Jiang
7920de0a2a
[Bugfix] Fix MRoPE dispatch on CPU (#24712)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-12 04:56:31 +00:00
Andrew Sansom
ddcec289c7
Fix implementation divergence for BLOOM models between vLLM and HuggingFace when using prompt embeds (#24686)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-12 04:35:48 +00:00
Maximilien de Bayser
e090b7b45b
Enable conversion of multimodal models to pooling tasks (#24451)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-09-12 03:30:41 +00:00
Gregory Shtrasberg
6a50eaa0d3
[DOCs] Update ROCm installation docs section (#24691)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-11 20:02:53 -07:00
Jee Jee Li
12a8414d81
[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 10:06:26 +08:00
Tao He
880c741bb6
[Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases (#24680)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
v0.10.2rc2
2025-09-11 18:16:43 -07:00
RichardoMu
40b6c9122b
[V1] feat:add engine v1 tracing (#20372)
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-09-11 17:10:39 -07:00
Lucas Wilkinson
2e6bc46821
[Startup] Make DeepGEMM warmup scale with max-num-batched-tokens (#24693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-11 20:10:19 -04:00
Wentao Ye
fcba05c435
[Bug] Fix Layer weight_block_size Assertion Issue (#24674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 19:47:59 -04:00
Zazzle516
7a30fa8708
[Doc] Clarify cudagraph capture size logic and default behavior in scheduler (#18698)
Signed-off-by: Zazzle516 <2405677060@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 23:18:09 +00:00
Chen Zhang
f82f7a8990
[Qwen3-Next] MOE configs for H100 TP4 (#24699)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-11 15:45:52 -07:00
Michael Goin
c3aea10dc8
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
Matthew Bonanni
d4fd2768ef
[Bugfix][Attention] Fix FlashInfer MLA block size logic (#24692)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-11 22:39:42 +00:00
Vadim Gimpelson
7a70a71892
[Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-09-11 15:34:58 -07:00