Harry Mellor
cf278ff3b2
Update CODEOWNERS (#25269)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 09:12:55 -07:00
Woosuk Kwon
a8e7071924
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-19 08:33:47 -07:00
Icey
838d7116ba
[Qwen] Remove cuda hard-code in qwen3 next (#25243)
...
Signed-off-by: Icey <1790571317@qq.com>
2025-09-19 12:25:12 +00:00
Cyrus Leung
5089fd749c
[V0 Deprecation] Remove V0 logic from get_input_embeddings interface (#25242)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-19 11:10:52 +00:00
Nicolò Lucchesi
a3d087adec
[P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy (#22188)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-19 11:09:14 +00:00
Harry Mellor
058525b997
Move PoolerConfig from config/__init__.py to config/pooler.py (#25181)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 11:02:55 +00:00
Roger Wang
1dfea5f4a9
[Bugfix][Perf] Misc fixes for Qwen3 VL (#25238)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-19 10:46:16 +00:00
Isotr0py
cea91a32f2
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (#25055)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 10:27:49 +00:00
Woosuk Kwon
4be2c66e37
fix
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 09:35:38 +00:00
Yan Ma
a684c0124c
[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146)
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 08:45:06 +00:00
Isotr0py
f2718d2948
[Misc] Cleanup test conftest for deprecated encoder-decoder models (#25231)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 07:44:56 +00:00
Li, Jiang
825fdb11ad
[Bugfix][CPU] Add placeholder to avoid import errors when using fused_moe ops on platforms without triton (#25137)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-19 07:41:12 +00:00
Li, Jiang
8c1d4acbfe
[CPU] Disable oneDNN linear on non-x86 platforms (#25166)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-19 07:27:22 +00:00
Woosuk Kwon
d30c0d50a6
refactor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:17:53 +00:00
Woosuk Kwon
9c75d896a8
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:11:37 +00:00
Woosuk Kwon
37478c18cf
async output
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:10:42 +00:00
Woosuk Kwon
33672774f5
Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 06:52:46 +00:00
Woosuk Kwon
0d3de9e082
fix
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:50:56 +00:00
Woosuk Kwon
b405d78c07
DP sampler
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:46:46 +00:00
Russell Bryant
486c5599e3
[Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-19 14:27:17 +08:00
Chendi.Xue
a6149aa587
[OOT] Support sync_model_loading for OOT (#25126)
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
2025-09-19 05:41:53 +00:00
Michael Yao
6c8a3c099b
[Docs] Fix griffe warnings in vllm/multimodal (#25216)
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-18 22:10:44 -07:00
Roger Wang
31a8a2a7bc
[Misc] Clean up MM profiling warnings (#25222)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-19 04:46:57 +00:00
Chen Ding
1a0a04dae9
[Perf] Optimize memory peak during EAGLE model loading. (#24585)
...
Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com>
2025-09-19 03:31:16 +00:00
Andrew Xia
6d8246aaff
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming (#24938)
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-18 19:11:59 -07:00
Woosuk Kwon
8af87986aa
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:37:30 -07:00
Woosuk Kwon
af65838d1f
dummy run
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:29:18 -07:00
Woosuk Kwon
52ca2f517a
sample
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:39:43 -07:00
Woosuk Kwon
8deedfa42b
-inf
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:24:00 -07:00
Woosuk Kwon
b9c74487d2
logprobs
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:23:02 -07:00
Or Ozeri
9d1c50a5ac
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 00:20:51 +00:00
Andrew Sansom
9a4600e4dc
[CORE] Prompt Embeddings Support for v1 Engine (#24278)
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-19 08:03:09 +08:00
Woosuk Kwon
31619ff412
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:38:56 -07:00
Woosuk Kwon
d2be62378b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:33:18 -07:00
Woosuk Kwon
86dade710d
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:32:00 -07:00
Woosuk Kwon
efda08481b
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:31:01 -07:00
Woosuk Kwon
82da219ff9
Implement topk_logprobs
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:29:38 -07:00
Woosuk Kwon
323a05b3c5
update
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:51:36 -07:00
Woosuk Kwon
a98eff0762
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:21:30 -07:00
Woosuk Kwon
67d8c0c21b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:15:31 -07:00
Woosuk Kwon
2bb2cb13f4
revert
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:54:19 -07:00
Woosuk Kwon
e171e5bb67
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:53:32 -07:00
Woosuk Kwon
8407fa02ed
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:52:23 -07:00
Woosuk Kwon
82e591f7eb
remove
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:35:25 -07:00
Woosuk Kwon
330058f9b8
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:30:29 -07:00
Lucas Wilkinson
9fac6aa30b
[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-18 14:26:28 -07:00
Woosuk Kwon
aabfaa08cf
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:14:03 -07:00
Or Ozeri
a53ad626d6
[KV offload][1b/N] rename offloading to kv_offload (#25191)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-18 20:53:52 +00:00
Woosuk Kwon
bc6463ac97
hash
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 13:49:52 -07:00
Woosuk Kwon
1c3dad22ff
[V0 Deprecation] Remove unused async_timeout.py (#25190)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 20:35:21 +00:00