9794 Commits

Author SHA1 Message Date
Harry Mellor
cf278ff3b2
Update CODEOWNERS (#25269)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 09:12:55 -07:00
Woosuk Kwon
a8e7071924 minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-19 08:33:47 -07:00
Icey
838d7116ba
[Qwen] Remove cuda hard-code in qwen3 next (#25243)
Signed-off-by: Icey <1790571317@qq.com>
2025-09-19 12:25:12 +00:00
Cyrus Leung
5089fd749c
[V0 Deprecation] Remove V0 logic from get_input_embeddings interface (#25242)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-19 11:10:52 +00:00
Nicolò Lucchesi
a3d087adec
[P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy (#22188)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-19 11:09:14 +00:00
Harry Mellor
058525b997
Move PoolerConfig from config/__init__.py to config/pooler.py (#25181)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 11:02:55 +00:00
Roger Wang
1dfea5f4a9
[Bugfix][Perf] Misc fixes for Qwen3 VL (#25238)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-19 10:46:16 +00:00
Isotr0py
cea91a32f2
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (#25055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 10:27:49 +00:00
Woosuk Kwon
4be2c66e37 fix
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 09:35:38 +00:00
Yan Ma
a684c0124c
[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 08:45:06 +00:00
Isotr0py
f2718d2948
[Misc] Cleanup test conftest for deprecated encoder-decoder models (#25231)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 07:44:56 +00:00
Li, Jiang
825fdb11ad
[Bugfix][CPU] Add placeholder to avoid import errors when using fused_moe ops on platforms without triton (#25137)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-19 07:41:12 +00:00
Li, Jiang
8c1d4acbfe
[CPU] Disable oneDNN linear on non-x86 platforms (#25166)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-19 07:27:22 +00:00
Woosuk Kwon
d30c0d50a6 refactor
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:17:53 +00:00
Woosuk Kwon
9c75d896a8 minor
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:11:37 +00:00
Woosuk Kwon
37478c18cf async output
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:10:42 +00:00
Woosuk Kwon
33672774f5 Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 06:52:46 +00:00
Woosuk Kwon
0d3de9e082 fix
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:50:56 +00:00
Woosuk Kwon
b405d78c07 DP sampler
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:46:46 +00:00
Russell Bryant
486c5599e3
[Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-19 14:27:17 +08:00
Chendi.Xue
a6149aa587
[OOT] Support sync_model_loading for OOT (#25126)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
2025-09-19 05:41:53 +00:00
Michael Yao
6c8a3c099b
[Docs] Fix griffe warnings in vllm/multimodal (#25216)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-18 22:10:44 -07:00
Roger Wang
31a8a2a7bc
[Misc] Clean up MM profiling warnings (#25222)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-19 04:46:57 +00:00
Chen Ding
1a0a04dae9
[Perf] Optimize memory peak during EAGLE model loading. (#24585)
Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com>
2025-09-19 03:31:16 +00:00
Andrew Xia
6d8246aaff
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming (#24938)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-18 19:11:59 -07:00
Woosuk Kwon
8af87986aa fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:37:30 -07:00
Woosuk Kwon
af65838d1f dummy run
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:29:18 -07:00
Woosuk Kwon
52ca2f517a sample
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:39:43 -07:00
Woosuk Kwon
8deedfa42b -inf
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:24:00 -07:00
Woosuk Kwon
b9c74487d2 logprobs
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:23:02 -07:00
Or Ozeri
9d1c50a5ac
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 00:20:51 +00:00
Andrew Sansom
9a4600e4dc
[CORE] Prompt Embeddings Support for v1 Engine (#24278)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-19 08:03:09 +08:00
Woosuk Kwon
31619ff412 fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:38:56 -07:00
Woosuk Kwon
d2be62378b fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:33:18 -07:00
Woosuk Kwon
86dade710d fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:32:00 -07:00
Woosuk Kwon
efda08481b minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:31:01 -07:00
Woosuk Kwon
82da219ff9 Implement topk_logprobs
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:29:38 -07:00
Woosuk Kwon
323a05b3c5 update
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:51:36 -07:00
Woosuk Kwon
a98eff0762 minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:21:30 -07:00
Woosuk Kwon
67d8c0c21b fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:15:31 -07:00
Woosuk Kwon
2bb2cb13f4 revert
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:54:19 -07:00
Woosuk Kwon
e171e5bb67 merge
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:53:32 -07:00
Woosuk Kwon
8407fa02ed fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:52:23 -07:00
Woosuk Kwon
82e591f7eb remove
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:35:25 -07:00
Woosuk Kwon
330058f9b8 fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:30:29 -07:00
Lucas Wilkinson
9fac6aa30b
[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-18 14:26:28 -07:00
Woosuk Kwon
aabfaa08cf fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:14:03 -07:00
Or Ozeri
a53ad626d6
[KV offload][1b/N] rename offloading to kv_offload (#25191)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-18 20:53:52 +00:00
Woosuk Kwon
bc6463ac97 hash
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 13:49:52 -07:00
Woosuk Kwon
1c3dad22ff
[V0 Deprecation] Remove unused async_timeout.py (#25190)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 20:35:21 +00:00