Woosuk Kwon
9c75d896a8
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:11:37 +00:00
Woosuk Kwon
37478c18cf
async output
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:10:42 +00:00
Woosuk Kwon
33672774f5
Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 06:52:46 +00:00
Woosuk Kwon
0d3de9e082
fix
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:50:56 +00:00
Woosuk Kwon
b405d78c07
DP sampler
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:46:46 +00:00
Russell Bryant
486c5599e3
[Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-19 14:27:17 +08:00
Chendi.Xue
a6149aa587
[OOT] Support sync_model_loading for OOT (#25126)
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
2025-09-19 05:41:53 +00:00
Michael Yao
6c8a3c099b
[Docs] Fix griffe warnings in vllm/multimodal (#25216)
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-18 22:10:44 -07:00
Roger Wang
31a8a2a7bc
[Misc] Clean up MM profiling warnings (#25222)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-19 04:46:57 +00:00
Chen Ding
1a0a04dae9
[Perf] Optimize memory peak during EAGLE model loading. (#24585)
...
Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com>
2025-09-19 03:31:16 +00:00
Andrew Xia
6d8246aaff
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming (#24938)
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-18 19:11:59 -07:00
Woosuk Kwon
8af87986aa
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:37:30 -07:00
Woosuk Kwon
af65838d1f
dummy run
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:29:18 -07:00
Woosuk Kwon
52ca2f517a
sample
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:39:43 -07:00
Woosuk Kwon
8deedfa42b
-inf
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:24:00 -07:00
Woosuk Kwon
b9c74487d2
logprobs
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:23:02 -07:00
Or Ozeri
9d1c50a5ac
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 00:20:51 +00:00
Andrew Sansom
9a4600e4dc
[CORE] Prompt Embeddings Support for v1 Engine (#24278)
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-19 08:03:09 +08:00
Woosuk Kwon
31619ff412
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:38:56 -07:00
Woosuk Kwon
d2be62378b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:33:18 -07:00
Woosuk Kwon
86dade710d
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:32:00 -07:00
Woosuk Kwon
efda08481b
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:31:01 -07:00
Woosuk Kwon
82da219ff9
Implement topk_logprobs
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:29:38 -07:00
Woosuk Kwon
323a05b3c5
update
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:51:36 -07:00
Woosuk Kwon
a98eff0762
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:21:30 -07:00
Woosuk Kwon
67d8c0c21b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 15:15:31 -07:00
Woosuk Kwon
2bb2cb13f4
revert
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:54:19 -07:00
Woosuk Kwon
e171e5bb67
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:53:32 -07:00
Woosuk Kwon
8407fa02ed
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:52:23 -07:00
Woosuk Kwon
82e591f7eb
remove
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:35:25 -07:00
Woosuk Kwon
330058f9b8
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:30:29 -07:00
Lucas Wilkinson
9fac6aa30b
[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-18 14:26:28 -07:00
Woosuk Kwon
aabfaa08cf
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 14:14:03 -07:00
Or Ozeri
a53ad626d6
[KV offload][1b/N] rename offloading to kv_offload (#25191)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-18 20:53:52 +00:00
Woosuk Kwon
bc6463ac97
hash
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 13:49:52 -07:00
Woosuk Kwon
1c3dad22ff
[V0 Deprecation] Remove unused async_timeout.py (#25190)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 20:35:21 +00:00
Woosuk Kwon
a4962833f9
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 13:20:37 -07:00
Woosuk Kwon
3f50030cc8
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 13:11:46 -07:00
Woosuk Kwon
cbdb47dc01
working
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 13:10:35 -07:00
Woosuk Kwon
92f337faeb
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 12:44:21 -07:00
Wentao Ye
d2a30a2d93
[Bug] Fix torch Compilation Cache Hit Error (#25093)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-18 12:38:37 -07:00
Woosuk Kwon
9050087250
update
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 12:37:29 -07:00
Wentao Ye
75fb112d80
[Bug] Fix returned_lse not Defined issue (#25106)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-18 19:32:24 +00:00
Aziz
38db529f66
[feat]: Create interface for model-specific M-RoPE (#24194)
...
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Aziz <azizbenothman76@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-18 19:18:56 +00:00
Woosuk Kwon
c1d83f2bae
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 12:13:56 -07:00
Nikhil Gupta
064cac7bb7
[fix]: remove data type hardcoding from gptoss model implementation (#23807)
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2025-09-18 18:15:23 +00:00
Woosuk Kwon
e19bce40a1
[V0 Deprecation] Remove AsyncLLMEngine (#25025)
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 11:07:42 -07:00
Or Ozeri
505805b645
[KV offload][1/N] Introduce an offloading component (#19848)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-18 10:57:07 -07:00
Rohan Potdar
bbdc0f2366
[ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilation (#25104)
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2025-09-18 17:46:47 +00:00
Gregory Shtrasberg
dc34059360
[ROCm][CI/Build] Use ROCm7.0 as the base (#25178)
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-18 09:36:55 -07:00