Woosuk Kwon
010e39ec7d
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 19:07:46 +00:00
Woosuk Kwon
396bbe67d3
Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 18:53:18 +00:00
Lucia Fang
b716ab93a7
[bugfix] fix structured outputs key missing issue from #24929 ( #25195 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-09-19 18:37:57 +00:00
samzong
138f0d1e75
[Docs] add __init__.py to vllm/model_executor/layers/quantization/compressed_tensors/transform ( #24974 )
...
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-09-19 18:32:27 +00:00
Jialin Ouyang
2506ce5189
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintainance ( #24990 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-09-19 12:22:53 -06:00
Chauncey
47fd08aaf9
[CI/Build] fix test function_calling ( #25072 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-19 12:16:32 -06:00
Harry Mellor
12aed7e453
Encoder model support for the Transformers backend ( #25174 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 19:15:22 +01:00
LJH-LBJ
d90e212a3a
Remove Redundant Assignment in Qwen3_VisionPatchMerger ( #25224 )
...
Signed-off-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-19 12:15:13 -06:00
Jee Jee Li
2821986450
[Core] Modify the initialization parameters of the lora manager ( #25249 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-19 18:01:28 +00:00
Cyrus Leung
6c117cff7d
[Frontend] Pass API server count to each process ( #23717 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 01:15:19 +08:00
Or Ozeri
7ac67ea525
[KV offload][3/N] Add worker-side CPU support ( #21448 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 09:53:45 -07:00
Woosuk Kwon
c7f3e84b34
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-19 09:49:40 -07:00
samzong
ce75e15373
refactor(benchmarks): add type annotations to wait_for_endpoint parameters ( #25218 )
...
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-09-19 16:36:52 +00:00
Harry Mellor
aed16879a9
Move ModelConfig from config/__init__.py to config/model.py ( #25252 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 16:22:33 +00:00
Harry Mellor
cf278ff3b2
Update CODEOWNERS ( #25269 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 09:12:55 -07:00
Woosuk Kwon
a8e7071924
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-19 08:33:47 -07:00
Icey
838d7116ba
[Qwen] Remove cuda hard-code in qwen3 next ( #25243 )
...
Signed-off-by: Icey <1790571317@qq.com>
2025-09-19 12:25:12 +00:00
Cyrus Leung
5089fd749c
[V0 Deprecation] Remove V0 logic from get_input_embeddings interface ( #25242 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-19 11:10:52 +00:00
Nicolò Lucchesi
a3d087adec
[P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy ( #22188 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-19 11:09:14 +00:00
Harry Mellor
058525b997
Move PoolerConfig from config/__init__.py to config/pooler.py ( #25181 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 11:02:55 +00:00
Roger Wang
1dfea5f4a9
[Bugfix][Perf] Misc fixes for Qwen3 VL ( #25238 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-19 10:46:16 +00:00
Isotr0py
cea91a32f2
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE ( #25055 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 10:27:49 +00:00
Woosuk Kwon
4be2c66e37
fix
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 09:35:38 +00:00
Yan Ma
a684c0124c
[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B ( #25146 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 08:45:06 +00:00
Isotr0py
f2718d2948
[Misc] Cleanup test conftest for deprecated encoder-decoder models ( #25231 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 07:44:56 +00:00
Li, Jiang
825fdb11ad
[Bugfix][CPU] Add placeholder to avoid import errors when using fused_moe ops on platforms without triton ( #25137 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-19 07:41:12 +00:00
Li, Jiang
8c1d4acbfe
[CPU] Disable oneDNN linear on non-x86 platforms ( #25166 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-19 07:27:22 +00:00
Woosuk Kwon
d30c0d50a6
refactor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:17:53 +00:00
Woosuk Kwon
9c75d896a8
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:11:37 +00:00
Woosuk Kwon
37478c18cf
async output
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 07:10:42 +00:00
Woosuk Kwon
33672774f5
Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 06:52:46 +00:00
Woosuk Kwon
0d3de9e082
fix
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:50:56 +00:00
Woosuk Kwon
b405d78c07
DP sampler
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-19 06:46:46 +00:00
Russell Bryant
486c5599e3
[Build] Update Xgrammar to 0.1.24 to get a CVE fix ( #25188 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-19 14:27:17 +08:00
Chendi.Xue
a6149aa587
[OOT] Support sync_model_loading for OOT ( #25126 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
2025-09-19 05:41:53 +00:00
Michael Yao
6c8a3c099b
[Docs] Fix griffe warnings in vllm/multimodal ( #25216 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-18 22:10:44 -07:00
Roger Wang
31a8a2a7bc
[Misc] Clean up MM profiling warnings ( #25222 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-19 04:46:57 +00:00
Chen Ding
1a0a04dae9
[Perf] Optimize memory peak during EAGLE model loading. ( #24585 )
...
Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com>
2025-09-19 03:31:16 +00:00
Andrew Xia
6d8246aaff
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming ( #24938 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-18 19:11:59 -07:00
Woosuk Kwon
8af87986aa
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:37:30 -07:00
Woosuk Kwon
af65838d1f
dummy run
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 18:29:18 -07:00
Woosuk Kwon
52ca2f517a
sample
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:39:43 -07:00
Woosuk Kwon
8deedfa42b
-inf
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:24:00 -07:00
Woosuk Kwon
b9c74487d2
logprobs
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 17:23:02 -07:00
Or Ozeri
9d1c50a5ac
[KV offload][2/N] Introduce LRU-based CPU offloading management ( #20075 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 00:20:51 +00:00
Andrew Sansom
9a4600e4dc
[CORE] Prompt Embeddings Support for v1 Engine ( #24278 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-19 08:03:09 +08:00
Woosuk Kwon
31619ff412
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:38:56 -07:00
Woosuk Kwon
d2be62378b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:33:18 -07:00
Woosuk Kwon
86dade710d
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:32:00 -07:00
Woosuk Kwon
efda08481b
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 16:31:01 -07:00