Woosuk Kwon
|
98ef239486
|
minor
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 23:55:46 +00:00 |
|
Woosuk Kwon
|
a66aa37f40
|
minor:
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 23:47:20 +00:00 |
|
Woosuk Kwon
|
6f038fc4fb
|
Merge branch 'main' into woosuk/model-runner-v2
|
2025-09-19 20:30:04 +00:00 |
|
Michael Goin
|
48ecb4438b
|
[Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available (#21126)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-19 14:06:49 -06:00 |
|
Harry Mellor
|
e57fc15971
|
Specify platform in pip-compile pre-commit hook so it runs on MacOS (#25273)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 12:43:33 -07:00 |
|
bnellnm
|
4bdf400218
|
[Bugfix] Fix chunked a2_scales in modular kernels (#25264)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-19 19:42:01 +00:00 |
|
Varun Sundar Rabindranath
|
7852b82b93
|
[Bugfix] GPT OSS Attribute error on H100 (#25228)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-09-19 13:14:09 -06:00 |
|
Woosuk Kwon
|
010e39ec7d
|
minor
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 19:07:46 +00:00 |
|
qizixi
|
a2a5f79e09
|
Optimize triton unified attention performance for sliding window attention (#24390)
Signed-off-by: zixi-qi <qizixi@meta.com>
|
2025-09-19 13:07:26 -06:00 |
|
Or Ozeri
|
c59a0eca42
|
[KV offload][4/N] Offloading KV connector (#22595)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 19:07:17 +00:00 |
|
Woosuk Kwon
|
396bbe67d3
|
Merge branch 'main' into woosuk/model-runner-v2
|
2025-09-19 18:53:18 +00:00 |
|
Lucia Fang
|
b716ab93a7
|
[bugfix] fix structured outputs key missing issue from #24929 (#25195)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-09-19 18:37:57 +00:00 |
|
samzong
|
138f0d1e75
|
[Docs] add __init__.py to vllm/model_executor/layers/quantization/compressed_tensors/transform (#24974)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-09-19 18:32:27 +00:00 |
|
Jialin Ouyang
|
2506ce5189
|
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintenance (#24990)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-19 12:22:53 -06:00 |
|
Chauncey
|
47fd08aaf9
|
[CI/Build] fix test function_calling (#25072)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-19 12:16:32 -06:00 |
|
Harry Mellor
|
12aed7e453
|
Encoder model support for the Transformers backend (#25174)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 19:15:22 +01:00 |
|
LJH-LBJ
|
d90e212a3a
|
Remove Redundant Assignment in Qwen3_VisionPatchMerger (#25224)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-19 12:15:13 -06:00 |
|
Jee Jee Li
|
2821986450
|
[Core] Modify the initialization parameters of the lora manager (#25249)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-19 18:01:28 +00:00 |
|
Cyrus Leung
|
6c117cff7d
|
[Frontend] Pass API server count to each process (#23717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 01:15:19 +08:00 |
|
Or Ozeri
|
7ac67ea525
|
[KV offload][3/N] Add worker-side CPU support (#21448)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 09:53:45 -07:00 |
|
Woosuk Kwon
|
c7f3e84b34
|
minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-19 09:49:40 -07:00 |
|
samzong
|
ce75e15373
|
refactor(benchmarks): add type annotations to wait_for_endpoint parameters (#25218)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-09-19 16:36:52 +00:00 |
|
Harry Mellor
|
aed16879a9
|
Move ModelConfig from config/__init__.py to config/model.py (#25252)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 16:22:33 +00:00 |
|
Harry Mellor
|
cf278ff3b2
|
Update CODEOWNERS (#25269)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 09:12:55 -07:00 |
|
Woosuk Kwon
|
a8e7071924
|
minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-19 08:33:47 -07:00 |
|
Icey
|
838d7116ba
|
[Qwen] Remove cuda hard-code in qwen3 next (#25243)
Signed-off-by: Icey <1790571317@qq.com>
|
2025-09-19 12:25:12 +00:00 |
|
Cyrus Leung
|
5089fd749c
|
[V0 Deprecation] Remove V0 logic from get_input_embeddings interface (#25242)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-19 11:10:52 +00:00 |
|
Nicolò Lucchesi
|
a3d087adec
|
[P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy (#22188)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-19 11:09:14 +00:00 |
|
Harry Mellor
|
058525b997
|
Move PoolerConfig from config/__init__.py to config/pooler.py (#25181)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 11:02:55 +00:00 |
|
Roger Wang
|
1dfea5f4a9
|
[Bugfix][Perf] Misc fixes for Qwen3 VL (#25238)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-19 10:46:16 +00:00 |
|
Isotr0py
|
cea91a32f2
|
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (#25055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-19 10:27:49 +00:00 |
|
Woosuk Kwon
|
4be2c66e37
|
fix
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 09:35:38 +00:00 |
|
Yan Ma
|
a684c0124c
|
[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-19 08:45:06 +00:00 |
|
Isotr0py
|
f2718d2948
|
[Misc] Cleanup test conftest for deprecated encoder-decoder models (#25231)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-19 07:44:56 +00:00 |
|
Li, Jiang
|
825fdb11ad
|
[Bugfix][CPU] Add placeholder to avoid import errors when using fused_moe ops on platforms without triton (#25137)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-19 07:41:12 +00:00 |
|
Li, Jiang
|
8c1d4acbfe
|
[CPU] Disable oneDNN linear on non-x86 platforms (#25166)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-19 07:27:22 +00:00 |
|
Woosuk Kwon
|
d30c0d50a6
|
refactor
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 07:17:53 +00:00 |
|
Woosuk Kwon
|
9c75d896a8
|
minor
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 07:11:37 +00:00 |
|
Woosuk Kwon
|
37478c18cf
|
async output
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 07:10:42 +00:00 |
|
Woosuk Kwon
|
33672774f5
|
Merge branch 'main' into woosuk/model-runner-v2
|
2025-09-19 06:52:46 +00:00 |
|
Woosuk Kwon
|
0d3de9e082
|
fix
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 06:50:56 +00:00 |
|
Woosuk Kwon
|
b405d78c07
|
DP sampler
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-19 06:46:46 +00:00 |
|
Russell Bryant
|
486c5599e3
|
[Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-19 14:27:17 +08:00 |
|
Chendi.Xue
|
a6149aa587
|
[OOT] Support sync_model_loading for OOT (#25126)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
|
2025-09-19 05:41:53 +00:00 |
|
Michael Yao
|
6c8a3c099b
|
[Docs] Fix griffe warnings in vllm/multimodal (#25216)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-09-18 22:10:44 -07:00 |
|
Roger Wang
|
31a8a2a7bc
|
[Misc] Clean up MM profiling warnings (#25222)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-19 04:46:57 +00:00 |
|
Chen Ding
|
1a0a04dae9
|
[Perf] Optimize memory peak during EAGLE model loading. (#24585)
Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com>
|
2025-09-19 03:31:16 +00:00 |
|
Andrew Xia
|
6d8246aaff
|
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming (#24938)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2025-09-18 19:11:59 -07:00 |
|
Woosuk Kwon
|
8af87986aa
|
fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-18 18:37:30 -07:00 |
|
Woosuk Kwon
|
af65838d1f
|
dummy run
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-18 18:29:18 -07:00 |
|