Sage Moore
bfa828f399
format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-08 17:13:49 +00:00
Sage Moore
1a0e7110dd
_prepare_inputs cleanup
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-08 13:02:21 +00:00
Sage Moore
82ae694de6
comments cleanup etc
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 20:47:39 +00:00
Sage Moore
10ca263058
split some of the ubatching logic out of _run_model
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 20:26:56 +00:00
Sage Moore
908e9f8f54
cleanup
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 19:52:41 +00:00
Sage Moore
06cc133a63
cleanup
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 17:51:08 +00:00
Sage Moore
3a41a3dcff
cleanup
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 17:23:30 +00:00
Sage Moore
bb0645c644
separate ubatch and normal runs
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 17:07:58 +00:00
Sage Moore
510e839429
more cleanup
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 16:35:52 +00:00
Sage Moore
f7b6e600b8
gpu_model_runner cleanup
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 16:23:11 +00:00
Sage Moore
0056be26f6
less ARs
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 14:33:53 +00:00
Sage Moore
7cc5a549ad
cleanup some of the should_ubatch logic
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 14:22:53 +00:00
Sage Moore
1d75a029a9
remove cudagraph logic from flashmla.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:41:49 +00:00
Sage Moore
18f7bfb501
ubatching fix
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:22:41 +00:00
Sage Moore
0e499c4f4d
first round of cleanups
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:11:28 +00:00
Sage Moore
c0efbbb5de
misc changes
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:56:30 +00:00
Lucas Wilkinson
f7a3ee0ea1
Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-02 16:52:19 +00:00
Sage Moore
57d404bbb8
misc
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:37:58 +00:00
Sage Moore
d833982e48
random push
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-30 17:08:51 +00:00
Woosuk Kwon
2863befce3
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 09:07:50 -07:00
Woosuk Kwon
2062c0723d
[Spec Decode] Refactor spec decoding into a separate function (#20238)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:13:50 -07:00
Woosuk Kwon
19108ef311
[Misc] Fix import (#20233)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-29 20:34:54 -07:00
Sage Moore
4672c72f44
capture works replay does not
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-28 19:14:48 +00:00
Bowen Wang
e9fd658a73
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
...
Signed-off-by: Bowen Wang <abmfy@icloud.com>
2025-06-26 15:30:21 -07:00
Sage Moore
af68574e3d
reintegrate full cudagraphs
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-26 03:57:48 +00:00
Sage Moore
78228a67ce
refactor a bunch of misc parameters into a UbatchMetadata class
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-26 00:14:18 +00:00
Sage Moore
54deb61b87
delete any notion of dummy_ubatch
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-25 23:48:16 +00:00
Sage Moore
0e2b4bd546
more refactoring
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-25 23:43:49 +00:00
Sage Moore
e2ba707d64
factored out some of the context creation code along with misc commented infra
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-25 23:16:59 +00:00
Sage Moore
44a2b3494e
add attention splitting to dummy runs
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-25 21:39:33 +00:00
Sage Moore
144b148de2
initial full cudagraphs support. normal runs are working. ubatching does not
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-25 19:14:31 +00:00
Sage Moore
96c0c4ea66
added initial code for cuda graph capturing ubatches
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-24 22:19:24 +00:00
Sage Moore
a4def24c2c
setup deepepll for ubatching
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-24 21:20:49 +00:00
Vadim Gimpelson
9a3b88328f
[PERF] Speedup of MRoPE prepare inputs (#19939)
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
2025-06-23 23:01:26 -07:00
Isotr0py
61f4fc5dc6
[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-23 18:38:06 +00:00
Vlad Tiberiu Mihailescu
2e3e3c86dc
Export NaNs in logits to scheduler_stats if output is corrupted (#18777)
...
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2025-06-20 22:47:16 +08:00
Maximilien de Bayser
799397ee4f
Support embedding models in V1 (#16188)
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Richard Zou
ed33349738
[BugFix] Fix use_cudagraph=False (#19612)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-06-19 08:23:12 +08:00
Chen Zhang
a89209b78d
[v1] Support mamba2 (#19327)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-18 20:34:15 +00:00
Sage Moore
0889f66297
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing
2025-06-18 13:56:24 +00:00
Sage Moore
1d112d90a5
misc changes
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-17 13:34:46 +00:00
Luka Govedič
3597b06a4f
[CUDA] Enable full cudagraph for FlashMLA (#18581)
...
Signed-off-by: luka <luka@neuralmagic.com>
2025-06-13 18:12:26 +00:00
汪志鹏
cefdb9962d
[Fix] The zip function in Python 3.9 does not have the strict argument (#19549)
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-13 14:57:48 +08:00
Russell Bryant
c57bb199b3
[V1] Resolve failed concurrent structured output requests (#19565)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-12 23:30:09 +00:00
Sage Moore
b74c731342
more hacking
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-12 20:36:13 +00:00
Sage Moore
d682f5e1bd
wip cudagraphs
2025-06-12 14:33:21 +00:00
Robert Shaw
97a9465bbc
[UX] Add Feedback During CUDAGraph Capture (#19501)
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-11 21:09:05 +00:00
Lukas Geiger
319cb1e351
[Core] Batch multi modal input using pinned memory (#19169)
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-10 13:44:59 +08:00
Varun Sundar Rabindranath
5cf2daea9a
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
...
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
2025-06-09 10:50:39 -04:00
Yinghai Lu
770e5dcdb8
[full_graph] Fix query_start_loc padding (#19321)
...
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
2025-06-09 21:32:56 +08:00