6644 Commits

Author SHA1 Message Date
Sage Moore
243eac58a4 forward context format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:16:06 +00:00
Sage Moore
8332924320 dp format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:15:23 +00:00
Sage Moore
d4b502a73a mla format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:14:19 +00:00
Sage Moore
44a595f6d6 config format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:13:27 +00:00
Sage Moore
92e0cc79a8 format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:04:26 +00:00
Sage Moore
8ea80fca4a revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:48 +00:00
Sage Moore
21d9529a79 revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:26 +00:00
Sage Moore
d6eca0c130 remove modular kernel
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:03:21 +00:00
Sage Moore
6645882e95 comment prepare input
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:02:23 +00:00
Sage Moore
065816d25f misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:01:24 +00:00
Sage Moore
90e46ee5e3 misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:00:56 +00:00
Sage Moore
8f592524cb misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 14:15:52 +00:00
Sage Moore
0323e29153 misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 14:13:30 +00:00
Sage Moore
252bf0809e debugging
2025-05-31 01:16:11 +00:00
Sage Moore
62da375465 more fixes
2025-05-30 21:17:06 +00:00
Sage Moore
5b0249b86e various fixes
2025-05-30 14:19:12 +00:00
Sage Moore
895a6c2a08 one a2a kernel per microbatch group
2025-05-30 04:06:39 +00:00
Sage Moore
5cc573e791 misc fixes
2025-05-29 00:09:25 +00:00
Lucas Wilkinson
f0b66d6929 prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-27 18:37:43 +00:00
Lucas Wilkinson
a743a35948 fixes
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-27 18:14:59 +00:00
Lucas Wilkinson
7b31e8a8ff wip seperate comm and compute threads
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-27 16:51:27 +00:00
Lucas Wilkinson
2f3920638c add comment
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-27 14:45:02 +00:00
Sage Moore
020d9b05bc fix dp=2 tp=2 hang
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-26 18:37:03 +00:00
Lucas Wilkinson
37bdf9f324 better logging
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:34:08 +00:00
Lucas Wilkinson
e4419df256 better debug utils
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:23:29 +00:00
Lucas Wilkinson
952f3c5c1e tone down prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:18:05 +00:00
Lucas Wilkinson
9edd08231b debugging hang
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 15:22:50 +00:00
Lucas Wilkinson
2dc3b8b0a2 wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 03:32:25 +00:00
Lucas Wilkinson
18bf91e6a8 wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 03:31:49 +00:00
Lucas Wilkinson
00f526f55b seperate gpu wait
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 21:52:27 +00:00
Lucas Wilkinson
a8439e2fd4 dp working no yields
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 21:49:14 +00:00
Sage Moore
2a7f25fbe2 fix hang
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
9c60a6299d tp1 working multistream tp > 1 broken
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
2259b47951 use vllm current_stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
04f11d97a0 working but only on the same stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
ffb740ae95 manually manage stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Sage Moore
020269c4c5 added multhreading support
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
9ccfd094ff fix dummy mode
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
f93bdd3151 support more args in dp example
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
df8f889f37 support MLA
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
37c9babaa0 enable naive microbatching
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
8293182c8c wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Percy
980a172474
[Kernel] update comment for KV shape in unified triton attn (#18099)
Signed-off-by: haochengxia <xhc_1007@163.com>
2025-05-20 11:19:34 -07:00
Calvin Chen
e1f5a71ed7
[Model] use AutoWeightsLoader for bloom (#18300)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-20 09:40:05 -07:00
Michael Goin
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-20 09:08:37 -07:00
Reid
8f55962a7f
[Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-20 15:26:12 +00:00
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
2025-05-20 06:59:48 -07:00
wang.yuqi
86847700d7
[CI] Add mteb testing to test the accuracy of the embedding model (#17175)
2025-05-20 06:51:12 -07:00
汪志鹏
d6c86d09ae
Update cpu.txt (#18398)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-05-20 10:53:23 +00:00
Jee Jee Li
6b35cb10a0
[Misc] Add LoRA code owner (#18387)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-20 03:27:30 -07:00