Sage Moore
|
243eac58a4
|
forward context format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:16:06 +00:00 |
|
Sage Moore
|
8332924320
|
dp format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:15:23 +00:00 |
|
Sage Moore
|
d4b502a73a
|
mla format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:14:19 +00:00 |
|
Sage Moore
|
44a595f6d6
|
config format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:13:27 +00:00 |
|
Sage Moore
|
92e0cc79a8
|
format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:04:26 +00:00 |
|
Sage Moore
|
8ea80fca4a
|
revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:05:48 +00:00 |
|
Sage Moore
|
21d9529a79
|
revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:05:26 +00:00 |
|
Sage Moore
|
d6eca0c130
|
remove modular kernel
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:03:21 +00:00 |
|
Sage Moore
|
6645882e95
|
comment prepare input
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:02:23 +00:00 |
|
Sage Moore
|
065816d25f
|
misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:01:24 +00:00 |
|
Sage Moore
|
90e46ee5e3
|
misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:00:56 +00:00 |
|
Sage Moore
|
8f592524cb
|
misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 14:15:52 +00:00 |
|
Sage Moore
|
0323e29153
|
misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 14:13:30 +00:00 |
|
Sage Moore
|
252bf0809e
|
debugging
|
2025-05-31 01:16:11 +00:00 |
|
Sage Moore
|
62da375465
|
more fixes
|
2025-05-30 21:17:06 +00:00 |
|
Sage Moore
|
5b0249b86e
|
various fixes
|
2025-05-30 14:19:12 +00:00 |
|
Sage Moore
|
895a6c2a08
|
one a2a kernel per microbatch group
|
2025-05-30 04:06:39 +00:00 |
|
Sage Moore
|
5cc573e791
|
misc fixes
|
2025-05-29 00:09:25 +00:00 |
|
Lucas Wilkinson
|
f0b66d6929
|
prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-27 18:37:43 +00:00 |
|
Lucas Wilkinson
|
a743a35948
|
fixes
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-27 18:14:59 +00:00 |
|
Lucas Wilkinson
|
7b31e8a8ff
|
wip separate comm and compute threads
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-27 16:51:27 +00:00 |
|
Lucas Wilkinson
|
2f3920638c
|
add comment
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-27 14:45:02 +00:00 |
|
Sage Moore
|
020d9b05bc
|
fix dp=2 tp=2 hang
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-05-26 18:37:03 +00:00 |
|
Lucas Wilkinson
|
37bdf9f324
|
better logging
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 18:34:08 +00:00 |
|
Lucas Wilkinson
|
e4419df256
|
better debug utils
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 18:23:29 +00:00 |
|
Lucas Wilkinson
|
952f3c5c1e
|
tone down prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 18:18:05 +00:00 |
|
Lucas Wilkinson
|
9edd08231b
|
debugging hang
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 15:22:50 +00:00 |
|
Lucas Wilkinson
|
2dc3b8b0a2
|
wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 03:32:25 +00:00 |
|
Lucas Wilkinson
|
18bf91e6a8
|
wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 03:31:49 +00:00 |
|
Lucas Wilkinson
|
00f526f55b
|
separate gpu wait
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 21:52:27 +00:00 |
|
Lucas Wilkinson
|
a8439e2fd4
|
dp working no yields
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 21:49:14 +00:00 |
|
Sage Moore
|
2a7f25fbe2
|
fix hang
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
9c60a6299d
|
tp1 working multistream tp > 1 broken
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
2259b47951
|
use vllm current_stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
04f11d97a0
|
working but only on the same stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
ffb740ae95
|
manually manage stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Sage Moore
|
020269c4c5
|
added multithreading support
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
9ccfd094ff
|
fix dummy mode
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Lucas Wilkinson
|
f93bdd3151
|
support more args in dp example
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Lucas Wilkinson
|
df8f889f37
|
support MLA
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Lucas Wilkinson
|
37c9babaa0
|
enable naive microbatching
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Lucas Wilkinson
|
8293182c8c
|
wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Percy
|
980a172474
|
[Kernel] update comment for KV shape in unified triton attn (#18099)
Signed-off-by: haochengxia <xhc_1007@163.com>
|
2025-05-20 11:19:34 -07:00 |
|
Calvin Chen
|
e1f5a71ed7
|
[Model] use AutoWeightsLoader for bloom (#18300)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-05-20 09:40:05 -07:00 |
|
Michael Goin
|
f4a8a37465
|
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-20 09:08:37 -07:00 |
|
Reid
|
8f55962a7f
|
[Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-20 15:26:12 +00:00 |
|
燃
|
be48360c1f
|
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
|
2025-05-20 06:59:48 -07:00 |
|
wang.yuqi
|
86847700d7
|
[CI] Add mteb testing to test the accuracy of the embedding model (#17175)
|
2025-05-20 06:51:12 -07:00 |
|
汪志鹏
|
d6c86d09ae
|
Update cpu.txt (#18398)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-05-20 10:53:23 +00:00 |
|
Jee Jee Li
|
6b35cb10a0
|
[Misc] Add LoRA code owner (#18387)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-20 03:27:30 -07:00 |
|