6631 Commits

Author SHA1 Message Date
Sage Moore
252bf0809e debugging 2025-05-31 01:16:11 +00:00
Sage Moore
62da375465 more fixes 2025-05-30 21:17:06 +00:00
Sage Moore
5b0249b86e various fixes 2025-05-30 14:19:12 +00:00
Sage Moore
895a6c2a08 one a2a kernel per microbatch group 2025-05-30 04:06:39 +00:00
Sage Moore
5cc573e791 misc fixes 2025-05-29 00:09:25 +00:00
Lucas Wilkinson
f0b66d6929 prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-27 18:37:43 +00:00
Lucas Wilkinson
a743a35948 fixes
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-27 18:14:59 +00:00
Lucas Wilkinson
7b31e8a8ff wip seperate comm and compute threads
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-27 16:51:27 +00:00
Lucas Wilkinson
2f3920638c add comment
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-27 14:45:02 +00:00
Sage Moore
020d9b05bc fix dp=2 tp=2 hang
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-26 18:37:03 +00:00
Lucas Wilkinson
37bdf9f324 better logging
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:34:08 +00:00
Lucas Wilkinson
e4419df256 better debug utils
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:23:29 +00:00
Lucas Wilkinson
952f3c5c1e tone down prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:18:05 +00:00
Lucas Wilkinson
9edd08231b debugging hang
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 15:22:50 +00:00
Lucas Wilkinson
2dc3b8b0a2 wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 03:32:25 +00:00
Lucas Wilkinson
18bf91e6a8 wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 03:31:49 +00:00
Lucas Wilkinson
00f526f55b seperate gpu wait
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 21:52:27 +00:00
Lucas Wilkinson
a8439e2fd4 dp working no yields
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 21:49:14 +00:00
Sage Moore
2a7f25fbe2 fix hang 2025-05-22 20:51:36 +00:00
Lucas Wilkinson
9c60a6299d tp1 working multistream tp > 1 broken
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
2259b47951 use vllm current_stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
04f11d97a0 working but only on the same stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
ffb740ae95 manually manage stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Sage Moore
020269c4c5 added multhreading support
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
9ccfd094ff fix dummy mode
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
f93bdd3151 support more args in dp example
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
df8f889f37 support MLA
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
37c9babaa0 enable naive microbatching
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Lucas Wilkinson
8293182c8c wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
Percy
980a172474
[Kernel] update comment for KV shape in unified triton attn (#18099)
Signed-off-by: haochengxia <xhc_1007@163.com>
2025-05-20 11:19:34 -07:00
Calvin Chen
e1f5a71ed7
[Model] use AutoWeightsLoader for bloom (#18300)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-20 09:40:05 -07:00
Michael Goin
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-20 09:08:37 -07:00
Reid
8f55962a7f
[Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-20 15:26:12 +00:00
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
2025-05-20 06:59:48 -07:00
wang.yuqi
86847700d7
[CI] Add mteb testing to test the accuracy of the embedding model (#17175) 2025-05-20 06:51:12 -07:00
汪志鹏
d6c86d09ae
Update cpu.txt (#18398)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-05-20 10:53:23 +00:00
Jee Jee Li
6b35cb10a0
[Misc] Add LoRA code owner (#18387)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-20 03:27:30 -07:00
Reid
1b1e8e05ff
[doc] update env variable export (#18391)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-20 08:53:27 +00:00
Random Fly
bca55b556f
[Bugfix] fix adding bias twice in ipex GPTQ quantization (#18363)
Signed-off-by: rand-fly <randfly@outlook.com>
2025-05-20 00:54:33 -07:00
Kevin H. Luu
d981396778
[release] Change dockerhub username for TPU release (#18389) 2025-05-19 23:49:23 -07:00
Nan Qin
9609327fa4
[Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
2025-05-19 20:21:27 -07:00
Isotr0py
f07a673eb2
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-19 20:20:12 -07:00
Liangfu Chen
d565e0976f
[neuron] fix authorization issue (#18364)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-05-19 23:30:32 +00:00
Lucia Fang
258bf621d5
fix CUDA_check redefinition in #17918 (#18287)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-05-19 13:42:35 -07:00
Satyajith Chilappagari
dc1440cf9f
Neuron up mistral (#18222)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-19 09:54:47 -07:00
Gong Shufan
8171221834
[Misc] Fix typo (#18330) 2025-05-19 09:51:01 -07:00
sunyicode0012
7937c2fd52
Add files via uploadAdd fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (#18337) 2025-05-19 09:49:57 -07:00
Wenhua Cheng
e2ee1e8e9e
[Feature]Add support for models quantized with AutoRound (#17850)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
2025-05-19 09:38:53 -07:00
Reid
20d8ce81eb
[Frontend] add --quick option for vllm chat/complete (#18297)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-19 09:36:13 -07:00
Elad Segal
84ab4feb7e
[Doc] Fix typo (#18355) 2025-05-19 16:05:16 +00:00