Sage Moore
05ddc34913
misc padding fixes
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-06 23:24:51 +00:00
Sage Moore
a00dabcb33
more padding work. still gets the wrong answer
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-06 14:09:44 +00:00
Sage Moore
a8675b7d98
ubatch padding should work now
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-05 14:25:21 +00:00
Sage Moore
8a75b3a1e5
added support for ubatch padding. not working
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-05 00:33:26 +00:00
Sage Moore
f8848bb201
misc fixes. lm_eval still gets a wrong answer but it no longer hangs
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-04 22:46:18 +00:00
Sage Moore
2e3484c237
debugging
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 19:25:01 +00:00
Sage Moore
e080e068ed
fix pplx a2a
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 18:21:17 +00:00
Sage Moore
5f4a501b9a
more fixes
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 03:04:53 +00:00
Sage Moore
539c0c3add
first round of fixes
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 02:38:44 +00:00
Sage Moore
18e7d6c7b8
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing
2025-06-03 00:52:39 +00:00
Sage Moore
2731e8cbcb
temporarily remove enable_microbatching
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:30:01 +00:00
Sage Moore
919eef995b
temporarily remove enable_microbatching
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:28:58 +00:00
Sage Moore
e34e4411b9
fa format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:17:50 +00:00
Sage Moore
d46397661f
pplx format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:17:15 +00:00
Sage Moore
243eac58a4
forward context format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:16:06 +00:00
Sage Moore
8332924320
dp format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:15:23 +00:00
Sage Moore
d4b502a73a
mla format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:14:19 +00:00
Sage Moore
44a595f6d6
config format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:13:27 +00:00
Sage Moore
92e0cc79a8
format
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:04:26 +00:00
Sage Moore
8ea80fca4a
revert offline_inference/basic.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:48 +00:00
Sage Moore
21d9529a79
revert offline_inference/basic.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:26 +00:00
Sage Moore
d6eca0c130
remove modular kernel
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:03:21 +00:00
Sage Moore
6645882e95
comment prepare input
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:02:23 +00:00
Sage Moore
065816d25f
misc cleanups to prepare for rebase
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:01:24 +00:00
Sage Moore
90e46ee5e3
misc cleanups to prepare for rebase
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:00:56 +00:00
Gregory Shtrasberg
ca2f6b9c30
[Bugfix][Model] Attempt to fix eagle in V0. (#18978)
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-06-02 08:15:53 -07:00
Frαnçois
20133cfee2
[Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403)
...
Signed-off-by: François Paupier <francois.paupier@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-06-02 15:04:23 +00:00
Sage Moore
8f592524cb
misc cleanups to prepare for rebase
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 14:15:52 +00:00
Sage Moore
0323e29153
misc cleanups to prepare for rebase
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 14:13:30 +00:00
jennyyyyzhen
ebb1ec9318
[Model] enable data parallel for Llama4 vision encoder (#18368)
...
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Co-authored-by: yZhen <yZhen@fb.com>
Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
2025-06-02 19:22:54 +08:00
Reid
5b168b6d7a
[doc] add pytest tips (#19010)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-02 11:07:26 +00:00
22quinn
9760fd8f6a
[Core] Support inplace model weights loading (#18745)
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-02 17:38:50 +08:00
Robert Shaw
b9f61e1387
[Bugfix][Nixl] Fix DP Metadata Handshake (#19008)
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-02 03:30:41 +00:00
zhrrr
d6fd3a33b8
[Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935)
...
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-01 19:41:18 +00:00
Reid
432ec9926e
[doc] wrong output (#19000)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-01 11:26:14 +00:00
Nick Hill
2b102d51ad
[BugFix] Fix incorrect metrics shutdown error log message (#18992)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-01 11:42:23 +08:00
rongfu.leng
aa54a7bf7b
[BugFix] fix data parallel construct ipv6 url addres (#18991)
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-06-01 11:42:10 +08:00
Michael Goin
2ad6194a02
Let max_num_batched_tokens use human_readable_int for large numbers (#18968)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-01 11:41:29 +08:00
Reid
c594cbf565
[doc] small fix - mkdocs (#18996)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 20:23:43 -07:00
Isotr0py
a35ca765a5
[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987)
...
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-01 11:06:57 +08:00
Cyrus Leung
6aa8f9a4e7
[Core] Rework dtype resolution (#18751)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-01 11:04:23 +08:00
Benjamin Chislett
1bc86a3da1
[Bugfix] Fix EAGLE3 broken logits (#18909)
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-05-31 19:58:07 -07:00
Ekagra Ranjan
bbfa0c61d1
[Misc][Benchmark] Add support for CustomDataset (#18511)
2025-05-31 19:07:38 +00:00
Reid
20079c6e36
[Misc] add return token strs for tokenize (#18941)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 18:00:11 +00:00
Nick Hill
9a1b9b99d7
[BugFix] Fix multi-node offline data-parallel (#18981)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-05-31 08:34:52 -07:00
ptarasiewiczNV
8bf507d766
[P/D] NixlConnector use cache device index for memory registration (#18969)
...
Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
2025-05-31 11:19:18 -04:00
Charlie Fu
306d60401d
[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
...
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-31 07:40:05 -07:00
Fred Reiss
f2c3f66d59
[Bugfix] Fix for issue 17396 (#18773)
...
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
2025-05-31 11:58:17 +00:00
vllmellm
0f5e0d567e
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-31 03:39:31 -07:00
Luka Govedič
c55d804672
[BugFix] Pydantic part 2 (#18911)
...
Signed-off-by: luka <luka@neuralmagic.com>
2025-05-31 03:39:28 -07:00