6940 Commits

Author SHA1 Message Date
Sage Moore
e080e068ed fix pplx a2a
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 18:21:17 +00:00
Sage Moore
5f4a501b9a more fixes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 03:04:53 +00:00
Sage Moore
539c0c3add first round of fixes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 02:38:44 +00:00
Sage Moore
18e7d6c7b8 Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing
2025-06-03 00:52:39 +00:00
Sage Moore
2731e8cbcb temporarily remove enable_microbatching
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:30:01 +00:00
Sage Moore
919eef995b temporarily remove enable_microbatching
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:28:58 +00:00
Sage Moore
e34e4411b9 fa format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:17:50 +00:00
Sage Moore
d46397661f pplx format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:17:15 +00:00
Sage Moore
243eac58a4 forward context format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:16:06 +00:00
Sage Moore
8332924320 dp format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:15:23 +00:00
Sage Moore
d4b502a73a mla format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:14:19 +00:00
Sage Moore
44a595f6d6 config format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:13:27 +00:00
Sage Moore
92e0cc79a8 format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:04:26 +00:00
Sage Moore
8ea80fca4a revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:48 +00:00
Sage Moore
21d9529a79 revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:26 +00:00
Sage Moore
d6eca0c130 remove modular kernel
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:03:21 +00:00
Sage Moore
6645882e95 comment prepare input
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:02:23 +00:00
Sage Moore
065816d25f misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:01:24 +00:00
Sage Moore
90e46ee5e3 misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:00:56 +00:00
Gregory Shtrasberg
ca2f6b9c30
[Bugfix][Model] Attempt to fix eagle in V0. (#18978)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-06-02 08:15:53 -07:00
Frαnçois
20133cfee2
[Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403)
Signed-off-by: François Paupier <francois.paupier@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-06-02 15:04:23 +00:00
Sage Moore
8f592524cb misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 14:15:52 +00:00
Sage Moore
0323e29153 misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 14:13:30 +00:00
jennyyyyzhen
ebb1ec9318
[Model] enable data parallel for Llama4 vision encoder (#18368)
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Co-authored-by: yZhen <yZhen@fb.com>
Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
2025-06-02 19:22:54 +08:00
Reid
5b168b6d7a
[doc] add pytest tips (#19010)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-02 11:07:26 +00:00
22quinn
9760fd8f6a
[Core] Support inplace model weights loading (#18745)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-02 17:38:50 +08:00
Robert Shaw
b9f61e1387
[Bugfix][Nixl] Fix DP Metadata Handshake (#19008)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-02 03:30:41 +00:00
zhrrr
d6fd3a33b8
[Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-01 19:41:18 +00:00
Reid
432ec9926e
[doc] wrong output (#19000)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-01 11:26:14 +00:00
Nick Hill
2b102d51ad
[BugFix] Fix incorrect metrics shutdown error log message (#18992)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-01 11:42:23 +08:00
rongfu.leng
aa54a7bf7b
[BugFix] fix data parallel construct ipv6 url addres (#18991)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-06-01 11:42:10 +08:00
Michael Goin
2ad6194a02
Let max_num_batched_tokens use human_readable_int for large numbers (#18968)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-01 11:41:29 +08:00
Reid
c594cbf565
[doc] small fix - mkdocs (#18996)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 20:23:43 -07:00
Isotr0py
a35ca765a5
[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-01 11:06:57 +08:00
Cyrus Leung
6aa8f9a4e7
[Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-01 11:04:23 +08:00
Benjamin Chislett
1bc86a3da1
[Bugfix] Fix EAGLE3 broken logits (#18909)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-05-31 19:58:07 -07:00
Ekagra Ranjan
bbfa0c61d1
[Misc][Benchmark] Add support for CustomDataset (#18511)
2025-05-31 19:07:38 +00:00
Reid
20079c6e36
[Misc] add return token strs for tokenize (#18941)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 18:00:11 +00:00
Nick Hill
9a1b9b99d7
[BugFix] Fix multi-node offline data-parallel (#18981)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-05-31 08:34:52 -07:00
ptarasiewiczNV
8bf507d766
[P/D] NixlConnector use cache device index for memory registration (#18969)
Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
2025-05-31 11:19:18 -04:00
Charlie Fu
306d60401d
[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-31 07:40:05 -07:00
Fred Reiss
f2c3f66d59
[Bugfix] Fix for issue 17396 (#18773)
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
2025-05-31 11:58:17 +00:00
vllmellm
0f5e0d567e
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-31 03:39:31 -07:00
Luka Govedič
c55d804672
[BugFix] Pydantic part 2 (#18911)
Signed-off-by: luka <luka@neuralmagic.com>
2025-05-31 03:39:28 -07:00
Reid
749f5bdd38
[doc] fix the list rendering issue - security.md (#18982)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 10:39:21 +00:00
Satyajith Chilappagari
2a50ef5760
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-31 10:39:11 +00:00
Lucia Fang
b8b904795d
fix security issue of logging llm output (#18980)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-05-31 10:38:56 +00:00
Chauncey
ba5111f237
[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-31 09:20:54 +00:00
Yong Hoon Shin
1e123529d7
[Misc] Fix estimated max model len msg (#18966)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-05-31 16:43:44 +08:00
Pooya Davoodi
dff80b0e42
[Frontend] Add rerank support to run_batch endpoint (#16278)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-05-31 07:40:01 +00:00