Michael Goin
|
e31446b6c8
|
[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 13:48:25 -07:00 |
|
Yong Hoon Shin
|
bdf13965ab
|
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-06-03 20:33:07 +00:00 |
|
Varun Sundar Rabindranath
|
fa98d77773
|
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-03 12:30:02 -07:00 |
|
Sage Moore
|
2e3484c237
|
debugging
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-03 19:25:01 +00:00 |
|
Reid
|
01eee40536
|
[doc] update docker version (#19074)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-03 19:08:21 +00:00 |
|
SorenDreano
|
19bdaf32b1
|
[Doc] Readme standardization (#18695)
Co-authored-by: Soren Dreano <soren@numind.ai>
|
2025-06-03 11:50:55 -07:00 |
|
Sage Moore
|
e080e068ed
|
fix pplx a2a
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-03 18:21:17 +00:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
CYJiang
|
d054da1992
|
[Misc] fix: add miss best_of param validation (#18555)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-06-03 11:02:07 -07:00 |
|
Nicolò Lucchesi
|
4b7817c119
|
[Misc] Add missing _Backend enums (#19081)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-06-03 16:15:16 +00:00 |
|
Lu Fang
|
d00dd65cd4
|
[Doc] Improve the Pull Request template with key components (#19086)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-03 23:44:34 +08:00 |
|
Raushan Turganbay
|
d81edded69
|
[Bugfix] disable processor cache (#19068)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-06-03 15:06:04 +00:00 |
|
Harry Mellor
|
476844d44c
|
Fix underscores in dict keys passed via CLI (#19030)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-06-03 14:39:24 +00:00 |
|
Jee Jee Li
|
4e68ae5e59
|
[CI/Build] Remove V0 LoRA test (#19066)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-03 14:30:18 +00:00 |
|
youkaichao
|
4e88723f32
|
[doc] clarify windows support (#19088)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-06-03 21:42:17 +08:00 |
|
Cyrus Leung
|
118ff92111
|
[Doc] Update V1 user guide for embedding and enc-dec models (#19060)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-03 02:29:41 -07:00 |
|
Isotr0py
|
ec2dcd80bc
|
[Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl (#19054)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-03 09:08:20 +00:00 |
|
Jee Jee Li
|
42243fbda0
|
[Doc] Add InternVL LoRA support (#19055)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-03 09:08:03 +00:00 |
|
Michael Goin
|
6d18ed2a2e
|
Update docker docs with ARM CUDA cross-compile (#19037)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-06-03 08:21:53 +00:00 |
|
Chen Zhang
|
f32fcd9444
|
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 08:01:48 +00:00 |
|
Lu Fang
|
d32aa2e670
|
[Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-03 00:16:17 -07:00 |
|
Michael Goin
|
cc977286e7
|
Reduce logs in CLI scripts and plugin loader (#18970)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 06:00:45 +00:00 |
|
Reid
|
17430e3653
|
[bugfix] small fix logic issue (#18999)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-03 05:35:12 +00:00 |
|
汪志鹏
|
1282bd812e
|
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-03 13:13:13 +08:00 |
|
Rui Qiao
|
bdce64f236
|
[V1] Support DP with Ray (#18779)
|
2025-06-02 21:15:13 -07:00 |
|
Gregory Shtrasberg
|
9e6f61e8c3
|
[ROCm][Build] Clean up the ROCm build (#19040)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-06-02 20:47:47 -07:00 |
|
Li, Jiang
|
8655f47f37
|
[CPU][CI] Re-enable the CPU CI tests (#19046)
Signed-off-by: jiang.li <jiang1.li@intel.com>
|
2025-06-02 20:46:47 -07:00 |
|
Concurrensee
|
4ce42f9204
|
Adding "LoRA Test %N" to AMD production tests (#18929)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
|
2025-06-02 20:46:44 -07:00 |
|
Tyler Michael Smith
|
8a57872b2a
|
[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-03 11:36:51 +08:00 |
|
Sage Moore
|
5f4a501b9a
|
more fixes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-03 03:04:53 +00:00 |
|
Hyogeun Oh (오효근)
|
5bc1ad6cee
|
[Doc] Remove duplicate TOCs during MkDocs migration (#19021)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-06-02 19:49:48 -07:00 |
|
Sage Moore
|
539c0c3add
|
first round of fixes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-03 02:38:44 +00:00 |
|
Sage Moore
|
18e7d6c7b8
|
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing
|
2025-06-03 00:52:39 +00:00 |
|
Siyuan Liu
|
9112b443a0
|
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-03 00:06:20 +00:00 |
|
Calvin Chen
|
c57d577e8d
|
add an absolute path for run.sh (#18258)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-06-02 19:38:23 +00:00 |
|
Sage Moore
|
2731e8cbcb
|
temporarily remove enable_microbatching
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:30:01 +00:00 |
|
Sage Moore
|
919eef995b
|
temporarily remove enable_microbatching
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:28:58 +00:00 |
|
Sage Moore
|
e34e4411b9
|
fa format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:17:50 +00:00 |
|
Sage Moore
|
d46397661f
|
pplx format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:17:15 +00:00 |
|
Sage Moore
|
243eac58a4
|
forward context format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:16:06 +00:00 |
|
Sage Moore
|
8332924320
|
dp format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:15:23 +00:00 |
|
Sage Moore
|
d4b502a73a
|
mla format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:14:19 +00:00 |
|
Sage Moore
|
44a595f6d6
|
config format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:13:27 +00:00 |
|
Sage Moore
|
92e0cc79a8
|
format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 19:04:26 +00:00 |
|
Sage Moore
|
8ea80fca4a
|
revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:05:48 +00:00 |
|
Sage Moore
|
21d9529a79
|
revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:05:26 +00:00 |
|
Sage Moore
|
d6eca0c130
|
remove modular kernel
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:03:21 +00:00 |
|
Sage Moore
|
6645882e95
|
comment prepare input
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:02:23 +00:00 |
|
Sage Moore
|
065816d25f
|
misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:01:24 +00:00 |
|
Sage Moore
|
90e46ee5e3
|
misc cleanups to prepare for rebase
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-02 18:00:56 +00:00 |
|