DefTruth
|
f90d34b498
|
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-03-23 01:10:10 -07:00 |
|
youkaichao
|
f68cce8e64
|
[ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 14:49:48 +08:00 |
|
youkaichao
|
09b6a95551
|
[ci/build] update torch nightly version for GH200 (#15135)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 14:04:13 +08:00 |
|
shangmingc
|
50c9636d87
|
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-22 19:28:10 -10:00 |
|
hijkzzz
|
0661cfef7a
|
Fix v1 supported oracle for worker-cls and worker-extension-cls (#15324)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 10:23:35 +08:00 |
|
Chen Zhang
|
a827aa815d
|
[doc] Add back previous news (#15331)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-22 17:38:33 -07:00 |
|
Russell Bryant
|
b877031d80
|
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 14:06:39 -07:00 |
|
Wang Ran (汪然)
|
dd861b992f
|
[BugFix][Typing] Fix Imprecise Type Annotations (#15208)
Signed-off-by: Wang Ran (汪然) <wrran@outlook.com>
|
2025-03-22 09:05:03 -07:00 |
|
Russell Bryant
|
eb63ea1e18
|
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 15:56:17 +00:00 |
|
Naitong Yu
|
2f4bd358f1
|
[Model] Support Tele-FLM Model (#15023)
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn>
Signed-off-by: jiangxin <horizon94@outlook.com>
Co-authored-by: Jason Fang <jasonfang3900@gmail.com>
Co-authored-by: jiangxin <horizon94@outlook.com>
|
2025-03-22 02:04:44 -07:00 |
|
Varun Sundar Rabindranath
|
8a8b30eac1
|
[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes (#15308)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-22 02:03:32 -07:00 |
|
Jee Jee Li
|
2fa0e1396b
|
[Bugfix] Fix torch.compile raise FileNotFoundError (#15278)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-22 13:49:34 +08:00 |
|
wwl2755
|
1c2bec0f82
|
[Doc] add load_format items in docs (#14804)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-21 22:36:43 -07:00 |
|
TJian
|
ec870fba9a
|
[FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature (#14959)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-21 22:36:14 -07:00 |
|
Andy Lo
|
df1430265c
|
[Bugfix][V0] Multi-sequence logprobs streaming edge case (#15259)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-03-21 22:35:37 -07:00 |
|
Rui Qiao
|
4c69e228b3
|
[Misc] Increase RayDistributedExecutor RAY_CGRAPH_get_timeout (#15301)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-03-21 22:25:43 -07:00 |
|
Russell Bryant
|
790b79750b
|
[Build/CI] Fix env var typo (#15305)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-21 22:28:46 +00:00 |
|
Nicolò Lucchesi
|
cfbb8c930f
|
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-21 08:50:39 -07:00 |
|
Cyrus Leung
|
baec0d4de9
|
Revert "[Feature] specify model in config.yaml (#14855)" (#15293)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-21 08:30:23 -07:00 |
|
Mengqing Cao
|
c21b99b912
|
[Bugfix][VLM] fix llava processor (#15285)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-03-21 05:14:36 -07:00 |
|
Chen Zhang
|
93a00d7dde
|
[v1] Refactor KVCacheConfig (#14079)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-21 04:56:27 -07:00 |
|
Russell Bryant
|
61e8c18350
|
[Misc] Add cProfile helpers (#15074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-21 04:56:09 -07:00 |
|
Isotr0py
|
8afcd0f633
|
[Bugfix] Fix broken kernel test due to missing rename for v1 Triton backend (#15282)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-21 11:42:06 +00:00 |
|
Lehua Ding
|
91ca929dc7
|
[V1] Fix wrong import path of get_flash_attn_version (#15280)
Signed-off-by: Lehua Ding <lehuading@tencent.com>
|
2025-03-21 03:54:11 -07:00 |
|
Isotr0py
|
84e00adc8a
|
[Bugfix] Fix incorrect resolving order for transformers fallback (#15279)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-21 03:54:08 -07:00 |
|
Isotr0py
|
47c7126213
|
[Misc] Add attention mask pre-computation optimization back to Qwen2.5-VL (#15273)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-21 10:32:33 +00:00 |
|
Shanshan Shen
|
a989ca2bf6
|
[Bugfix] Add int8 torch dtype for KVCache (#15260)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-03-21 08:58:28 +00:00 |
|
Wei Zeng
|
0fa3970deb
|
[Feature] specify model in config.yaml (#14855)
Signed-off-by: weizeng <weizeng@roblox.com>
|
2025-03-21 00:26:03 -07:00 |
|
Nick Hill
|
da6ea29f7a
|
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 22:24:10 -07:00 |
|
Edwin Hernandez
|
7297941b38
|
[Doc] Update LWS docs (#15163)
Signed-off-by: Edwinhr716 <Edandres249@gmail.com>
|
2025-03-20 21:18:47 -07:00 |
|
Isotr0py
|
f8a08cb90d
|
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-21 03:14:19 +00:00 |
|
Siyuan Liu
|
b15fd2be2a
|
[Hardware][TPU] Add check for no additional graph compilation during runtime (#14710)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-21 03:05:28 +00:00 |
|
Woosuk Kwon
|
e588ac237c
|
Add an example for reproducibility (#15262)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 19:55:47 -07:00 |
|
Cody Yu
|
5df2da5b97
|
[Misc] Better RayExecutor and multiprocessing compatibility (#14705)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-20 19:27:46 -07:00 |
|
Woosuk Kwon
|
11b986b3fb
|
[Docs] Trim the latest news in README (#15261)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 19:24:21 -07:00 |
|
Chih-Chieh Yang
|
296f927f24
|
[Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14857)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-03-20 19:21:08 -07:00 |
|
Travis Johnson
|
0032903a5b
|
[Bugfix] detect alibi and revert to FA2 (#15231)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-03-20 19:20:16 -07:00 |
|
Hyesoo Yang
|
47195057e9
|
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-03-20 19:19:40 -07:00 |
|
Harry Mellor
|
6edbfa924d
|
Mention extra_body as a way to pass vLLM-only parameters using the OpenAI client (#15240)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-20 19:18:36 -07:00 |
|
Isotr0py
|
1e508343e1
|
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-20 19:18:04 -07:00 |
|
Sage Moore
|
2e0b4cfde0
|
[ROCM] Upgrade torch to 2.6 (#15244)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-03-20 19:17:33 -07:00 |
|
Jee Jee Li
|
10f55fe6c5
|
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-20 19:17:12 -07:00 |
|
Lu Fang
|
d3ccbd6350
|
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-21 10:01:11 +08:00 |
|
Varun Sundar Rabindranath
|
0cfe7d386d
|
[CI/Build] LoRA : make add_lora_test safer (#15181)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-21 09:28:53 +08:00 |
|
Woosuk Kwon
|
0c6f5023c3
|
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 17:50:43 -07:00 |
|
Yu Chin Fabian Lim
|
06dd08256f
|
Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. (#14617)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
|
2025-03-21 00:44:37 +00:00 |
|
Woosuk Kwon
|
2b22290ce0
|
[V1] Add flag to disable cascade attention (#15243)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 15:24:16 -07:00 |
|
Jason
|
d8e82bc06d
|
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
|
2025-03-20 10:01:02 -07:00 |
|
Chi Zhang
|
086b56824c
|
[ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 (#15172)
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-21 00:30:04 +08:00 |
|
Harry Mellor
|
5a0905ba2a
|
Replace misc issues with link to forum (#15226)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-20 23:18:20 +08:00 |
|