Tyler Michael Smith
|
8ecb3e9e93
|
[CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes (#22163)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-08-03 22:19:04 -07:00 |
|
Chenxi Yang
|
e5949e5ae0
|
Remove index_put from MM embeddings merging (#22105)
Co-authored-by: Chenxi Yang <cxyang@meta.com>
|
2025-08-03 22:15:14 -07:00 |
|
ZiTian.Zhao
|
49bcd893e7
|
[refactor] improve ConstantList exception specificity (#22156)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-03 22:14:49 -07:00 |
|
Giancarlo Delfin
|
aa7012eb6d
|
Add tree attention backend for v1 (part 1) (#20401)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
|
2025-08-03 22:13:26 -07:00 |
|
Ning Xie
|
c2e75b3c11
|
remove duplicate code within cleanup_dist_env_and_memory (#22147)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-03 20:03:58 -07:00 |
|
Abirdcfly
|
0d7db16a92
|
[PD] add test for chat completions endpoint (#21925)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
|
2025-08-03 19:57:03 -07:00 |
|
22quinn
|
845420ac2c
|
[RLHF] Fix torch.dtype not serializable in example (#22158)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-04 02:43:33 +00:00 |
|
ZiTian.Zhao
|
e27d25a0dc
|
[fix] fix correct assertion syntax error in attention utils. (#22154)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-03 19:24:02 -07:00 |
|
Seiji Eicher
|
6f5478298d
|
Use aiohttp connection pool for benchmarking (#21981)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-08-03 19:23:32 -07:00 |
|
Isotr0py
|
6a39ba85fe
|
[Bugfix] Fix failing multimodal standard test (#22153)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-03 19:04:38 +00:00 |
|
Yuxuan Zhang
|
d3c18c9cb0
|
fuse fp32 for GLM-4.5 e_score_correction_bias (#22143)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-03 09:04:54 -07:00 |
|
TankNee
|
83f7bbb318
|
Add chat doc in quick start (#21213)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-03 07:47:55 -07:00 |
|
Li, Jiang
|
b5dfb94fa0
|
[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation (#22145)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-03 05:34:04 -07:00 |
|
Woosuk Kwon
|
6d98843b31
|
[Responses API] Disable response store by default (#22137)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-03 04:04:21 -07:00 |
|
David Ben-David
|
aefeea0fde
|
[V1] [P/D] Refactor KV Connector Path (#21980)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-08-03 04:03:40 -07:00 |
|
H
|
24d1dffbeb
|
[executor] feat: add supports_pp attr to executors (#21786)
Signed-off-by: Haibin Lin <haibin.lin@bytedance.com>
|
2025-08-03 18:04:45 +08:00 |
|
Ning Xie
|
7de45db9a5
|
[Misc] update doc comment for send (#22026)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-03 00:55:20 -07:00 |
|
Roberto L. Castro
|
789562c28c
|
Support CUTLASS NVFP4 (w4a4) for Blackwell Geforce GPUs (SM120) (#21309)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
|
2025-08-03 00:54:22 -07:00 |
|
Ye (Charlotte) Qi
|
3f36c325fa
|
[Benchmark] Support ready check timeout in vllm bench serve (#21696)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-08-03 00:52:38 -07:00 |
|
Isotr0py
|
3dddbf1f25
|
[Misc] Add tensor schema test coverage for multimodal models (#21754)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-03 00:52:14 -07:00 |
|
jiahanc
|
337eb23bcc
|
[Fix] Fix llama4 modelopt weight loading error (#22107)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-03 00:50:34 -07:00 |
|
Rui Qiao
|
2ff46b8826
|
[Misc] Bump ray to 2.48.0 (#22123)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-08-02 19:42:00 -07:00 |
|
Xiao
|
554df8a6a2
|
Revert "[compile][startup] Disable C++ compilation of symbolic shapes" (#22122)
Signed-off-by: Xiao Liu <xiszishu@gmail.com>
|
2025-08-02 09:03:30 -07:00 |
|
Yan Ma
|
73e1b9b1d4
|
[xpu]support moe models on XPU platform (#21643)
Signed-off-by: yan <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-08-02 07:49:08 -07:00 |
|
Thomas Parnell
|
4abfd8796f
|
[V1] [Hybrid] Validate compatibility of attention backend batch reordering at init time (#21557)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-02 05:29:40 -07:00 |
|
Cyrus Leung
|
f5d0f4784f
|
[Frontend] Improve error message for too many mm items (#22114)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-02 02:20:38 -07:00 |
|
Chih-Chieh Yang
|
b690e34824
|
[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead (#21075)
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-08-02 01:59:34 -07:00 |
|
Yuxuan Zhang
|
25373b6c6c
|
for glm-4.1V update (#22000)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-08-02 01:46:57 -07:00 |
|
Vadim Gimpelson
|
58eee5f2e0
|
[PERF] Use faster way of decode in tokenizer: avoid useless list-to-list conversion (#20000)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-08-02 01:43:52 -07:00 |
|
Roger Wang
|
067c34a155
|
docs: remove deprecated disable-log-requests flag (#22113)
Signed-off-by: Roger Wang <hey@rogerw.me>
|
2025-08-02 00:19:48 -07:00 |
|
Chih-Chieh Yang
|
c64861d63c
|
[Bugfix] Mamba2 remove bugged initial state condition in chunk scan (#22034)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-08-01 23:55:57 -07:00 |
|
Yong Hoon Shin
|
8564dc9448
|
Fix test_kv_sharing_fast_prefill flakiness (#22038)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-01 23:55:34 -07:00 |
|
Rui Qiao
|
4ac8437352
|
[Misc] Getting and passing ray runtime_env to workers (#22040)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-08-01 23:54:40 -07:00 |
|
vllmellm
|
d3a6f2120b
|
[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069)
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
|
2025-08-01 23:53:18 -07:00 |
|
Sage Moore
|
0edaf752d7
|
[Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-08-01 19:47:53 -07:00 |
|
Wentao Ye
|
6e8d8c4afb
|
[Test] Add Unit Test for Batched DeepGEMM (#21559)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-02 10:45:46 +08:00 |
|
Nick Hill
|
8d524ce79f
|
[BugFix] Improve internal DP load balancing (#21617)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-01 19:45:27 -07:00 |
|
Dipika Sikka
|
9f9c38c392
|
[Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
|
2025-08-01 19:43:37 -07:00 |
|
Varun Sundar Rabindranath
|
a65f46be5e
|
[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-08-01 19:42:03 -07:00 |
|
Nicolò Lucchesi
|
57393715e8
|
[Misc] VLLM_TARGET_DEVICE.lower() (#22101)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-01 19:41:40 -07:00 |
|
vllmellm
|
ee2eb6ecd8
|
[Model] Qwen2.5 VL SiLU-and-Mul (#22066)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
|
2025-08-01 19:34:37 -07:00 |
|
fhl2000
|
23322431c8
|
[V1][CUDA] Full cudagraph support for FlashInfer (#21367)
|
2025-08-01 21:49:34 -04:00 |
|
JartX
|
3654847db5
|
feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733)
|
2025-08-01 21:12:19 -04:00 |
|
Wentao Ye
|
eefbf4a68b
|
[Perf] Optimize reshape_and_cache_flash CUDA Kernel (#22036)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-01 19:18:51 -04:00 |
|
Michael Goin
|
88faa466d7
|
[CI] Initial tests for SM100 Blackwell runner (#21877)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 16:18:38 -07:00 |
|
Nick Hill
|
881e1af43a
|
[BugFix] Harden distributed DP startup (#21538)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-01 21:40:45 +00:00 |
|
XiongfeiWei
|
d84b97a3e3
|
Add lora test for tp>1 case for TPU. (#21970)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-08-01 18:56:08 +00:00 |
|
Rui Qiao
|
d331759488
|
Introduce RayPPCommunicator for ray-based PP (#21660)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-08-01 11:50:58 -07:00 |
|
Animesh Jain
|
9659bc7f27
|
[compile][startup] Disable C++ compilation of symbolic shapes (#20836)
Signed-off-by: Animesh Jain <anijain@umich.edu>
|
2025-08-01 10:38:52 -07:00 |
|
Michael Goin
|
3277e8f9e1
|
Fix pre-commit failure for SECURTIY.md (#22102)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 10:36:07 -07:00 |
|