jiahanc
254a7f8fd6
[Perf] Do FP4 quant before All gather on flashinfer trtllmgen MOE ( #30014 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-12-16 13:01:48 -08:00
Christina Norman
166ac3c94d
fix(shm): Add memory barriers for cross-process shared memory visibility ( #30407 )
...
Signed-off-by: Christina Holland <hey@christinaholland.com>
Signed-off-by: Christina <truffle@gmail.com>
2025-12-10 23:01:19 +00:00
Lain
1fb632fdb6
[Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatterSum ( #29795 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
2025-12-08 15:02:34 -08:00
weiguihua2
398a596ed2
[MP executor] fix get device count for multi node of mp executor feature ( #30042 )
...
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-12-09 01:33:48 +08:00
knlnguyen1802
fc6acc88ca
[Bugfix] Missing cached item in the MultiModalReceiverCache ( #28525 )
...
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-01 10:18:07 -08:00
Didier Durand
04a797cd0e
[Doc]: fixing typos in various files. ( #29717 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-29 01:15:39 -08:00
Johnny Yang
ba1fcd84a7
[TPU] add tpu_inference ( #27277 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 14:46:36 -08:00
Huamin Li
70d5953f82
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion ( #28841 )" ( #29483 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-26 22:27:26 +08:00
elvischenv
6330f9477d
[Bugfix] Fix GPT-OSS AR+NORM fusion ( #28841 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-11-25 07:59:40 +00:00
Lucia Fang
b316ac6589
[V1] Support MP Executor for multi node distributed inference ( #23691 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-11-16 09:01:21 +00:00
Nick Hill
bc3e43069a
[BugFix] Fix multi-modal async scheduling race condition ( #28706 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-14 01:11:13 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
4ca5cd5740
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform ( #12695 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
2025-11-12 15:24:12 -08:00
Ilya Markov
1788aa1efb
[BugFix] Graceful handling of torch symm mem errors. ( #27671 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-11 17:41:54 -07:00
Boyuan Feng
0f872b7977
[Log] update shm wait time msg ( #28255 )
2025-11-07 09:43:30 -05:00
bnellnm
938772af03
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. ( #27123 )
2025-11-04 21:59:45 +08:00
Wentao Ye
a8141fa649
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK ( #27750 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-30 15:32:39 -04:00
Kero Liang
02af36df36
[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer ( #27117 )
...
Signed-off-by: Kero Liang <kerorek@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: donglu <donglu@cohere.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-28 15:01:24 +00:00
Wentao Ye
52efc34ebf
[Log] Optimize Startup Log ( #26740 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-24 19:27:04 -04:00
Pengchao Wang
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 ( #27328 )
...
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-10-24 14:16:44 -07:00
Rui Qiao
09a6a49eaf
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator ( #27443 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-24 14:53:09 +08:00
dongbo910220
a0003b56b0
[Chore] Separate out system utilities from vllm.utils ( #27201 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 20:25:25 +00:00
dongbo910220
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils ( #27207 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 10:44:21 -04:00
Nick Hill
647214f3d5
[V0 Deprecation] Remove V0 executors ( #27142 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 11:09:37 -07:00
dongbo910220
6c728f7771
[Chore] Separate out NCCL utilities from vllm.utils ( #27197 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-10-21 06:18:23 -07:00
iAmir97
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils ( #27164 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
2025-10-19 03:06:32 -07:00
Nick Hill
3b45075206
[Minor] Add some clarifying comments to recent changes ( #27130 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-18 09:52:45 -07:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Nick Hill
ab81379ea6
[Perf] Exploit out-of-band buffers in shm_broadcast ( #26961 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-16 20:08:03 -07:00
Bram Wasti
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage ( #26855 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-10-16 21:40:25 +00:00
Bram Wasti
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 ( #26609 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-15 22:06:02 -07:00
Michael Goin
3e051bda82
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend ( #26732 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-13 18:12:52 -07:00
Wentao Ye
314285d4f2
[CI] Fix mypy for vllm/distributed ( #26593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-13 16:02:24 -04:00
Wentao Ye
e251e457c5
[Log] Optimize Startup Log ( #26601 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-14 02:06:57 +08:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Wentao Ye
8983e0216f
[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" ( #26448 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-09 15:16:48 -07:00
Utkarsh Sharma
335b28f7d1
[TPU] Rename tpu_commons to tpu_inference ( #26279 )
...
Signed-off-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-10-07 23:30:52 -07:00
Harry Mellor
6c04638214
Fix per file ruff ignores related to line length ( #26262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-06 05:12:40 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Lucia Fang
f48b6a03ba
[Misc]allow disable pynccl ( #25421 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-10-01 06:04:13 +00:00
Gregory Shtrasberg
61a3431613
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so ( #25605 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-29 17:01:50 -04:00
Tyler Michael Smith
a5354b3ed2
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models ( #24982 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 14:22:28 +00:00
Nick Hill
983056e456
[Misc] Remove unnecessary memoryviews in shm_broadcast.py ( #25721 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-26 03:11:44 +00:00
Nick Hill
8b77328ffe
[Misc] Don't log shm dequeue delay warning on worker side ( #25720 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-26 01:08:30 +00:00
youkaichao
6c340da4df
[misc] log info messages by default for hanging / busy / idle ( #25627 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-25 21:14:57 +08:00
Cyrus Leung
2f17117606
[mypy] Fix wrong type annotations related to tuple ( #25660 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 13:00:45 +00:00
Shu Wang
54e42b72db
Support mnnvl all2allv from Flashinfer ( #21003 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-24 14:38:16 -04:00
youkaichao
b67dece2d8
[misc] update the warning message ( #25566 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-24 17:24:35 +08:00
Michael Goin
7361ab379f
Remove redundant mutates_args and dispatch_key for direct_register_custom_op ( #25512 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-23 22:48:40 +00:00
Ilya Markov
8bdd8b5c51
Enable symmetric memory all reduce by default only enabling for TP ( #25070 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-23 15:53:00 -04:00
Amir Samani
8c1c81a3de
[core] add nccl symmetric memory for all reduce ( #24532 )
...
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-23 14:33:06 -04:00