Ning Xie
499b074bfd
[Misc] refactor code by import as for torch._inductor.config (#23677)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-01 14:05:42 +08:00
Ning Xie
e80bca309e
[Refactor] refactor freezing_value/cuda_event initialize outside try finally (#23758)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-30 06:42:25 -07:00
Li, Jiang
ad39106b16
[CPU] Enable data parallel for CPU backend (#23903)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-29 02:19:58 -07:00
Didier Durand
d3da2eea54
[Doc]: fix typos in Python scripts (#23828)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-28 05:37:38 -07:00
Woosuk Kwon
04ff1e43fb
[Misc] Move CpuGpuBuffer to vllm/v1/utils.py (#23728)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-27 03:25:00 -07:00
Li, Jiang
9b0187003e
[Bugfix] Fix cuda event usage with CPU model runner (#23643)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-26 17:10:42 +00:00
Ning Xie
87f48623a5
[Misc] method name typo fix (#23042)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-16 21:49:14 -07:00
Lucas Wilkinson
1dc8a70b6d
[Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix (#21588)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-06 18:40:52 -07:00
Thomas Parnell
4abfd8796f
[V1] [Hybrid] Validate compatibility of attention backend batch reordering at init time (#21557)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-02 05:29:40 -07:00
Maximilien de Bayser
1cd6eaba54
Support encoder-only models without KV-Cache (#21270)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-07-26 21:09:52 +08:00
Li, Jiang
e3a0e43d7f
[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-19 05:13:55 -07:00
Rui Qiao
217937221b
Elastic Expert Parallel Initial Support (#20775)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-18 17:46:09 -07:00
Cyrus Leung
45badd05d0
[Core] Set pooling params based on task and model (#21128)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-18 05:41:17 -07:00
Jee Jee Li
1caca5a589
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-04 07:40:42 +00:00
Li, Jiang
53da4cd397
[Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 (#20014)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-24 13:20:04 +00:00
Li, Jiang
6458721108
[CPU] Refine default config for the CPU backend (#19539)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-13 13:27:39 +08:00
Li, Jiang
4555143ea7
[CPU] V1 support for the CPU backend (#16441)
2025-06-03 18:43:01 -07:00