Cyrus Leung
638e4196d1
[Misc] Make SchedulerConfig.max_model_len init-only ( #28733 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-15 01:59:31 -08:00
Cyrus Leung
511a6b611d
[Config] Clean up SchedulerConfig initialization ( #28665 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-14 22:41:02 +08:00
Huamin Li
07a606aa7e
[CI Failure] Fix backend selection for encoder-only models ( #28534 )
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-13 10:11:27 -05:00
Fanli Lin
dbbe0c756a
[XPU] Support Triton path for LoRA operations on XPU ( #28511 )
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-11-13 05:31:42 +00:00
Harry Mellor
54aecd9ed5
Fix pre-commit (and XPU) on main ( #28556 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 06:13:41 -08:00
wangxiyuan
10138c92a5
[V0 deprecation] Deprecate use_v1 parameter ( #28112 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-12 14:03:52 +00:00
Chaojun Zhang
a4730c1b4f
[XPU]Fix crash due to removed VLLM_USE_V1 attribute ( #28520 )
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
2025-11-12 10:20:55 +00:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic ( #24794 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
wangxiyuan
30a14b034f
[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module ( #27798 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-01 10:17:45 +00:00
Yan Ma
7e2729b57e
[Multimodal][XPU]Enable vision attn backend for xpu platform ( #27525 )
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yejing Lai <yejing.lai@intel.com>
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-01 04:45:02 +00:00
Chendi.Xue
7c4767f1eb
[NIXL] use Host buffer to support TP_ratio > 1 for XPU ( #27140 )
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-10-22 15:28:13 +00:00
wangxiyuan
f6027b2855
[1/N][Platform] Cleanup useless function ( #26982 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-22 09:04:57 +00:00
Harry Mellor
6c9fdbf725
[Docs] Replace rst style double-backtick with md single-backtick ( #27091 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-17 02:47:34 -07:00
wangxiyuan
8f4b313c37
[Misc] rename torch_dtype to dtype ( #26695 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-15 12:11:48 +00:00
Morrison Turnansky
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): name change compilation level to compilation mode, deprecation compilation level ( #26355 )
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-15 02:51:16 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
liuzhenwei
27ed39a347
[XPU] Upgrade NIXL to remove CUDA dependency ( #26570 )
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2025-10-11 05:15:23 +00:00
Nicolò Lucchesi
4ebc9108a7
[Kernel] Centralize platform kernel import in current_platform.import_kernels ( #26286 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-08 20:25:31 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Hank_
17edd8a807
[Platform][Kernel] platform-specific kernel loading ( #25823 )
Signed-off-by: Hank <hcc.mayday@gmail.com>
2025-10-05 13:25:15 +02:00
Matthew Bonanni
2aaa423842
[Attention] Move Backend enum into registry ( #25893 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-02 20:32:24 -07:00
Yongye Zhu
fa7e254a7f
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-30 17:14:41 +08:00
Matthew Bonanni
3468f17ebe
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names ( #25489 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-25 17:37:50 +00:00
Kunshang Ji
f225ea7dd9
[XPU] Fix compile_size is None case. ( #25433 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-23 03:09:00 +00:00
Isotr0py
6fa78d8f23
[V0 deprecation] Remove platform v1 controlling interface ( #25410 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-23 01:48:12 +00:00
Yizhou
b6f01bd9a7
refactor: abstract graph mode support into platform interface ( #25161 )
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-09-22 10:22:29 +00:00
Woosuk Kwon
0ff8ebb2d7
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor ( #25334 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 08:52:32 -07:00
Kunshang Ji
5206ab20ba
[XPU] Fix circular import error. ( #24927 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-16 03:35:36 +00:00
Nicolò Lucchesi
2e41f5abca
[XPU] Set consistent default KV cache layout ( #24745 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-15 18:09:34 +08:00
liuzhenwei
e599e2c65e
[XPU][P/D] Add XPU support in NixlConnector ( #22436 )
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-04 21:03:12 -07:00
Kunshang Ji
16ded21eeb
[XPU] support Triton Attention backend on Intel GPU ( #24149 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-04 20:41:08 +08:00
Chaojun Zhang
862f2ef893
[XPU] Fix the bug of LoRA logits on the XPU platform ( #24081 )
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-09-03 08:21:18 +08:00
Yan Ma
7be0cb8e9e
[XPU][Feature] fp8 online quantization support for XPU ( #23148 )
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
2025-09-02 04:06:53 +00:00
Kunshang Ji
fce10dbed5
[XPU] Add xpu torch.compile support ( #22609 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-27 05:33:27 +00:00
Chaojun Zhang
8a044754bd
[XPU] Delay BF16 check to worker init for spawn compatibility ( #22979 )
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-08-25 13:09:26 -07:00
Kunshang Ji
7caec10e7b
[XPU]avoid circular import during XPU init ( #23017 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-16 05:16:34 +00:00
fhl2000
74f441f4b5
[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer ( #20059 )
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-08-15 10:01:39 -04:00
Yongye Zhu
007dd90859
[gpt-oss] Enable gpt-oss on ampere ( #22714 )
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-12 03:21:44 -07:00
Kunshang Ji
05cbbe20c5
[XPU] use ZE_AFFINITY_MASK for device select on xpu ( #21815 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-30 03:56:14 +00:00
Chaojun Zhang
ec261b0291
[XPU] IPEX-optimized Punica Wrapper on XPU ( #21703 )
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-28 16:43:37 +00:00
Nick Hill
ffbcc9e757
[BugFix] Fix VllmConfig() construction on all platforms ( #20695 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-10 07:00:20 +00:00
Liangliang Ma
a3e4e85ece
[XPU][CI] enhance xpu test support ( #20652 )
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
2025-07-09 16:53:09 +00:00
Kunshang Ji
0b407479ef
[misc]refactor Platform.set_device method ( #20262 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-09 01:39:47 +00:00
Yan Ma
a4c23314c0
[xpu]feat: support multi-lora on xpu ( #20616 )
Signed-off-by: yan <yan.ma@intel.com>
2025-07-08 22:07:10 +08:00
Yan Ma
3112271f6e
[XPU] log clean up for XPU platform ( #20553 )
Signed-off-by: yan <yan.ma@intel.com>
2025-07-07 01:38:22 -07:00
Liangliang Ma
2c5ebec064
[XPU][CI] add v1/core test in xpu hardware ci ( #20537 )
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
2025-07-07 01:16:40 -07:00
Yang Yang
6e2c19ce22
[Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU ( #19410 )
Signed-off-by: dbyoung18 <yang5.yang@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-07 04:32:32 +00:00
Woosuk Kwon
e202dd2736
[V0 deprecation] Remove V0 CPU/XPU/TPU backends ( #20412 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-07-06 08:48:13 -07:00
Liangliang Ma
a0389e0554
[UT][intel GPU] use current_platform instead of device hardcode in v1 tests ( #20169 )
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
2025-07-02 09:06:04 +08:00
Kunshang Ji
b69781f107
[Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. ( #19560 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-06-26 09:27:18 -07:00