Brittany
759b87ef3e
[TPU] Add an optimization doc on TPU (#21155)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 07:23:19 -07:00
Harry Mellor
f693b067a2
[Docs] Merge design docs for a V1 only future (#21832)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 07:22:50 -07:00
Richard Zou
04e38500ee
[Bugfix] VLLM_V1 supports passing other compilation levels (#19340)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-29 09:35:58 -04:00
Cyrus Leung
ab714131e4
[Doc] Update compatibility matrix for pooling and multimodal models (#21831)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-29 06:29:51 -07:00
Chen Zhang
755fa8b657
[KVCache] Make KVCacheSpec hashable (#21791)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-07-29 19:58:29 +08:00
Kay Yan
2470419119
[Docs] Fix the outdated URL for installing from vLLM binaries (#21523)
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 04:56:27 -07:00
Jee Jee Li
61a6905ab0
[Model] Refactor JambaForCausalLM (#21394)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-29 18:25:07 +08:00
Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding (#21347)
...
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
Isotr0py
a4528f0cac
[Model]: Fused MoE for nomic-embed-text-v2-moe (#18321)
...
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-29 03:13:27 -07:00
Cyrus Leung
a2480251ec
[Doc] Link to RFC for pooling optimizations (#21806)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 23:53:18 -07:00
Nick Hill
7234fe2685
[Misc] Rework process titles (#21780)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-29 05:14:47 +00:00
Benji Beck
f1e2c095ec
Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema (#21684)
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-28 22:09:45 -07:00
Gregory Shtrasberg
12a223ef9b
[AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM (#21766)
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-29 03:35:37 +00:00
Calvin Chen
e18f085103
skip fusedmoe layer for start_load_kv (#21378)
...
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
2025-07-28 18:59:44 -07:00
Michael Goin
afa2607596
[CI] Parallelize Kernels MoE Test (#21764)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-28 18:56:24 -07:00
Wentao Ye
48b763d6b5
[Refactor] Merge Compressed Tensor FP8 CompressedTensorsW8A8Fp8MoEMethod and CompressedTensorsW8A8Fp8MoECutlassMethod (#21775)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-28 19:47:21 -06:00
Michael Goin
947e982ede
[Docs] Minimize spacing for supported_hardware.md table (#21779)
2025-07-28 18:46:39 -07:00
lyrisz
c6c9122d50
[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning (#20396)
...
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Duncan Moss <djm.moss@gmail.com>
2025-07-28 23:13:58 +00:00
Lucas Wilkinson
8aa1485fcf
[Perf] Disable chunked local attention by default with llama4 (#21761)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-28 18:49:04 -04:00
Nikhil Gupta
89ac266b26
[Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels (#17112)
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-28 20:55:15 +00:00
Clayton Coleman
c6f36cfa26
[Bugfix] DeepGEMM is not enabled on B200 due to _lazy_init() (#21472)
...
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-28 20:51:22 +00:00
Kuntai Du
b18b417fbf
Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778)
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-07-28 20:15:18 +00:00
Lu Fang
9ba1c88a93
[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-28 20:11:16 +00:00
Wentao Ye
e0e58f9729
[Bug] Enforce contiguous input for dynamic_scaled_fp8_quant and static_scaled_fp8_quant (#21773)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-28 19:55:48 +00:00
rasmith
b361f14e39
[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile (#21350)
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-07-28 15:38:20 -04:00
weiliang
01c753ed98
update flashinfer to v0.2.9rc2 (#21701)
...
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
2025-07-28 19:31:47 +00:00
Harry Mellor
94b71ae106
Use metavar to list the choices for a CLI arg when custom values are also accepted (#21760)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-28 19:31:10 +00:00
Nick Hill
7d44c691b0
[P/D] Log warnings related to prefill KV expiry (#21753)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-28 18:40:53 +00:00
Cyrus Leung
e17a4d3bf9
[Bugfix] Fix granite speech shape validation (#21762)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 14:19:21 -04:00
Chaojun Zhang
ec261b0291
[XPU] IPEX-optimized Punica Wrapper on XPU (#21703)
...
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-28 16:43:37 +00:00
Cyrus Leung
04fe61aa3d
[CI/Build] Fix plugin tests (#21758)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 15:08:05 +00:00
Michard Hugo
25708d317a
[Bugfix] Mistral crashes on tool with no description (#21167)
...
Signed-off-by: HugoMichard <hugo@harfanglab.fr>
2025-07-28 08:03:35 -07:00
Cyrus Leung
0e18a5d058
[Misc] Reduce logs for model resolution (#21765)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 07:59:56 -07:00
Michael Goin
34a20c49b3
[Logs] Change flashinfer sampler logs to once (#21759)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-28 06:59:51 -07:00
Isotr0py
31084b3b1f
[Bugfix][CI/Build] Update peft version in test requirement (#21729)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-28 06:17:43 -07:00
wuhang
bccc43c033
[Bugfix]check health for engine core process exiting unexpectedly (#21728)
...
Signed-off-by: wuhang <wuhang6@huawei.com>
2025-07-28 06:17:31 -07:00
Harry Mellor
1395dd9c28
[Docs] Add revision date to rendered docs (#21752)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-28 06:12:46 -07:00
Keyang Ru
9ace2eaf35
[Bugfix] Improve JSON extraction in LlamaToolParser (#19024)
...
Signed-off-by: keru <keyang.ru@oracle.com>
Co-authored-by: keru <keyang.ru@oracle.com>
2025-07-28 12:36:58 +00:00
Anton Vlasjuk
656c24f1b5
[Ernie 4.5] Name Change for Base 0.3B Model (#21735)
...
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 12:22:32 +00:00
Chauncey
63fe3a700f
[PD] let p2p nccl toy proxy handle /chat/completions (#21734)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-28 11:45:50 +00:00
Isotr0py
0ae970ed15
[Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme (#21744)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-28 04:26:49 -07:00
Li, Jiang
65e8466c37
[Bugfix] Fix environment variable setting in CPU Dockerfile (#21730)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-28 11:02:39 +00:00
Jee Jee Li
1b769dccf3
[Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts (#21717)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-28 11:02:25 +00:00
rongfu.leng
2cc571199b
[feature] add log non default args in LLM (#21680)
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-07-28 02:21:22 -07:00
Cyrus Leung
a4ed731546
[Model] Prioritize Transformers fallback over suffix matching (#21719)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 02:15:31 -07:00
Benji Beck
d128d0d554
Migrate KeyeImageInputs and KeyeVideoInputs to TensorSchema (#21686)
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-28 01:16:35 -07:00
Asaf Joseph Gardin
a6c050286a
[v1][mamba] Added mamba_type into MambaSpec (#21715)
...
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-07-28 08:15:55 +00:00
Lucas Wilkinson
139a7f07bd
[BugFix] Fix ChunkedLocalAttention when the hybrid kv-cache is disabled (#21707)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-28 07:18:47 +00:00
Ning Xie
150d9e6337
[Bugfix] fix max-file-size type from str to int (#21675)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-28 00:06:52 -07:00
Cyrus Leung
139a97ec56
[Bugfix] Fix shape checking for Fuyu (#21709)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 00:05:56 -07:00