Eric Curtin
b876860c62
[Hardware][CPU] Build fix for ARM without BF16 ( #21848 )
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-07-30 06:22:00 -07:00
Patrick von Platen
13986365a9
Add @patrickvonplaten as maintainer of mistral's related files. ( #21928 )
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-07-30 20:42:51 +08:00
Hongsheng Liu
5c8fe389d6
[Docs] Fix the example code of streaming chat completions in reasoning ( #21825 )
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Zi Wang <66560864+BruceW-07@users.noreply.github.com>
2025-07-30 12:11:58 +00:00
Cyrus Leung
5bbaf492a6
[Doc] Update partial support ( #21916 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-30 01:32:39 -07:00
Peter Pan
533db0935d
[benchmark] add max-concurrency in result table ( #21095 )
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-30 01:15:43 -07:00
Jee Jee Li
fc91da5499
[Model] Remove DSV2 unused code ( #21903 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-30 00:55:03 -07:00
Varun Vinayak Shenoy
547795232d
[Tests] Fixing bug inside MultiModalProfiler. ( #21842 )
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
2025-07-30 00:44:15 -07:00
Kebe
30ef30ed5a
[CI] rollback lint-and-deploy pipeline using amd machine ( #21912 )
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-07-30 00:37:59 -07:00
Jee Jee Li
02f82fe438
[Doc] Update Intern-S1 info ( #21908 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-29 23:58:57 -07:00
Cyrus Leung
2ca5f82c2a
[Misc] Remove redundant config definitions ( #21891 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-29 23:54:18 -07:00
Louie Tsai
6f8d261882
Update vLLM Benchmark Suite for Xeon based on 0.9.2 release ( #21486 )
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-07-30 05:57:03 +00:00
Ricardo Decal
4cd7fe6cea
[Docs] Expand introduction to Ray in Multi-node deployment section ( #21584 )
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-29 22:07:28 -07:00
Cyrus Leung
16f3250527
[CI/Build] Fix pre-commit failure in docs ( #21897 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-29 21:53:08 -07:00
Tao He
e3bc17ceea
Add @sighingnow as maintainer of qwen's related files. ( #21895 )
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-07-29 21:30:44 -07:00
Kunshang Ji
05cbbe20c5
[XPU] use ZE_AFFINITY_MASK for device select on xpu ( #21815 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-30 03:56:14 +00:00
wang.yuqi
65f311ce59
[Frontend] Add LLM.reward specific to reward models ( #21720 )
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-29 20:56:03 -07:00
Wentao Ye
1b0a155534
[Perf] Using __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant ( #21867 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-29 21:50:46 -06:00
Cyrus Leung
44bc46da60
[Bugfix] Actually disable processing cache when API server is scaled out ( #21839 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-29 20:36:04 -07:00
MingzhenHan
b7b23da4d2
[Bugfix] Fix comment typo of get_num_common_prefix_blocks() ( #21827 )
Signed-off-by: MingzhenHan <hanmingzhen2002@outlook.com>
2025-07-29 20:35:33 -07:00
Areeb Syed
fdde18229e
[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization ( #21808 )
Signed-off-by: sydarb <areebsyed237@gmail.com>
2025-07-30 11:35:21 +08:00
Csrayz
b917da442b
Expose PyTorch profiler configuration to environment variables ( #21803 )
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>
2025-07-29 19:46:31 -07:00
Michael Goin
fb58e3a651
[Docs] Update docker.md with HF_TOKEN, new model, and podman fix ( #21856 )
2025-07-29 19:45:41 -07:00
Chen Zhang
76080cff79
[DOC] Fix path of v1 related figures ( #21868 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-29 19:45:18 -07:00
Harry Mellor
ba5c5e5404
[Docs] Switch to better markdown linting pre-commit hook ( #21851 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 19:45:08 -07:00
Chen Zhang
555e7225bc
[v1][attention] Support Hybrid Allocator + FlashInfer ( #21412 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-07-30 01:45:29 +00:00
milesial
0e36abf993
[Bugfix] Correct max tokens for non-contiguous embeds ( #21798 )
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
2025-07-30 01:16:25 +00:00
Simon Mo
452b2a3180
[ci] mark blackwell test optional for now ( #21878 )
2025-07-29 18:03:27 -07:00
Simon Mo
0d0cc9e150
[ci] add b200 test placeholder ( #21866 )
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-07-29 17:11:50 -07:00
Yong Hoon Shin
9266d98048
[BugFix] Fix interleaved sliding window not set for Gemma3n ( #21863 )
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-29 16:34:19 -07:00
Gregory Shtrasberg
176bbce1db
Revert "[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure ( #21647 )" ( #21850 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-29 21:56:29 +00:00
Doug Smith
a1873db23d
docker: docker-aware precompiled wheel support ( #21127 )
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-07-29 14:45:19 -07:00
Michael Goin
a33ea28b1b
Add flashinfer_python to CUDA wheel requirements ( #21389 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-29 12:51:58 -07:00
David Xia
7b49cb1c6b
[Doc] update Contributing page's testing section ( #18272 )
Signed-off-by: David Xia <david@davidxia.com>
2025-07-29 10:32:46 -07:00
Varun Sundar Rabindranath
f03e9cf2bb
[Doc] Add FusedMoE Modular Kernel Documentation ( #21623 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-29 10:32:30 -07:00
David Xia
37f86d9048
[Docs] use uv in GPU installation docs ( #20277 )
Signed-off-by: David Xia <david@davidxia.com>
2025-07-29 10:32:06 -07:00
elvischenv
58b11b24a6
[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend ( #21525 )
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-29 10:34:00 -04:00
Wenhua Cheng
ad341c5194
[Bugfix]fix mixed bits and visual language model quantization in AutoRound ( #21802 )
Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com>
2025-07-29 07:26:31 -07:00
Brittany
759b87ef3e
[TPU] Add an optimization doc on TPU ( #21155 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 07:23:19 -07:00
Harry Mellor
f693b067a2
[Docs] Merge design docs for a V1 only future ( #21832 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 07:22:50 -07:00
Richard Zou
04e38500ee
[Bugfix] VLLM_V1 supports passing other compilation levels ( #19340 )
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-29 09:35:58 -04:00
Cyrus Leung
ab714131e4
[Doc] Update compatibility matrix for pooling and multimodal models ( #21831 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-29 06:29:51 -07:00
Chen Zhang
755fa8b657
[KVCache] Make KVCacheSpec hashable ( #21791 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-07-29 19:58:29 +08:00
Kay Yan
2470419119
[Docs] Fix the outdated URL for installing from vLLM binaries ( #21523 )
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 04:56:27 -07:00
Jee Jee Li
61a6905ab0
[Model] Refactor JambaForCausalLM ( #21394 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-29 18:25:07 +08:00
Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding ( #21347 )
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
Isotr0py
a4528f0cac
[Model]: Fused MoE for nomic-embed-text-v2-moe ( #18321 )
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-29 03:13:27 -07:00
Cyrus Leung
a2480251ec
[Doc] Link to RFC for pooling optimizations ( #21806 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 23:53:18 -07:00
Nick Hill
7234fe2685
[Misc] Rework process titles ( #21780 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-29 05:14:47 +00:00
Benji Beck
f1e2c095ec
Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema ( #21684 )
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-28 22:09:45 -07:00
Gregory Shtrasberg
12a223ef9b
[AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM ( #21766 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-29 03:35:37 +00:00