ioana ghiban
1bb17ecb39
[CPU Backend] [Doc]: Update Installation Docs for CPUs ( #29868 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 13:33:50 +00:00
ioana ghiban
15b1511a15
[GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. ( #29962 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 12:56:47 +00:00
Chauncey
b78772c433
[Frontend] supports deepseekv32 chat template ( #29837 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 20:53:44 +08:00
Amr Mahdi
f5d3d93c40
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds ( #29452 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2025-12-03 11:41:53 +00:00
Fadi Arafeh
78f4bb0ba8
[DOC] Add Arm to list of compute resouces providers ( #29894 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-03 11:36:58 +00:00
HDCharles
b294e28db2
[refactor] CTMoEMethods to use QuantizationArgs ( #28871 )
...
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 11:00:56 +00:00
Roger Wang
787b84a9fc
[Bugfix] Follow-up fix on MediaWithBytes ( #29951 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-12-03 10:42:49 +00:00
Tsukasa OI
42c1949643
[Bugfix][Quantization] Support BF16 tensors on GGUF ( #29948 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-03 10:33:46 +00:00
Isotr0py
cc4e296ea6
[CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests ( #29907 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:27:36 +00:00
Isotr0py
a21cd9ed23
[Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True ( #29950 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:05:10 +00:00
WeiQing Chen
7fe9c1a223
[CI] Add Async Eplb nightly CI tests ( #29385 )
...
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 09:51:08 +00:00
Chauncey
3f42b05fbc
[Refactor] [1/N] to simplify the vLLM serving architecture ( #28040 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 01:26:39 -08:00
Yong Hoon Shin
69520bc695
Add logging for cudagraph related info ( #29825 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-12-03 01:01:48 -08:00
Andrew Xia
3a7751485b
[responsesAPI] support input output messages for non harmony models ( #29549 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 23:59:23 -08:00
Cyrus Leung
bbfb55c29e
[Misc] Allow fetch_* utils to access local files by default ( #29932 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-03 15:49:34 +08:00
JackieWu
0bec63fa31
[BugFix] fix imgs_pos in hunyuan_vl ( #29879 )
...
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 06:20:37 +00:00
elvischenv
c719c40540
[Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it ( #29631 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-03 05:15:50 +00:00
Russell Bryant
b08025a83b
[Docs] Discuss api key limitations in security guide ( #29922 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-12-02 20:57:28 -08:00
Arpit Khandelwal
d7284a2604
[Core] Rename PassConfig flags as per RFC #27995 ( #29646 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-03 03:38:55 +00:00
Andreas Karatzas
506ed87e87
[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues ( #29909 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-03 10:36:49 +08:00
Roger Wang
4dd7978374
[Bugfix] Fix regression on pooling models from PR#29621 ( #29921 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-03 10:33:45 +08:00
Lucas Wilkinson
5cdd664509
[BugFix] Fix assert in build_for_cudagraph_capture ( #29893 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-02 16:56:54 -08:00
Alexei-V-Ivanov-AMD
5f67361fd1
Reverting re-direction to amd_mi355_X. ( #29914 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-12-03 00:40:02 +00:00
maang-h
5d91d2b292
[Doc] Add allocate_slots parameter docs ( #29777 )
...
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-12-02 23:23:09 +00:00
Micah Williamson
c014de1ec7
[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI ( #29808 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-02 22:54:36 +00:00
Julien Denize
1b1e35aaf9
[BUGFIX] Fix regex pattern for Mistral Tool Call ( #29918 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2025-12-02 14:51:58 -08:00
Julien Denize
5e5646e206
[BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention ( #29908 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2025-12-02 14:51:20 -08:00
Chauncey
0a9caca9f5
[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine ( #29764 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-02 22:42:28 +00:00
Sage Moore
e6f114ac25
[Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults ( #29911 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-12-02 13:20:22 -09:00
Harry Mellor
6fc5841db1
Fix some more Transformers nightly tests ( #29872 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 21:49:44 +00:00
dependabot[bot]
3ff5b53bc2
Bump actions/setup-python from 6.0.0 to 6.1.0 ( #29768 )
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 21:29:32 +00:00
jthomson04
1528e079e2
[Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor ( #29826 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-12-02 21:25:52 +00:00
Divakar Verma
afb1e5b380
[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test ( #29123 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-02 20:46:10 +00:00
Copilot
1c593e117d
Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep ( #29025 )
...
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-02 20:40:56 +00:00
Navanit Dubey
a2b053dc85
feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE ( #29896 )
...
Signed-off-by: navanit-git <navanitdubey@gmail.com>
2025-12-02 19:28:35 +00:00
Matthew Bonanni
1d93f11675
[Attention][CUDAGraph] Remove CG padding from attention backends ( #29352 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-02 13:48:08 -05:00
Benjamin Bartels
2d613de9ae
[CI/Build] Fixes missing runtime dependencies ( #29822 )
...
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-02 10:21:49 -08:00
Alexei-V-Ivanov-AMD
c77b9929a0
Update AMD-CI testing mirror (as of 2025-12-02) ( #29898 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-12-02 08:52:54 -09:00
Isotr0py
63b1da76ba
[Chore]: Reorganize gguf utils funtions under transformers_utils ( #29891 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-02 17:33:23 +00:00
Andrew Xia
52cb349fc0
[responsesAPI][3] ResponsesParser to set up non harmony MCP ( #29413 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 11:24:45 -05:00
Isotr0py
0ec8422171
[Bugfix] Fix incorrect channel order for idefics3 in edge case ( #29881 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-02 16:03:52 +00:00
wang.yuqi
2eb4fe9129
[examples] Resettle pooling examples. ( #29365 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 15:54:28 +00:00
Matthew Bonanni
51c57b51dd
[Bugfix] Fix DeepSeek R1 MTP weight loading ( #29545 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2025-12-02 15:52:18 +00:00
ImaGoodFella
60c3d413af
[Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values ( #29621 )
...
Signed-off-by: Rahul Steiger <rasteiger@ethz.ch>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-02 21:49:02 +08:00
Cyrus Leung
68ffbca7e4
[Chore] Use tokenizer.encode and tokenizer.decode directly ( #29851 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-02 12:30:40 +00:00
Harry Mellor
951445a52d
Remove default values from InitVars so that they're not stored ( #29859 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 12:16:37 +00:00
Julien Denize
d8c6210eea
Add Mistral Large 3 and Ministral 3 ( #29757 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
2025-12-02 10:29:00 +00:00
Louie Tsai
8bbcf8b6e7
[vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases ( #29381 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
2025-12-02 09:00:23 +00:00
Boyuan Feng
70fb77b4dc
[BugFix] add max-num-batched-token to scheduler hash ( #29829 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-02 08:55:02 +00:00
杰兮
48d15a32aa
[CI] Fix Bad_words test for tokenizer encode/decode asymmetry ( #28193 )
...
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-12-02 00:02:12 -08:00