dependabot[bot]
562663a044
Bump actions/github-script from 7.0.1 to 8.0.0 (#24413)
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-09 03:12:44 +00:00
dependabot[bot]
ed1623a88a
Bump actions/stale from 9.1.0 to 10.0.0 (#24412)
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-09 03:11:20 +00:00
cjackal
13b89bd823
[doc] update vllm serve cli args documentation (#24329)
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-09-09 03:07:58 +00:00
dependabot[bot]
22a0070530
Bump actions/setup-python from 5.4.0 to 6.0.0 (#24414)
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-09 02:54:58 +00:00
zhiweiz
170129eb28
[gpt-oss] Harmony changes with container tool support (#23386)
...
Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-09-08 19:03:50 -07:00
Tyler Michael Smith
955c624915
[Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134)
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-08 19:01:51 -07:00
Zhiyu
4f87abdcc6
Update reviewers for modelopt related files (#24468)
2025-09-09 01:53:13 +00:00
Sahithi Chigurupati
6910b56da2
[CI] Add nightly multiarch manifests to dockerhub (#24102)
...
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-09-09 01:18:09 +00:00
R3hankhan
e10fef0883
[Hardware][IBM Z] Fix Outlines Core issue for s390x (#24034)
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-09-08 16:50:34 -07:00
Chauncey
e680723eba
[Bugfix] Disable the statslogger if the api_server_count is greater than 1 (#22227)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-09-08 15:28:03 -07:00
Matthew Bonanni
620db1fc58
[Attention] FlashAttention MLA cudagraph support (#23958)
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-08 22:05:26 +00:00
Ekagra Ranjan
41183c1fe0
[Spec Decode] Fix offline spec_decode.py (#24257)
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-08 20:44:13 +00:00
Yang Kaiyong
43d9ad03ba
[Model loader]: support multi-thread model weight loading (#23928)
...
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-09-08 18:49:39 +00:00
Jiangyun Zhu
7be141b2c5
[CI] Enable encoder model compilation test (#24442)
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-08 11:48:06 -07:00
Jee Jee Li
8d7f39b48c
[Model] Remove quantized mixtral (#24437)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-08 11:02:14 -07:00
Ekagra Ranjan
cd08636926
[Spec Decode][Benchmark] Add Blitzedit dataset (#23605)
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-08 10:32:52 -07:00
Ekagra Ranjan
3feeeb9fea
[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (#23563)
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2025-09-08 10:32:42 -07:00
Jee Jee Li
6f4a82f8b5
[Model] Enable BNB support for qwen2_5_omni_thinker (#24420)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-08 09:37:08 -07:00
rongfu.leng
c44797a4d6
[Docs]add eplb_config param use docs (#24213)
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-09-08 09:36:57 -07:00
Didier Durand
55be93baf5
[Doc]: fix 2 hyperlinks leading to Ray site after they changed Ray's doc structure (#24438)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-08 09:36:54 -07:00
Harry Mellor
717fc00e98
[Docs] Move feature compatibility tables to README (#24431)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-08 06:45:14 -07:00
Chenheli Hua
01dfb5e982
[Frontend] User-provided uuids for medias in chat. (RFC #22044) (#23449)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-08 06:42:20 -07:00
Harry Mellor
03dd652c16
Move KVEventsConfig from config/__init__.py to config/kv_events.py (#24433)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-08 06:41:27 -07:00
Christian Pinto
9cd76b71ab
[Misc] Terratorch related fixes (#24337)
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-08 06:40:26 -07:00
tomeras91
e041314184
[Bugfix] Fix mamba2 prefill chunking (#23279)
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-08 11:42:41 +00:00
Li Wang
5e537f45b4
[Bugfix] Fix get_quant_config when using modelscope (#24421)
...
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-08 11:03:02 +00:00
Michael Yao
c2a8b08fcd
[Doc] Fix issues in integrations/llamastack.md (#24428)
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-08 02:28:32 -07:00
Didier Durand
f4962a6d55
[Doc]: fix typos in Python comments (#24417)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 00:22:16 -07:00
Michael Yao
2f0b833a05
[Docs] Fix a tip indentation and typo (#24419)
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-08 00:19:40 -07:00
Chauncey
425b04b8f4
[gpt-oss][Responses API] Fix the function call id format (#24409)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-08 06:49:52 +00:00
Chatcharin Sangbutsarakum
60f0843ef8
[Model] Remove unnecessary CUDA sync of Qwen2VL image and video preprocess (#24334)
...
Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-07 23:11:12 -07:00
Chatcharin Sangbutsarakum
8a46602606
[Model] Remove unnecessary CUDA sync of GLM-4.1V image and video preprocess (#24332)
...
Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-07 23:10:54 -07:00
Chauncey
61aa4b2901
[P/D] Add a shutdown method to the Connector API (#22699)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-07 23:07:00 -07:00
Al-Ekram Elahee Hridoy
8c892b1831
[Doc] Fix UTF-8 encoding issues in documentation generation on Windows (#24361)
...
Signed-off-by: alekramelaheehridoy <aliqramalaheehridoy@gmail.com>
Signed-off-by: alekramelaheehridoy <alekramelaheehridoy@gmail.com>
Co-authored-by: alekramelaheehridoy <alekramelaheehridoy@gmail.com>
2025-09-07 22:33:52 -07:00
Chenheli Hua
3bca396f79
[CI/Build] Fix local image inputs in test_pixtral.py (#24401)
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-08 03:31:35 +00:00
22quinn
3a3e91bdfe
[CI/Build] Disable flaky test_structured_output tests (#24404)
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-09-08 02:51:59 +00:00
Xingyu Liu
b3d7e3c845
[Sampler] Support returning all prompt logprobs (#23868)
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-07 19:34:31 -07:00
Yan Ma
67841317d1
[xpu] upgrade ipex/python3.12 for xpu (#23830)
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-09-08 02:07:16 +00:00
Ming Yang
86173ad593
[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385)
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-08 09:27:12 +08:00
Lucia Fang
795b6951cd
Add @luccafong to codeowner for spec decode (#24397)
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-09-08 08:30:27 +08:00
Woosuk Kwon
2e5d21378d
Skip MM Encoder for non-first PP ranks (#24387)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-07 09:38:35 -07:00
Flora Feng
0661cb9df3
Add renderer-based prompt processing for embedding and classification endpoints (#24356)
...
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-07 08:26:48 +00:00
Woosuk Kwon
105d3d62ef
[TPU] Remove TopKTopPSampler dependency for TPU sampler (#24391)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-07 01:12:36 -07:00
Jee Jee Li
62f66be1f7
[Bugfix] Fix Qwen3-coder moe tuned config (#24072)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-07 05:19:46 +00:00
Ye (Charlotte) Qi
81c53ef55c
[Misc] collect flashinfer version in collect_env.py (#24378)
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-07 03:30:41 +00:00
Saman A. Pour
75334956c2
QWEN3 Thinking Fused MoE kernels Optimization configs (#24330)
...
Signed-off-by: Saman Keon <samanamp@outlook.com>
2025-09-07 03:18:54 +00:00
Jiangyun Zhu
77aec83b8c
[Benchmark] add benchmark for custom activation op (#23908)
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-06 20:12:05 -07:00
Aaron Pham
e67597545b
[CI][Fix] deterministic seed for flaky CI runs on structured outputs (#24380)
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-09-07 11:10:40 +08:00
Benji Beck
37a6fa95fd
Migrate Qwen2 inputs to TensorSchema (#23475)
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-06 20:07:31 -07:00
youkaichao
558f0907dc
[attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-07 01:18:59 +00:00