Michael Yao
c2a8b08fcd
[Doc] Fix issues in integrations/llamastack.md (#24428)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-08 02:28:32 -07:00

Didier Durand
f4962a6d55
[Doc]: fix typos in Python comments (#24417)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 00:22:16 -07:00

Michael Yao
2f0b833a05
[Docs] Fix a tip indentation and typo (#24419)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-08 00:19:40 -07:00

Chauncey
425b04b8f4
[gpt-oss][Responses API] Fix the function call id format (#24409)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-08 06:49:52 +00:00

Chatcharin Sangbutsarakum
60f0843ef8
[Model] Remove unnecessary CUDA sync of Qwen2VL image and video preprocess (#24334)
Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-07 23:11:12 -07:00

Chatcharin Sangbutsarakum
8a46602606
[Model] Remove unnecessary CUDA sync of GLM-4.1V image and video preprocess (#24332)
Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-07 23:10:54 -07:00

Chauncey
61aa4b2901
[P/D] Add a shutdown method to the Connector API (#22699)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-07 23:07:00 -07:00

Al-Ekram Elahee Hridoy
8c892b1831
[Doc] Fix UTF-8 encoding issues in documentation generation on Windows (#24361)
Signed-off-by: alekramelaheehridoy <aliqramalaheehridoy@gmail.com>
Signed-off-by: alekramelaheehridoy <alekramelaheehridoy@gmail.com>
Co-authored-by: alekramelaheehridoy <alekramelaheehridoy@gmail.com>
2025-09-07 22:33:52 -07:00

Chenheli Hua
3bca396f79
[CI/Build] Fix local image inputs in test_pixtral.py (#24401)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-08 03:31:35 +00:00

22quinn
3a3e91bdfe
[CI/Build] Disable flaky test_structured_output tests (#24404)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-09-08 02:51:59 +00:00

Xingyu Liu
b3d7e3c845
[Sampler] Support returning all prompt logprobs (#23868)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-07 19:34:31 -07:00

Yan Ma
67841317d1
[xpu] upgrade ipex/python3.12 for xpu (#23830)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-09-08 02:07:16 +00:00

Ming Yang
86173ad593
[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-08 09:27:12 +08:00

Lucia Fang
795b6951cd
Add @luccafong to codeowner for spec decode (#24397)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-09-08 08:30:27 +08:00

Woosuk Kwon
2e5d21378d
Skip MM Encoder for non-first PP ranks (#24387)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-07 09:38:35 -07:00

Flora Feng
0661cb9df3
Add renderer-based prompt processing for embedding and classification endpoints (#24356)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-07 08:26:48 +00:00

Woosuk Kwon
105d3d62ef
[TPU] Remove TopKTopPSampler dependency for TPU sampler (#24391)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-07 01:12:36 -07:00

Jee Jee Li
62f66be1f7
[Bugfix] Fix Qwen3-coder moe tuned config (#24072)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-07 05:19:46 +00:00

Ye (Charlotte) Qi
81c53ef55c
[Misc] collect flashinfer version in collect_env.py (#24378)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-07 03:30:41 +00:00

Saman A. Pour
75334956c2
QWEN3 Thinking Fused MoE kernels Optimization configs (#24330)
Signed-off-by: Saman Keon <samanamp@outlook.com>
2025-09-07 03:18:54 +00:00

Jiangyun Zhu
77aec83b8c
[Benchmark] add benchmark for custom activation op (#23908)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-06 20:12:05 -07:00

Aaron Pham
e67597545b
[CI][Fix] deterministic seed for flaky CI runs on structured outputs (#24380)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-09-07 11:10:40 +08:00

Benji Beck
37a6fa95fd
Migrate Qwen2 inputs to TensorSchema (#23475)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-06 20:07:31 -07:00

youkaichao
558f0907dc
[attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-07 01:18:59 +00:00

Woosuk Kwon
4172235ab7
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00

Bangsheng Tang
848562bd49
break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265)
Co-authored-by: Bangsheng Tang <bangsheng@meta.com>
2025-09-06 14:02:47 -07:00

elvischenv
e68dc2f014
[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-06 20:39:34 +00:00

Ye (Charlotte) Qi
a3645ed94d
[Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-06 13:27:15 -07:00

Aaron Pham
fb691ee4e7
[Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324)
2025-09-06 19:10:32 +00:00

Ashwin Phadke
6024d115cd
Lora bias(enable_lora_bias) deprecate warning (#24339)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-07 00:42:19 +08:00

Jee Jee Li
7555d6b34a
[Bugfix] Fix test_mixtral_moe (#24371)
2025-09-06 09:32:03 -07:00

Isotr0py
00a4e56d8d
[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-06 09:23:12 -07:00

mohankku
0eadaeff7e
[Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335)
Signed-off-by: Mohan Kumar Kumar <mohan.cbein@gmail.com>
Signed-off-by: mohankku <mohan.cbein@gmail.com>
2025-09-06 08:17:03 -07:00

Benjamin Chislett
0077c8634e
Add @benchislett to codeowner for spec decode and structured outputs (#24362)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-09-06 22:03:35 +08:00

Roger Wang
b121ca22ad
[CI] Disable flaky structured output test from CI (#24366)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-06 13:31:56 +00:00

Roger Wang
eddaafc1c7
[Multimodal] Improve max video embedding length estimation in V1 (#24312)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-06 02:33:19 -07:00

Andrew Sansom
305a1cc0d2
refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-05 23:01:23 -07:00

wang.yuqi
6d6c6b05d3
[New Model]: google/embeddinggemma-300m (#24318)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-05 22:58:36 -07:00

Isotr0py
53b19ccdd5
[Core] Allow disabling TP sharding for parallel Linear layer (#23024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-05 22:53:58 -07:00

Nick Hill
6432739ef1
[Bugfix] Catch and log invalid token ids in detokenizer (#24351)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-05 22:30:22 -07:00

yzds
ac201a0eaf
[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-06 13:24:05 +08:00

Yong Hoon Shin
3c529fc994
[KV Sharing] Raise error if using eagle with fast prefill (#24350)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-09-05 20:22:40 -07:00

Didier Durand
35bf193864
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-05 19:41:12 -07:00

22quinn
35efa70297
Add @22quinn as code reviewer for RL related components (#24346)
2025-09-06 01:56:15 +00:00

Benjamin Chislett
cee182b297
[Perf][V1] Fully overlap model execution (#23569)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-09-05 18:20:17 -07:00

Rafael Vasquez
c954c6629c
[CI] Add timeouts to tests (#24260)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-09-05 17:26:22 -07:00

Shiyan Deng
9dfbeb41e5
[RFC] allow cancelation after shutdown in blocking collective_rpc (#23390)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
2025-09-05 14:14:18 -07:00

elvischenv
eedb2a2a10
[Bugfix] Fix silu_mul+quant fusion test (#24341)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-05 20:13:42 +00:00

Chauncey
23a6c5280e
[gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-05 10:26:00 -07:00

youkaichao
7812bcf278
[docs] add shenzhen meetup (#24326)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-05 22:48:42 +08:00