Michael Goin
|
dbc3d9991a
|
[UX] Put CUDA attention backend selection log into one line (#29337)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-25 06:46:18 -08:00 |
|
Injae Ryou
|
794029f012
|
[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137)
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-25 14:28:53 +00:00 |
|
Eldar Kurtić
|
0231ce836a
|
Revert back to torch.equal over torch.allclose from #28819 (#29086)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2025-11-25 14:23:38 +00:00 |
|
Thomas Parnell
|
516c3f7847
|
[Bugfix] Fix logic for choosing default prefix caching setting (#29393)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-11-25 14:05:10 +00:00 |
|
Harry Mellor
|
51fc9e017a
|
Scheduled removal of CompilationConfig.use_inductor (#29323)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 12:55:42 +00:00 |
|
Harry Mellor
|
bf0c75cd4f
|
Make Transformers Nightly tests soft-fail and enable all tests (#29401)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 12:41:15 +00:00 |
|
Roger Wang
|
c2c661af9b
|
[Bugfix] Fix overallocation in MM profiling (#29386)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-11-25 12:38:36 +00:00 |
|
Nicolò Lucchesi
|
798e87db5c
|
[Core] Generalize Encoder-Decoder seq_lens computation to avoid Whisper hardcoded logic (#29268)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2025-11-25 11:32:11 +00:00 |
|
wang.yuqi
|
de6889946b
|
[Misc] Suppress log outputs when constructing the default vllm config. (#29291)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 03:00:44 -08:00 |
|
wang.yuqi
|
7a80b01889
|
[CI] Resettle pooling entrypoints tests. (#29370)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-11-25 10:39:10 +00:00 |
|
Ben Browning
|
e1dd706cd1
|
[Frontend] Respect Chat Completion parallel_tool_calls param (#26233)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-25 09:56:15 +00:00 |
|
Andrew Xia
|
a685b47c57
|
[responsesAPI] refactor construct_input_messages (#29359)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-11-25 09:47:10 +00:00 |
|
Avishek Goswami
|
32c40b95e0
|
[BugFix] bad_words filtering ineffective when n > 1 (#29313)
Signed-off-by: GOavi101 <1704178@kiit.ac.in>
|
2025-11-25 09:36:34 +00:00 |
|
Nick Hill
|
db2906108a
|
[Misc] Streamline unique id generation (#29375)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 08:30:11 +00:00 |
|
wang.yuqi
|
67fc16cd8c
|
[Bugfix] If chunked_prefill is disabled, end the scheduling early. (#28911)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-11-25 16:06:09 +08:00 |
|
elvischenv
|
6330f9477d
|
[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-11-25 07:59:40 +00:00 |
|
Micah Williamson
|
ef1f7030f0
|
[ROCm][CI] Fix test_cudagraph_mode failure in AMD CI (#29367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-11-25 07:55:09 +00:00 |
|
Rémi Delacourt
|
12c007e288
|
EAGLE Support DP>1 (#26086)
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: remi <remi@mistral.ai>
|
2025-11-25 07:32:21 +00:00 |
|
zhrrr
|
f242cfcdd5
|
[Perf] use cpu all reduce to avoid sync when async_scheduling & dp > 1 (#29311)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-11-25 15:31:07 +08:00 |
|
Icey
|
888152bf87
|
Allow oot custom compiler extension via CompilerInterface (#28623)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
|
2025-11-25 15:25:15 +08:00 |
|
Ryan Rock
|
fe3a4f5b34
|
[CI/Build] Pin torchgeo dependency for AMD (#29353)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-11-25 07:14:59 +00:00 |
|
Fadi Arafeh
|
98caeadd54
|
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up wei (#29273)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-11-25 15:11:11 +08:00 |
|
vllmellm
|
64deead719
|
[Bugfix] [ROCm] [UX]: revert Flex attention backend (#29371)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-25 06:56:06 +00:00 |
|
Nick Hill
|
7992324f23
|
[BugFix] Use unique ids for different transcription prompts (#29372)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 06:55:16 +00:00 |
|
Inoki
|
40a6f53f6c
|
Display warning only when ROCm version is less than Pytorch required version (#29200)
Signed-off-by: Inoki <inoki@inoki.cc>
|
2025-11-25 14:40:06 +08:00 |
|
kflu
|
ce58fdc1c3
|
Fix PoolingParams.skip_reading_prefix_cache type (#29364)
Signed-off-by: KFL <kludev@gmail.com>
|
2025-11-25 06:39:29 +00:00 |
|
Fanli Lin
|
a21256c463
|
Add TP CLI argument to multimodal inference examples (#29301)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-25 06:03:20 +00:00 |
|
Harry Mellor
|
316c8492bf
|
Scheduled removal of guided_* config fields (#29326)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 05:24:05 +00:00 |
|
Lucas Wilkinson
|
2d9ee28cab
|
[CI/Test Fix] Fix CP tests on Blackwell (#29338)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-24 20:55:57 -08:00 |
|
Jiangyun Zhu
|
81db702ed2
|
[Attention] add _cudagraph_support for linear attention (#28934)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-25 12:25:20 +08:00 |
|
Isotr0py
|
92effb07a4
|
[Model] Add HunyuanOCR support (#29327)
Signed-off-by: manayang <jackmanayang@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: sergeywang <sergeywang@tencent.com>
Co-authored-by: manayang <jackmanayang@gmail.com>
Co-authored-by: manayang <manayang@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-25 03:28:51 +00:00 |
|
Maryam Tahhan
|
87185c88d5
|
[Bugfix] Make deprecated --task embedding consistent with `--runner… (#29312)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
|
2025-11-25 03:19:52 +00:00 |
|
Mark McLoughlin
|
9cf4edae6e
|
[Metrics] Scheduled removal of deprecated metrics (#29330)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-25 11:15:13 +08:00 |
|
汪志鹏
|
7012d8b45e
|
[Docker] Optimize Dockerfile: consolidate apt-get and reduce image size by ~200MB (#29060)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
|
2025-11-24 19:54:00 -07:00 |
|
Divakar Verma
|
22b42b5402
|
[CI][ROCm] Install arctic-inference on ROCm tests (#29344)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-11-25 02:15:39 +00:00 |
|
gbyu-amd
|
cb7214d8ea
|
[ROCm][MLA] enable fp8 MLA decode on ROCm (#28032)
Signed-off-by: guanbao <gyu@amd.com>
Signed-off-by: Guanbao Yu <gyu@amd.com>
Signed-off-by: gbyu-amd <Guanbao.Yu@amd.com>
Co-authored-by: guanbao <gyu@amd.com>
|
2025-11-25 10:15:02 +08:00 |
|
Pleaplusone
|
77e10c9cab
|
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-24 19:05:46 -07:00 |
|
Michael Goin
|
6f1355a1b7
|
[Perf] Disable DeepGEMM MoE by default when TP=8 is used (#29346)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-24 19:01:40 -07:00 |
|
Harry Mellor
|
a4ad43ad5a
|
Scheduled removal of ParallelConfig's direct child EPLB fields (#29324)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 01:58:58 +00:00 |
|
Nick Hill
|
a178a0b40b
|
[BugFix] Fix duplicate id tool-call race condition (#29355)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 01:54:26 +00:00 |
|
Kunshang Ji
|
b8328b49fb
|
[XPU] upgrade torch & ipex 2.9 on XPU platform (#29307)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-25 09:34:47 +08:00 |
|
Hanjie Qiu
|
5f9679a43b
|
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states (#27688)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-11-24 20:13:12 -05:00 |
|
Wentao Ye
|
699bca76c0
|
[UX] Raise error for attn backend of batch invariant (#29348)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-24 17:49:01 -07:00 |
|
Michael Goin
|
c17610e2ba
|
[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-24 18:22:46 -05:00 |
|
Chen Zhang
|
71df2a57ef
|
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-24 14:28:32 -08:00 |
|
Tyler Michael Smith
|
4dd42db566
|
Remove VLLM_SKIP_WARMUP tip (#29331)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-11-24 22:16:05 +00:00 |
|
Nick Hill
|
84371daf75
|
[Tests] Verify gpt_oss package is installed in harmony tests (#29336)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-24 22:04:31 +00:00 |
|
Woosuk Kwon
|
f32c7d6f54
|
[Model Runner V2] Simplify Eagle bookkeeping with num_rejected (#29347)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-24 13:54:59 -08:00 |
|
Yan Ma
|
3cfa63ad99
|
[XPU]fix Kimi-VL-A3B-thinking on xpu (#29309)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-11-24 21:02:21 +00:00 |
|
Benjamin Bartels
|
4d6afcaddc
|
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (#29270)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
|
2025-11-24 11:40:54 -08:00 |
|