Laith Sakka
2e0ad629b0
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch ( #25110 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-14 14:11:10 -08:00
Gregory Shtrasberg
5a84b76b86
[ROCm][CI/Build] Change install location of uv ( #28741 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-11-14 21:34:18 +00:00
Marcin Ostrowski
0de4f217ab
[Bugfix] TypeError: 'NoneType' object is not callable ( #27410 )
...
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
2025-11-14 21:13:53 +00:00
Michael Goin
f08eab2acc
[CI] Fix macos smoke test uv cache issue ( #28736 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-14 13:29:55 -07:00
Sage Moore
8977ffb5e6
[ROCm][Bugfix] Fix compilation errors with fused_qknorm_rope_kernel.cu ( #28682 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-11-14 11:06:01 -08:00
Andrey Khalyavin
fd4555089a
[BugFix] Fix misprint introduced by modular_kernel refactoring. ( #28728 )
...
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
2025-11-14 10:58:18 -08:00
GuanH
cec275efce
[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure ( #28663 )
...
Signed-off-by: GuanH <guansdrailib@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 18:44:27 +00:00
Cyrus Leung
e2741f6cbc
[Chore] Rename SchedulerConfig.chunked_prefill_enabled ( #28735 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-14 18:39:57 +00:00
Harry Mellor
67187554dd
[Docs] Enable some more markdown lint rules for the docs ( #28731 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 18:39:19 +00:00
TJian
a425dc256e
[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo ( #28716 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-14 10:30:50 -08:00
Fardin Hoque
964d65deed
LLaMA4 LoRA Adapter Enablement ( #28602 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Co-authored-by: Wei Wei <wwei6@meta.com>
2025-11-14 13:27:56 -05:00
Chen Wang
9261eb3dc1
docs(lora_resolvers): clarify multi-resolver order and storage path requirement ( #28153 )
...
Signed-off-by: Chen Wang <Chen.Wang1@ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 18:08:30 +00:00
czhu-cohere
cdd7025961
[kernel] Improve FP8 PTPC on Hopper for larger shapes ( #28692 )
...
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-11-14 09:59:11 -08:00
Julien Denize
085424808e
Remove audio optional dependency for mistral-common ( #28722 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 09:54:38 -08:00
Mohammad Othman
a17e36f223
Fix typo in comment: existance -> existence ( #28737 )
...
Signed-off-by: Mohammad Othman <emranm226@hotmail.com>
2025-11-14 09:35:45 -08:00
Matthew Bonanni
8cc40f8992
[Attention] Bump FA for removed method ( #28429 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 09:13:37 -08:00
Nicolò Lucchesi
6f1e7f7226
[DisaggEverything] Tokens in<>out /generate endpoint ( #24261 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 09:58:01 -07:00
Michael Goin
d54a18a47e
[CI][CPU] Smoke test for Apple Silicon using GHA MacOS runner ( #28688 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-14 09:37:18 -07:00
Harry Mellor
5f3cd7f7f2
[Docs] Update the name of Transformers backend -> Transformers modeling backend ( #28725 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 16:34:14 +00:00
dongbo910220
c934caee88
[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL ( #28711 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-11-14 16:07:20 +00:00
Duncan Moss
3f8a874065
[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) ( #27134 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-14 08:02:44 -08:00
Cyrus Leung
511a6b611d
[Config] Clean up SchedulerConfig initialization ( #28665 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-14 22:41:02 +08:00
Nicolò Lucchesi
96b23b8e3b
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue ( #28677 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-14 22:40:05 +08:00
zhaozx-cn
433c0f8675
[Model] Fix bailing_moe accuracy problem ( #28277 )
...
Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
2025-11-14 13:33:02 +00:00
Fasal Shah
8d3748d3c7
[Doc] Fix macOS installation dependency resolution issue ( #26721 )
...
Signed-off-by: faisal shah <fashah@redhat.com>
2025-11-14 12:43:56 +00:00
Lucas Wilkinson
db56a59970
[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) ( #28702 )
2025-11-14 12:19:22 +00:00
Yong Hoon Shin
9324e10275
Fix KV sharing fast prefill with cudagraph enabled ( #28537 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 11:53:42 +00:00
Jingchun Gao
4516d44b7f
[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer ( #25438 )
...
Signed-off-by: gaojc <1055866782@qq.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-14 11:24:10 +00:00
Shanshan Shen
41b92f7d38
[Model][MM] Extract conv layer as CustomOp ( #28455 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 19:16:13 +08:00
Srreyansh Sethi
360bd8762f
[Frontend] Added chat-style multimodal support to /classify. ( #27516 )
...
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <glvikramn@gmail.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-11-14 11:03:55 +00:00
lyn610
ecf8230d4d
[Metrics] Log number of preempted requests ( #28522 )
...
Add tracking and periodic logging for the number of preempted requests in the
metrics logger. This helps monitor system behavior under load.
Signed-off-by: Yining Liu <610lyn@gmail.com>
2025-11-14 09:47:45 +00:00
Xing Liu
8cfbe89b93
[Misc] fix comment in test_envs ( #28529 )
...
Signed-off-by: Xing Liu <xingliu14@gmail.com>
2025-11-14 09:32:46 +00:00
Boyuan Feng
fd75d3e8c0
[Minor] avoid register new custom and just import silly_attn ( #28578 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-14 09:32:31 +00:00
Michael Goin
c9a3a02149
Add output token counting to gsm8k eval ( #28594 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-14 09:32:03 +00:00
Nick Hill
bc3e43069a
[BugFix] Fix multi-modal async scheduling race condition ( #28706 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-14 01:11:13 -08:00
Jiangyun Zhu
c36bcfe6b3
[Bugfix] fix dots.ocr pp support ( #28705 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-14 09:01:26 +00:00
Yan Ma
529cea343d
use default CCL_ZE_IPC_EXCHANGE ( #28700 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-11-14 16:55:29 +08:00
rasmith
93103575ce
[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate ( #28311 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-13 22:41:29 -08:00
rasmith
15ae8e0784
[Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_inference/spec_decode.py (Issue 27619) ( #28432 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-11-13 22:34:01 -08:00
haoyangli-amd
0b25498990
[Misc] add ignore mapper for quark quantization ( #28275 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
2025-11-14 05:56:35 +00:00
Roger Wang
0aecd9138f
[Misc] Update xformers to 0.33.0.post1 ( #28678 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-13 21:52:53 -08:00
Kunshang Ji
da14ae0fad
[XPU][CI]disable lm cache uts ( #28696 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-14 03:15:50 +00:00
Cyrus Leung
01bea115c4
[Misc] Remove warn_for_unimplemented_methods ( #28613 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-14 11:10:10 +08:00
Bradley D
b39a5026eb
[ci][amd] fix basic models extra init test ( #28676 )
...
Signed-off-by: Bradley Davis <bradleyhd@meta.com>
2025-11-14 02:44:36 +00:00
Michael Goin
622e6106a9
[CPU][Bugfix] Fix Apple Silicon M1 compilation failure ( #28681 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-14 09:49:55 +08:00
Sage Moore
2aa75c752b
[ROCm] Bump up the version of amd-smi to 6.4.3 ( #28680 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-11-14 01:24:28 +00:00
Hank_
4d5943bda6
[quantization][config] enable override existing quant_config ( #28510 )
...
Signed-off-by: Hank <hcc.mayday@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-14 01:24:10 +00:00
Alexei-V-Ivanov-AMD
f2b8e1c551
Mirrored test group definitions for AMD (2025-11-11) ( #28573 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-11-14 00:16:34 +00:00
Mark McLoughlin
6e25b1cddf
[KV Connector] Test async mode in scheduler tests ( #28550 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-13 18:30:59 -05:00
Wentao Ye
e64011f29a
[CI] Bug: Fix ci entrypoint pooling ( #28684 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-13 14:19:35 -08:00