Sage Moore
ae9d0e7da5
[Bugfix] Make DP padding optional in coordinate_batch_across_dp (#26375)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-10-10 10:53:33 -04:00
Daniel Cámpora
0e67102d93
Added test_top_k_per_row to test-pipeline.yaml. (#26569)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-10 10:48:33 -04:00
Jason Li
f4ba2061cf
[BugFix][torch.compile] Fix fused_scaled_matmul_reduce_scatter signature for PyTorch 2.8 (#26038)
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: <>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-10 07:42:13 -07:00
Chauncey
1e6848a65d
[CI] fix test_run_batch.py::test_completions - AssertionError (#26578)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 22:16:28 +08:00
Andy Lo
67661375fa
[BugFix] Fix noop elimination edge case (#26394)
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-10-10 13:33:04 +00:00
Lucas Kabela
213b64452a
[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-10-10 13:32:29 +00:00
Mark McLoughlin
784c231151
[NIXL] Ignore abort on already-finished request (#25067)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 12:21:56 +02:00
Chen Zhang
606b00e80f
[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-10 03:02:49 -07:00
Chauncey
720d3cd0f0
[CI] fix ruff format (#26579)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 03:02:12 -07:00
Ashwin Phadke
ab196edefb
Remove LoRA bias support (#25807)
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-10 09:50:33 +00:00
Luis Tomas Bolivar
3ee202ea1e
[GPT-OSS] Add support for arrays at tool message content (#25593)
Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com>
2025-10-10 09:00:45 +00:00
Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-10 01:45:55 -07:00
Chen Zhang
6f0f570c43
[deepseek] kernel block size for UniformTypeKVCacheSpecs (#26559)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-10 16:40:41 +08:00
Boyuan Feng
b545a0b207
fix test_simple_inductor_graph_partition (#26522)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-10 06:39:19 +00:00
Lucas Wilkinson
29255cfc3b
[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-10-10 01:20:31 -04:00
Ben Browning
da4455609d
[Chore]: One pythonic tool parser test uses the wrong parser (#26515)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-10-10 04:03:55 +00:00
Nick Hill
aafb99a4d4
[Core] Small simplification in GPUModelRunner._update_states() (#26508)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 10:53:58 +08:00
Rui Qiao
757fa4a4da
[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-09 19:53:43 -07:00
Julien Denize
c6187f55f7
Refactor MistralTokenizer (#26358)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-10-09 22:48:58 +00:00
Wentao Ye
8983e0216f
[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" (#26448)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-09 15:16:48 -07:00
Wentao Ye
1ee35382cb
[Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero (#26528)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-09 15:13:27 -07:00
Benjamin Chislett
6e783bc54b
[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency (#26499)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-10-09 17:12:34 -04:00
Michael Goin
c9d33c60dc
[UX] Add FlashInfer as default CUDA dependency (#26443)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-09 14:10:02 -07:00
Nick Hill
2e54db4d2b
[Core] Remove unused prev_sampled_token_ids_invalid_indices input batch field (#26514)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-09 20:22:14 +00:00
elvischenv
44f633dba1
[Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention (#25674)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-10-09 16:13:39 -04:00
bnellnm
a462331e36
[Bugfix] Disable moe inplace for torch >= 2.9 (#26497)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-09 18:07:38 +00:00
roikoren755
4069db3f2e
[Bugfix] Enable padded FP4 quantization (#25947)
Signed-off-by: Roi Koren <roik@nvidia.com>
2025-10-09 10:59:41 -07:00
Sage Moore
0d37450eb7
[BUGFIX] Add cu_tokens_across_sp to DPMetadata (#26457)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-10-09 17:13:56 +00:00
bnellnm
47e66c24e2
[Model] Apply shared experts overlap optimization to all models with shared experts (#26145)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-09 11:31:04 -04:00
Ming Yang
3b736e1c38
[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-10-09 08:06:29 -07:00
Lukas Geiger
2c1c7dfb35
[Models][Qwen] Replace pad with cat for better performance (#26486)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-10-09 14:51:26 +00:00
Harry Mellor
e246ad6f0c
Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 (#26481)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-09 06:02:40 -07:00
Jiangyun Zhu
5728da11ea
Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-09 05:43:55 -07:00
Simon Danielsson
92be3f3517
[Feature] Use pydantic validation in parallel.py config (#26417)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-09 12:41:31 +00:00
Isotr0py
d1ddf340c8
[V0 deprecation] Remove QKVCrossParallelLinear implementation (#26475)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-09 10:52:27 +00:00
Wenzheng Bi
ec10fd0abc
[Bugfix] Move current_platform import to avoid python import cache. (#16601)
Signed-off-by: iwzbi <wzbi@zju.edu.cn>
2025-10-09 10:46:19 +00:00
Lukas Geiger
0426e3c5e1
[Models][Qwen3VL] Optimise _validate_and_reshape_mm_tensor (#26426)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-10-09 10:25:48 +00:00
Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization (#26427)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 02:48:04 -07:00
Cyrus Leung
dc7976dd9f
[Misc] Upgrade more code to Python 3.10 (#26463)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 10:43:53 +01:00
Simon Danielsson
e4791438ed
[Feature] Use pydantic validation in lora.py and load.py configs (#26413)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
2025-10-09 02:38:33 -07:00
youkaichao
e6e898f95d
[doc] add Volcengine as a compute sponsor (#26477)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-10-09 17:11:47 +08:00
Nick Hill
ddcbc2f334
[Misc] Misc code simplifications (#26450)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-09 02:10:06 -07:00
Jerry Zhang
a83ff278d6
[torchao] Add support for ModuleFqnToConfig using regex (#26001)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-10-09 08:32:32 +00:00
Rahul Tuli
cf4cd6c24f
Add: Support for multiple hidden layers in Eagle3 (#26164)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-09 07:30:50 +00:00
Harry Mellor
b960441812
Enable RMSNorm substitution for Transformers backend (#26353)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-09 07:28:51 +00:00
Luciano Martins
1317028aa8
[Model] Gemma3: Fix GGUF loading and quantization (#26189)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-09 07:00:53 +00:00
elvischenv
5e49c3e777
Bump Flashinfer to v0.4.0 (#26326)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-10-08 23:58:44 -07:00
pwschuurman
0d7c3cb51d
Update Dockerfile and install runai-model-streamer[gcs] package (#26464)
Signed-off-by: Peter Schuurman <psch@google.com>
2025-10-08 23:48:51 -07:00
Jee Jee Li
1b2c440cd6
[Core] Relax the LoRA max rank (#26461)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-08 23:47:14 -07:00
Cyrus Leung
0f29dca988
[CI/Build] Fix model nightly tests (#26466)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-08 23:44:16 -07:00