Thomas Parnell
8615d9776f
[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-06-27 23:00:25 -07:00
Jiayi Yan
7b460c25f9
[BugFix] Fix the incorrect func name in the comments. (config.py) (#20185)
2025-06-27 22:51:16 -07:00
Michael Goin
f719772281
[Bugfix] Properly reject requests with empty list guided_choice (#20195)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-27 22:50:52 -07:00
Wentao Ye
d45417b804
fix ci issue distributed 4 gpu test (#20204)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-27 22:50:00 -07:00
Michael Goin
a29e62ea34
Fix num_token_padding support for static per-tensor scaled_fp8_quant (#20188)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-27 22:48:13 -07:00
Chales Xu
e53be6f00a
[Misc] Add type assertion of request_id for LLMEngine.add_request (#19700)
Signed-off-by: n2ptr <xuzhanchaomail@163.com>
2025-06-27 22:47:36 -07:00
Michael Goin
c329ceca6d
[CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-28 13:43:06 +08:00
Fabien Dupont
3c545c0c3b
[CI/Build] Allow hermetic builds (#18064)
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Fabien Dupont <fabiendupont@pm.me>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Elias Levy <eliaslevy@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-06-27 09:04:39 -07:00
Tyler Michael Smith
e8c3bd2cd1
[Bugfix] Fix some narrowing conversion warnings (#20141)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-27 09:01:28 -07:00
bnellnm
c6c983053d
[Bugfix] Mark 'hidden_states' as mutable in moe_forward registration. (#20152)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-06-27 09:42:22 -06:00
Luka Govedič
aafabaa0d5
[Fix][torch.compile] Enable custom ops by default when Inductor off (#20102)
Signed-off-by: luka <luka@neuralmagic.com>
2025-06-27 09:00:42 -06:00
Hosang
94a55c7681
[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
2025-06-27 07:14:44 -07:00
Ilya Lavrenov
aa0dc77ef5
[Perf] Improved perf for resolve_chat_template_content_format (#20065)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@cerebras.net>
2025-06-27 09:16:41 +00:00
Michael Goin
4ab3ac285e
[Bugfix] Fix flaky failure when getting DP ports (#20151)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-27 15:30:53 +08:00
Robert Shaw
d1c956dc0f
Gemma3n (Text-only) (#20134)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-06-27 07:16:26 +00:00
Chendi.Xue
dec197e3e5
Quick Fix by adding conditional import for flash_attn_varlen_func in flash_attn (#20143)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
2025-06-27 05:48:13 +00:00
Yazan Sharaya
6e244ae091
[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946)
Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com>
2025-06-27 00:44:14 -04:00
wang.yuqi
cd4cfee689
[Model][1/N] Automatic conversion of CrossEncoding model (#20012)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-06-26 21:10:04 -07:00
Thomas Parnell
e110930680
[Fix] Fix gemma CI test failing on main (#20124)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-06-26 21:06:59 -07:00
Yang Wang
8b64c895c0
[CI] Sync test dependency with test.in for torch nightly (#19632)
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-26 20:55:25 -07:00
li haoyang
0740e29b66
[Feature] add quick all reduce (#19744)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-06-26 20:54:24 -07:00
Michael Goin
44d2e6af63
[Bugfix] Build moe_data for both sm100 and sm90 (#20086)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-26 20:50:12 -07:00
Ilya Markov
2d7779f888
[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-06-26 20:50:09 -07:00
Dipika Sikka
a57d57fa72
[Quantization] Bump to use latest compressed-tensors (#20033)
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-26 20:50:06 -07:00
Michael Goin
71799fd005
[CI Failure] Fix OOM with test_oot_registration_embedding (#20144)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-27 11:21:04 +08:00
Bowen Wang
e9fd658a73
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
2025-06-26 15:30:21 -07:00
Kyle Yu
07b8fae219
[Doc] correct LoRA capitalization (#20135)
Signed-off-by: kyolebu <kyu@redhat.com>
2025-06-26 15:22:12 -07:00
Wentao Ye
562308816c
[Refactor] Rename communication utils (#20091)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-26 22:19:32 +00:00
Chengji Yao
04e1642e32
[TPU] add kv cache update kernel (#19928)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-06-26 10:01:37 -07:00
Kunshang Ji
b69781f107
[Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. (#19560)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-06-26 09:27:18 -07:00
Tyler Michael Smith
0bceac9810
Spam folks if config.py changes (#20131)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-26 08:19:46 -07:00
Cyrus Leung
34878a0b48
[Doc] Rename page titles (#20130)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-26 08:18:49 -07:00
Cyrus Leung
6393b03986
[Doc] Auto sign-off for VSCode (#20132)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-26 08:18:36 -07:00
wang.yuqi
0907d507bf
[Doc] Automatically signed-off by PyCharm (#20120)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-06-26 14:34:17 +00:00
Wentao Ye
c894c5dc1f
[Bug Fix] Fix address/port already in use error for deep_ep test (#20094)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-26 22:33:13 +08:00
Michael Goin
1f5d178e9c
Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" (#20128)
2025-06-26 07:32:22 -07:00
TJian
27c065df50
[Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) (#19904)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-06-26 12:42:31 +00:00
Michael Yao
84c260caeb
[Docs] Improve frameworks/helm.md (#20113)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-06-26 10:41:51 +00:00
Reid
167aca45cb
[Misc] Use collapsible blocks for benchmark examples. (#20017)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-26 03:35:16 -07:00
Li, Jiang
0567c8249f
[CPU] Fix torch version in x86 CPU backend (#19258)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-26 03:34:47 -07:00
Wentao Ye
d188913d99
[Refactor] Remove unused library (#20099)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-26 09:16:10 +00:00
Cyrus Leung
1d7c29f5fe
[Doc] Update docs for New Model Implementation (#20115)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-26 00:47:06 -07:00
Seiji Eicher
65397e40f5
[Bugfix] Allow CUDA_VISIBLE_DEVICES='' in Platform.device_id_to_physical_device_id (#18979)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-06-26 00:01:57 -07:00
Ekagra Ranjan
9502c38138
[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083)
2025-06-25 22:06:27 -07:00
Nicolò Lucchesi
2582683566
[PD] Skip tp_size exchange with rank0 (#19413)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-25 20:04:39 -07:00
Michael Goin
754b00edb3
[Bugfix] Fix Mistral tool-parser regex for nested JSON (#20093)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-26 01:01:17 +00:00
Michael Goin
296ce95d8e
[CI] Add SM120 to the Dockerfile (#19794)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-25 16:23:56 -07:00
Chenyaaang
2d7620c3eb
[TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN (#19919)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-25 15:51:02 -07:00
Nick Hill
55c65ab495
[P/D] Avoid stranding blocks in P when aborted in D's waiting queue (#19223)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-25 15:19:44 -07:00
Chengji Yao
2cc2069970
[TPU][Bugfix] fix kv cache padding (#20048)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-06-25 21:24:10 +00:00