2569 Commits

Author SHA1 Message Date
Robert Shaw
29d1ffc5b4
[DP] Fix Prometheus Logging (#21257)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-07-21 09:11:35 -07:00
Ming Yang
6ece16c4fe
[Misc] Add dummy maverick test (#21199)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-21 09:08:09 -07:00
simpx
a0e827e07c
[BugFix] make utils.current_stream thread-safety (#21252) (#21253)
Signed-off-by: simpx <simpxx@gmail.com>
2025-07-21 09:07:36 -07:00
Woosuk Kwon
6dda13c86b
[Misc] Add sliding window to flashinfer test (#21282)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-21 08:37:49 -07:00
Zhiyu
6b46c4b653
Add Nvidia ModelOpt config adaptation (#19815)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-07-21 10:02:58 -04:00
Ning Xie
d97841078b
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-21 12:18:33 +01:00
Cyrus Leung
042af0c8d3
[Model][1/N] Support multiple poolers at model level (#21227)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-21 02:22:21 -07:00
Jiayi Yan
7ba34b1241
[bugfix] fix syntax warning caused by backslash (#21251)
2025-07-20 17:12:10 +00:00
Raushan Turganbay
9499e26e2a
[Model] Support VLMs with transformers backend (#20543)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-20 13:25:50 +00:00
Seiji Eicher
d1fb65bde3
Enable v1 metrics tests (#20953)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-07-20 03:22:02 +00:00
Chengji Yao
3a1d8940ae
[TPU] support fp8 kv cache quantization (#19292)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-20 03:01:00 +00:00
Yuxuan Zhang
10eb24cc91
GLM-4 Update (#20736)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lu Fang <fanglu@fb.com>
2025-07-19 22:40:31 +00:00
Woosuk Kwon
752c6ade2e
[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-19 13:53:17 -07:00
Thomas Parnell
881e3cbe3b
[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers (#21194)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-19 19:27:21 +00:00
kourosh hakhamaneshi
9f414a12ad
[BugFix] Make PD work with Ray (#21072)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-19 08:46:50 -07:00
Rabi Mishra
c81259d33a
Fix/remove some broken model executor tests (#21224)
Signed-off-by: Rabi Mishra <ramishra@redhat.com>
2025-07-19 12:15:07 +00:00
22quinn
b3d82108e7
[Bugfix][Frontend] Fix openai CLI arg middleware (#21220)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-19 02:40:38 -07:00
shixianc
7d94577138
Add torch golden impl for moe_align_block_size kernel test (#20653)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-07-19 02:32:36 -07:00
Isotr0py
18e519ec86
[Bugfix] Fix ndarray video color from VideoAsset (#21064)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-19 02:17:16 -07:00
Jee Jee Li
1eaff27815
[V0 deprecation] Remove long context LoRA (#21169)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-19 02:15:41 -07:00
Huy Do
cf8cc32674
Fix a couple of Voxtral tests (#21218)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-07-19 09:13:41 +00:00
김종곤
3e04107d97
[Model] EXAONE 4.0 model support (#21060)
Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
Signed-off-by: woongsik <rlawhdrhs27@gmail.com>
2025-07-19 14:25:44 +08:00
Woosuk Kwon
dd572c0ab3
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-18 21:47:50 -07:00
Lucia Fang
9a9fda1423
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
2025-07-18 20:48:38 -07:00
JialinOuyang-Meta
0f199f197b
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
2025-07-18 12:34:40 -07:00
Cyrus Leung
45badd05d0
[Core] Set pooling params based on task and model (#21128)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-18 05:41:17 -07:00
wang.yuqi
5895afd780
[Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. (#20750)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-18 09:10:47 +00:00
wang.yuqi
ca4eb82bcb
[Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-18 07:15:07 +00:00
shixianc
5780121c95
[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-07-18 04:34:43 +00:00
Cyrus Leung
90bd2ab6e3
[Model] Update pooling model interface (#21058)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-17 16:05:40 +00:00
ElizaWszola
9fb2d22032
[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-07-17 09:56:44 -04:00
kYLe
4ef00b5cac
[VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-17 03:07:55 -07:00
Asher
5a7fb3ab9e
[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
2025-07-17 09:10:09 +00:00
Varun Sundar Rabindranath
11dfdf21bf
[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-17 08:10:37 +00:00
Chauncey
fdc5b43d20
[Bugfix]: Fix final_res_batch list index out of range error (#21055)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-17 00:29:09 -07:00
David Ben-David
4fcef49ec4
[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
2025-07-17 13:29:45 +08:00
Lucas Wilkinson
76b494444f
[Attention] Refactor attention metadata builder interface (#20466)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-17 04:44:25 +00:00
Michael Goin
4e7dfbe7b4
Update PyTorch to torch==2.7.1 for CUDA (#21011)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-17 02:30:44 +00:00
Mac Misiura
18bdcf4113
feat - add a new endpoint get_tokenizer_info to provide tokenizer/chat-template information (#20575)
Signed-off-by: m-misiura <mmisiura@redhat.com>
2025-07-16 21:52:14 +08:00
Seiji Eicher
d0dc4cfca4
Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests (#20831)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-07-16 00:14:49 -07:00
zhiweiz
c11013db8b
[Meta] Llama4 EAGLE Support (#20591)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: qizixi <qizixi@meta.com>
2025-07-15 21:14:15 -07:00
Peter Pan
1eb2b9c102
[CI] update typos config for CI pre-commit and fix some spells (#20919)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-15 21:12:40 -07:00
Maximilien de Bayser
6ebf313790
Avoid direct comparison of floating point numbers (#21002)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-07-15 21:12:14 -07:00
Patrick von Platen
cfbcb9ed87
[Voxtral] Add more tests (#21010)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-15 21:11:49 -07:00
Chauncey
34cda778a0
[Frontend] OpenAI Responses API supports input image (#20975)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 18:59:36 -06:00
Harry Mellor
1e36c8687e
[Deprecation] Remove nullable_kvs (#20969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:21:50 +00:00
Patrick von Platen
e7e3e6d263
Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-15 07:35:30 -07:00
Harry Mellor
56fe4bedd6
[Deprecation] Remove TokenizerPoolConfig (#20968)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 14:00:50 +00:00
Thomas Parnell
3534c39a20
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 04:04:35 -07:00
Ilya Markov
37a7d5d74a
[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-15 06:57:40 +00:00