716 Commits

Author SHA1 Message Date
Fanli Lin
0cd72a7b72
[XPU] add missing dependency tblib for XPU CI (#24639)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-09-11 11:22:33 +00:00
wang.yuqi
a8b0361c92
[CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 01:53:09 -07:00
Xingyu Liu
0fc36463e0
[CI]Add transformers_utils to Async Engine, Inputs, Utils, Worker Test (#24615)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
2025-09-11 01:52:10 -07:00
Jerry Zhang
2048c4e379
[torchao] Support quantization configs using module swap (#21982)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-09-10 23:53:24 -07:00
Hanjie Qiu
dcb28a332b
[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-10 15:31:10 -07:00
Russell Bryant
37e8182bfe
[v1] Add Whisper model support (encoder-decoder) (#21088)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
2025-09-10 13:53:35 -07:00
wang.yuqi
bd98842c8a
[CI] Add PPL test for generation models (#24485)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-10 06:16:39 -07:00
Ye (Charlotte) Qi
492196ed0e
[CI/Build] split true unit tests to Entrypoints Unit Tests (#24418)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-10 06:16:07 -07:00
Jiangyun Zhu
b8a93076d3
[CI] execute all piecewise compilation tests together (#24502)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-09 11:05:25 -07:00
Sahithi Chigurupati
6910b56da2
[CI] Add nightly multiarch manifests to dockerhub (#24102)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-09-09 01:18:09 +00:00
Woosuk Kwon
4172235ab7
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00
yzds
ac201a0eaf
[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-06 13:24:05 +08:00
Rafael Vasquez
c954c6629c
[CI] Add timeouts to tests (#24260)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-09-05 17:26:22 -07:00
Louie Tsai
006e7a34ae
Adding int4 and int8 models for CPU benchmarking (#23709)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-09-05 20:08:50 +08:00
elvischenv
adc3ddb430
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-04 14:25:45 -07:00
Kunshang Ji
16ded21eeb
[XPU] support Triton Attention backend on Intel GPU (#24149)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-04 20:41:08 +08:00
Lucas Wilkinson
402759d472
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-04 02:47:59 -07:00
Didier Durand
02d411fdb2
[Doc]: fix typos in Python comments (#24115)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 21:14:07 -07:00
youkaichao
42dc59dbac
Update release pipeline post PyTorch 2.8.0 update (#24073)
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-03 10:09:19 +08:00
Matthew Bonanni
2fd1a40a54
[CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-02 16:50:28 -07:00
Christian Pinto
1cb39dbcdd
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00
Isotr0py
ff0e59d83a
[CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-31 22:52:20 -07:00
Huy Do
67c14906aa
Update PyTorch to 2.8.0 (#20358)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-29 18:57:35 +08:00
Li, Jiang
ad39106b16
[CPU] Enable data parallel for CPU backend (#23903)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-29 02:19:58 -07:00
Jee Jee Li
b4f9e9631c
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-28 23:28:35 -07:00
Simon Mo
27e88cee74
chore: build release image by default (#23852)
Signed-off-by: Codex <codex@openai.com>
2025-08-28 13:17:15 -07:00
elvischenv
16a45b3a28
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
2025-08-28 19:36:50 +00:00
Jean Schmidt
0583578f42
[ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
2025-08-28 08:59:19 -07:00
Li, Jiang
67cee40da0
[CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-28 11:57:05 +00:00
Alex
f48a9af892
[CI] make all multi-gpu weight loading tests run nightly (#23792)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
2025-08-27 21:27:36 -07:00
Eli Uriegas
3c0ef769ba
ci: Add arm64 docker build to release pipeline (#23210)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Signed-off-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
2025-08-27 10:41:48 -07:00
Kunshang Ji
fce10dbed5
[XPU] Add xpu torch.compile support (#22609)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-27 05:33:27 +00:00
Didier Durand
7c04779afa
[Doc]: fix various spelling issues in multiple files (#23636)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-26 14:05:29 +00:00
nvjullin
f66673a39d
[Kernel] Added flashinfer fp8 per-tensor gemms (#22895)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-26 06:54:04 -07:00
Michael Goin
906e461ed6
[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-25 18:29:00 -07:00
Pate Motter
c34c82b7fe
[TPU][Bugfix] Fixes prompt_token_ids error in tpu tests. (#23574)
Signed-off-by: Pate Motter <patemotter@google.com>
2025-08-25 14:29:16 -07:00
Didier Durand
47455c424f
[Doc: ]fix various typos in multiple files (#23487)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-25 00:04:04 +00:00
Zhewen Li
0483fabc74
[CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-08-22 13:34:40 -07:00
Naman Lalit
ebe14621e3
[Bug fix] Dynamically setting the backend variable for genai_perf_tests in the run-nightly-benchmark script (#23375)
Signed-off-by: Naman Lalit <nl2688@nyu.edu>
2025-08-22 15:12:28 +00:00
Cyrus Leung
8896eb72eb
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-22 10:56:57 +08:00
22quinn
480bdf5a7b
[Core] Support custom executor qualname (#23314)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-22 09:40:54 +08:00
Lain
f8ce022948
add tg-mxfp4-moe-test (#22540)
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-21 17:05:47 +00:00
youkaichao
e0b056e443
[ci/build] Fix abi tag for aarch64 (#23329)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-21 23:32:55 +08:00
Michael Goin
f64ee61d9e
[CI] Block the cu126 wheel build while broken (#23285)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-21 04:21:05 +00:00
QiliangCui
8993073dc1
[CI] Delete images older than 24h. (#23291)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-08-20 21:15:20 -07:00
Cyrus Leung
2461d9e562
[CI/Build] Split out mm processor tests (#23260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 20:05:20 -07:00
Li, Jiang
7be5d113d8
[CPU] Refactor CPU W8A8 scaled_mm (#23071)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-21 09:34:24 +08:00
youkaichao
1b125004be
[misc] fix multiple arch wheels for the nightly index (#23110)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-20 14:15:34 -07:00
Michael Goin
0cdbf5e61c
[Kernel/Quant] Remove the original marlin format and qqq (#23204)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 15:13:36 -04:00
Yong Hoon Shin
dfd2382039
[torch.compile] Support conditional torch.compile per module (#22269)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-20 16:52:59 +00:00