Nick Hill
|
9907fc4494
|
[Docs] Data Parallel deployment documentation (#20768)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-11 09:42:10 -07:00 |
|
Michael Goin
|
d47661f0cd
|
[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM (#20646)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 10:05:33 -06:00 |
|
Varun Sundar Rabindranath
|
53fa457391
|
[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-11 07:51:46 -07:00 |
|
Reid
|
6fb162447b
|
[doc] fix ordered list issue (#20819)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-11 06:49:46 -07:00 |
|
Li, Jiang
|
66177189c5
|
[Bugfix] Add missing field to TritonLanguagePlaceholder (#20812)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-11 05:25:11 -07:00 |
|
QiliangCui
|
b4f0b5f9aa
|
Temporarily suspend google/gemma-3-1b-it. (#20722)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-11 11:21:26 +00:00 |
|
Cyrus Leung
|
cbd14ed561
|
[Bugfix] Refactor /invocations to be task-agnostic (#20764)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-11 03:20:54 -07:00 |
|
Pavani Majety
|
7bd4c37ae7
|
[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: shuw <shuw@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 09:23:23 +00:00 |
|
Jee Jee Li
|
8020e98c9f
|
[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-11 08:01:13 +00:00 |
|
Luka Govedič
|
762be26a8e
|
[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
|
2025-07-11 00:15:22 -07:00 |
|
Reid
|
6a9e6b2abf
|
[doc] fold long code block (#20795)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-10 23:16:41 -07:00 |
|
nopperl
|
5d09152ff1
|
[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660)
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
|
2025-07-11 05:53:31 +00:00 |
|
Luka Govedič
|
31d5c1797f
|
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 04:56:28 +00:00 |
|
Ratnam Parikh
|
35514b682a
|
[XPU] XCCL support enabled in torch 2.8.0.dev nightly builds (#20705)
Signed-off-by: ratnampa <ratnam.parikh@intel.com>
|
2025-07-10 20:39:52 -07:00 |
|
Wentao Ye
|
e2de455c34
|
[Feature] Integrate SM100 DeepGEMM support (#20087)
|
2025-07-10 20:18:05 -07:00 |
|
Alexander Matveev
|
5b032352cc
|
[Attention] MLA - Flashinfer Ragged Prefill (#20034)
|
2025-07-10 20:17:47 -07:00 |
|
Michael Goin
|
922f316441
|
[Model] Support HF format of minimax (#20211)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 02:55:21 +00:00 |
|
Duncan Moss
|
5923ab9524
|
[fix]: disable cutlass block scaled group gemm for EP (#20781)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
|
2025-07-11 02:39:18 +00:00 |
|
bigmoyan
|
0cf893cae1
|
Add kimi-k2 tool parser (#20789)
Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
|
2025-07-11 10:36:23 +08:00 |
|
Michael Goin
|
cf75cd2098
|
[CI Bugfix] Specify same TORCH_CUDA_ARCH_LIST for flashinfer aot and install (#20772)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 01:16:01 +00:00 |
|
Simon Mo
|
b854321ffe
|
[Docs] Lazy import gguf (#20785)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-07-10 16:06:37 -07:00 |
|
Kuntai Du
|
5b6fe23d05
|
[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-10 14:52:46 -07:00 |
|
Varun Sundar Rabindranath
|
f0c98cae27
|
[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-10 14:40:38 -07:00 |
|
Nick Hill
|
574ad60db9
|
[KVConnector] Always call connector clear_metadata() at end of step (#20756)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: David Ben-David <sdavidbd@gmail.com>
|
2025-07-10 22:37:27 +01:00 |
|
Varun Sundar Rabindranath
|
fdadb6f43a
|
[Bugfix] Fused MoE Modular Kernel chunking loop (#20392)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-10 20:31:10 +00:00 |
|
Alex Brooks
|
41060c6e08
|
[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-07-10 21:09:37 +01:00 |
|
Ming Yang
|
3de2ed767f
|
[Bugfix] Remove assertion of expert_map being None (#20714)
Signed-off-by: Ming Yang <yming@meta.com>
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-10 19:55:22 +00:00 |
|
Wentao Ye
|
299252ea82
|
[CI] Fix pre commit issue (#20782)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-10 12:48:13 -07:00 |
|
Nathan Hoos
|
d6902ce79f
|
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
|
2025-07-10 15:30:26 -04:00 |
|
Sanger Steel
|
5e53c89a74
|
[Bugfix] [CI] Fix Tensorizer LoRA test (#20760)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-07-10 19:07:06 +00:00 |
|
QiliangCui
|
c66e38ea4c
|
[Test] Remove docker build from test. (#20542)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-10 11:21:58 -07:00 |
|
sfbemerk
|
251595368f
|
Fix DeepSeek-R1-0528 chat template (#20717)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
|
2025-07-10 17:47:36 +00:00 |
|
shineran96
|
4bed167768
|
[Model][VLM] Support JinaVL Reranker (#20260)
Signed-off-by: shineran96 <shinewang96@gmail.com>
|
2025-07-10 10:43:43 -07:00 |
|
Asher
|
b140416abf
|
[Model] Add reason parser for Hunyuan A13B Model. (#20625)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-10 16:33:26 +00:00 |
|
Gregory Shtrasberg
|
5b8366b61a
|
[ROCm][Regression] Remove tensor creation that harms performance on ROCm (#20741)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-07-10 09:22:23 -07:00 |
|
nishith-fujitsu
|
c7753a9809
|
[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU (#14129)
Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com>
|
2025-07-10 15:59:04 +00:00 |
|
Michael Goin
|
4b9a9435bb
|
Update Dockerfile FlashInfer to v0.2.8rc1 (#20718)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-10 08:09:02 -07:00 |
|
Harry Mellor
|
3482fd7e4e
|
[Doc] Add engine args back in to the docs (#20674)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-10 08:02:40 -07:00 |
|
Isotr0py
|
77f77a951e
|
[Misc] Clean up mark to fork process in BNB tests (#20692)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-10 13:59:40 +00:00 |
|
Michael Goin
|
1a4f35e2ea
|
Normalize lm-eval command between baseline and correctness test (#18560)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-10 13:27:32 +00:00 |
|
Michael Goin
|
be1e128dfb
|
[CI Bugfix] Skip failing Tensorizer+LoRA test (#20724)
|
2025-07-10 21:15:03 +09:00 |
|
Reid
|
65393ee064
|
[doc] fix ordered list (#20749)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-10 03:13:52 -07:00 |
|
Gregory Shtrasberg
|
dc221ad72d
|
[Bugfix][Build][Non-CUDA] Only referencing CMAKE_CUDA_COMPILER_VERSION on CUDA where it is defined (#20738)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-07-10 02:58:11 -07:00 |
|
Jee Jee Li
|
7571a4a7e5
|
[CI/Build] Fix Basic Models Test (#20728)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-10 09:57:19 +00:00 |
|
Isotr0py
|
f67d986dd1
|
[Misc] loose new-model tagger conditions (#20747)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-10 02:54:47 -07:00 |
|
Or Ozeri
|
cc876d0f29
|
[KVConnector] Aggregate finished requests on the scheduler (#19555)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-07-10 09:22:18 +01:00 |
|
Chenyaaang
|
fdfd409f8f
|
[TPU][Core]Make load weight exceed hbm error more instructive for customers (#20644)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-07-10 07:01:17 +00:00 |
|
Nick Hill
|
ffbcc9e757
|
[BugFix] Fix VllmConfig() construction on all platforms (#20695)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-10 07:00:20 +00:00 |
|
Nick Hill
|
59389c927b
|
[BugFix][CPU] Fix CPU worker dependency on cumem_allocator (#20696)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-10 14:24:20 +08:00 |
|
Chauncey
|
8f2720def9
|
[Frontend] Support Tool Calling with both tool_choice='required' and $defs. (#20629)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-10 13:56:35 +08:00 |
|