7483 Commits

Author SHA1 Message Date
Sage Moore
0056be26f6 less ARs
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 14:33:53 +00:00
Sage Moore
7cc5a549ad cleanup some of the should_ubatch logic
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 14:22:53 +00:00
Sage Moore
83caef8bac cleanups for ubatching.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:50:19 +00:00
Sage Moore
2f3461ad23 cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:45:52 +00:00
Sage Moore
7e2ff2620e cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:45:07 +00:00
Sage Moore
1d75a029a9 remove cudagraph logic from flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:41:49 +00:00
Sage Moore
17a7ceef27 cleanup deepep ll
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:35:21 +00:00
Sage Moore
6e2a3c0841 minor changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:29:32 +00:00
Sage Moore
631be12edb refactoring pplx_prepare_finalize.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:16:34 +00:00
Sage Moore
a9d47e8652 remove always_microbatch_if_enabled
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:09:33 +00:00
Sage Moore
fc562e22e2 cleanup gpu_worker.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:59 +00:00
Sage Moore
1ca65412b8 cleanup backends/utils.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:33 +00:00
Sage Moore
3112714bdc cleanup logger.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:05:38 +00:00
Sage Moore
0c03d154b5 cleanup config.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:03:26 +00:00
Sage Moore
9b7edc0343 cleanup data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:02:12 +00:00
Sage Moore
be2e1632fd delete basic-ub.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:01:01 +00:00
Sage Moore
ce3ef95c11 turn yields on for pplx
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:34:02 +00:00
Sage Moore
18f7bfb501 ubatching fix
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:22:41 +00:00
Sage Moore
3d833aa759 cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:20:21 +00:00
Sage Moore
0e499c4f4d first round of cleanups
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:11:28 +00:00
Sage Moore
0767d9863f fix data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 19:25:59 +00:00
Sage Moore
c0efbbb5de misc changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:56:30 +00:00
Lucas Wilkinson
f7a3ee0ea1 Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-02 16:52:19 +00:00
Sage Moore
57d404bbb8 misc
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:37:58 +00:00
fyuan1316
e28533a16f [Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>
2025-07-01 01:30:14 +00:00
Luka Govedič
6d42ce8315 [CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>
2025-07-01 01:03:13 +00:00
Zhonghua Deng
ded1fb635b [Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-30 16:45:14 -07:00
Wentao Ye
97d9524fe9 [Refactor] Remove useless pdb comment (#20266)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-30 18:15:24 +00:00
Kyle Sayers
d8cf819a9a [Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-30 17:26:49 +00:00
Sage Moore
d833982e48 random push
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-30 17:08:51 +00:00
Wentao Ye
551ef1631a [Unit Test] Add unit test for deep gemm (#20090)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-30 10:26:42 -06:00
Woosuk Kwon
2863befce3 [Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 09:07:50 -07:00
Woosuk Kwon
2965c99c86 [Spec Decode] Clean up spec decode example (#20240)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:28:13 -07:00
Woosuk Kwon
2062c0723d [Spec Decode] Refactor spec decoding into a separate function (#20238)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:13:50 -07:00
li haoyang
1c50e100a9 [Bugfix] fix quark ptpc (#20251)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: Haoyang Li <307790822@qq.com>
2025-06-30 22:24:50 +09:00
Michael Yao
3ee56e26be [Docs] Fix 1-2-3 list in v1/prefix_caching.md (#20243)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-06-30 11:20:51 +00:00
Jee Jee Li
8fe7fc8634 [Quantization] Improve BitsAndBytesModelLoader (#20242)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-30 18:22:09 +08:00
Isotr0py
e936e401de [Bugfix] Fix processor initialization in transformers 4.53.0 (#20244)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-30 10:16:16 +00:00
noiji
f5dfa07531 [Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model (#19598)
Signed-off-by: noiji <>
2025-06-30 18:21:56 +09:00
Reid
022c58b80f [doc] Add Slack and Forum to the top navigation (#20208)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-06-30 07:53:45 +00:00
Woosuk Kwon
19108ef311 [Misc] Fix import (#20233)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-29 20:34:54 -07:00
Chendi.Xue
5a52f389dd [BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert (#20202)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-06-29 19:46:19 -07:00
redmoe-moutain
65b1cbb138 [Model] support dots1 (#18254)
Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>
2025-06-29 19:34:36 -07:00
Huy Do
6c9837a761 Fix cuda_archs_loose_intersection when handling sm_*a (#20207)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-06-29 16:52:34 -07:00
Dipika Sikka
6f2f53a82d [Quantization] Add compressed-tensors NVFP4 MoE Support (#19990)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
2025-06-29 22:05:40 +00:00
Michael Goin
7b1895e6ce [CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-29 10:31:37 +08:00
Wentao Ye
4d36693687 [Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-28 22:06:38 +00:00
Sage Moore
4672c72f44 capture works replay does not
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-28 19:14:48 +00:00
Stan Wozniak
daec9dea6e [Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
2025-06-28 08:16:41 -07:00
Nicolò Lucchesi
daceac57c7 [Frontend] Generalize v1/audio/transcriptions endpoint (#20179)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-28 08:15:26 -07:00