7495 Commits

Author SHA1 Message Date
Sage Moore
bfa828f399 format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-08 17:13:49 +00:00
Sage Moore
dc1b6af362 format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-08 16:45:11 +00:00
Sage Moore
716b03277e should_ubatch improvements
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-08 13:07:57 +00:00
Sage Moore
1a0e7110dd _prepare_inputs cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-08 13:02:21 +00:00
Sage Moore
82ae694de6 comments cleanup etc
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 20:47:39 +00:00
Sage Moore
10ca263058 split some of the ubatching logic out of _run_model
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 20:26:56 +00:00
Sage Moore
908e9f8f54 cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 19:52:41 +00:00
Sage Moore
06cc133a63 cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 17:51:08 +00:00
Sage Moore
3a41a3dcff cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 17:23:30 +00:00
Sage Moore
bb0645c644 separate ubatch and normal runs
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 17:07:58 +00:00
Sage Moore
510e839429 more cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 16:35:52 +00:00
Sage Moore
f7b6e600b8 gpu_model_runner cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 16:23:11 +00:00
Sage Moore
0056be26f6 less ARs
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 14:33:53 +00:00
Sage Moore
7cc5a549ad cleanup some of the should_ubatch logic
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 14:22:53 +00:00
Sage Moore
83caef8bac cleanups for ubatching.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:50:19 +00:00
Sage Moore
2f3461ad23 cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:45:52 +00:00
Sage Moore
7e2ff2620e cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:45:07 +00:00
Sage Moore
1d75a029a9 remove cudagraph logic from flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:41:49 +00:00
Sage Moore
17a7ceef27 cleanup deepep ll
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:35:21 +00:00
Sage Moore
6e2a3c0841 minor changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:29:32 +00:00
Sage Moore
631be12edb refactoring pplx_prepare_finalize.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:16:34 +00:00
Sage Moore
a9d47e8652 remove always_microbatch_if_enabled
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:09:33 +00:00
Sage Moore
fc562e22e2 cleanup gpu_worker.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:59 +00:00
Sage Moore
1ca65412b8 cleanup backends/utils.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:33 +00:00
Sage Moore
3112714bdc cleanup logger.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:05:38 +00:00
Sage Moore
0c03d154b5 cleanup config.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:03:26 +00:00
Sage Moore
9b7edc0343 cleanup data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:02:12 +00:00
Sage Moore
be2e1632fd delete basic-ub.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:01:01 +00:00
Sage Moore
ce3ef95c11 turn yields on for pplx
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:34:02 +00:00
Sage Moore
18f7bfb501 ubatching fix
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:22:41 +00:00
Sage Moore
3d833aa759 cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:20:21 +00:00
Sage Moore
0e499c4f4d first round of cleanups
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:11:28 +00:00
Sage Moore
0767d9863f fix data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 19:25:59 +00:00
Sage Moore
c0efbbb5de misc changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:56:30 +00:00
Lucas Wilkinson
f7a3ee0ea1 Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-02 16:52:19 +00:00
Sage Moore
57d404bbb8 misc
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:37:58 +00:00
fyuan1316
e28533a16f [Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>
2025-07-01 01:30:14 +00:00
Luka Govedič
6d42ce8315 [CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>
2025-07-01 01:03:13 +00:00
Zhonghua Deng
ded1fb635b [Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-30 16:45:14 -07:00
Wentao Ye
97d9524fe9 [Refactor] Remove useless pdb comment (#20266)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-30 18:15:24 +00:00
Kyle Sayers
d8cf819a9a [Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-30 17:26:49 +00:00
Sage Moore
d833982e48 random push
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-30 17:08:51 +00:00
Wentao Ye
551ef1631a [Unit Test] Add unit test for deep gemm (#20090)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-30 10:26:42 -06:00
Woosuk Kwon
2863befce3 [Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 09:07:50 -07:00
Woosuk Kwon
2965c99c86 [Spec Decode] Clean up spec decode example (#20240)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:28:13 -07:00
Woosuk Kwon
2062c0723d [Spec Decode] Refactor spec decoding into a separate function (#20238)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:13:50 -07:00
li haoyang
1c50e100a9 [Bugfix] fix quark ptpc (#20251)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: Haoyang Li <307790822@qq.com>
2025-06-30 22:24:50 +09:00
Michael Yao
3ee56e26be [Docs] Fix 1-2-3 list in v1/prefix_caching.md (#20243)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-06-30 11:20:51 +00:00
Jee Jee Li
8fe7fc8634 [Quantization] Improve BitsAndBytesModelLoader (#20242)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-30 18:22:09 +08:00
Isotr0py
e936e401de [Bugfix] Fix processor initialization in transformers 4.53.0 (#20244)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-30 10:16:16 +00:00