Sage Moore | 908e9f8f54 | 2025-07-03 19:52:41 +00:00
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 06cc133a63 | 2025-07-03 17:51:08 +00:00
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 3a41a3dcff | 2025-07-03 17:23:30 +00:00
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | bb0645c644 | 2025-07-03 17:07:58 +00:00
separate ubatch and normal runs
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 510e839429 | 2025-07-03 16:35:52 +00:00
more cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | f7b6e600b8 | 2025-07-03 16:23:11 +00:00
gpu_model_runner cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 0056be26f6 | 2025-07-03 14:33:53 +00:00
less ARs
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 7cc5a549ad | 2025-07-03 14:22:53 +00:00
cleanup some of the should_ubatch logic
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 83caef8bac | 2025-07-03 13:50:19 +00:00
cleanups for ubatching.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 2f3461ad23 | 2025-07-03 13:45:52 +00:00
cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 7e2ff2620e | 2025-07-03 13:45:07 +00:00
cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 1d75a029a9 | 2025-07-03 13:41:49 +00:00
remove cudagraph logic from flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 17a7ceef27 | 2025-07-03 13:35:21 +00:00
cleanup deepep ll
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 6e2a3c0841 | 2025-07-03 13:29:32 +00:00
minor changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 631be12edb | 2025-07-03 13:16:34 +00:00
refactoring pplx_prepare_finalize.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | a9d47e8652 | 2025-07-03 13:09:33 +00:00
remove always_microbatch_if_enabled
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | fc562e22e2 | 2025-07-03 13:07:59 +00:00
cleanup gpu_worker.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 1ca65412b8 | 2025-07-03 13:07:33 +00:00
cleanup backends/utils.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 3112714bdc | 2025-07-03 13:05:38 +00:00
cleanup logger.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 0c03d154b5 | 2025-07-03 13:03:26 +00:00
cleanup config.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 9b7edc0343 | 2025-07-03 13:02:12 +00:00
cleanup data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | be2e1632fd | 2025-07-03 13:01:01 +00:00
delete basic-ub.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | ce3ef95c11 | 2025-07-02 22:34:02 +00:00
turn yields on for pplx
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 18f7bfb501 | 2025-07-02 22:22:41 +00:00
ubatching fix
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 3d833aa759 | 2025-07-02 21:20:21 +00:00
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 0e499c4f4d | 2025-07-02 21:11:28 +00:00
first round of cleanups
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | 0767d9863f | 2025-07-02 19:25:59 +00:00
fix data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Sage Moore | c0efbbb5de | 2025-07-02 16:56:30 +00:00
misc changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Lucas Wilkinson | f7a3ee0ea1 | 2025-07-02 16:52:19 +00:00
Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Sage Moore | 57d404bbb8 | 2025-07-02 16:37:58 +00:00
misc
Signed-off-by: Sage Moore <sage@neuralmagic.com>

fyuan1316 | e28533a16f | 2025-07-01 01:30:14 +00:00
[Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>

Luka Govedič | 6d42ce8315 | 2025-07-01 01:03:13 +00:00
[CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>

Zhonghua Deng | ded1fb635b | 2025-06-30 16:45:14 -07:00
[Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
Signed-off-by: Abatom <abzhonghua@gmail.com>

Wentao Ye | 97d9524fe9 | 2025-06-30 18:15:24 +00:00
[Refactor] Remove useless pdb comment (#20266)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Kyle Sayers | d8cf819a9a | 2025-06-30 17:26:49 +00:00
[Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

Sage Moore | d833982e48 | 2025-06-30 17:08:51 +00:00
random push
Signed-off-by: Sage Moore <sage@neuralmagic.com>

Wentao Ye | 551ef1631a | 2025-06-30 10:26:42 -06:00
[Unit Test] Add unit test for deep gemm (#20090)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Woosuk Kwon | 2863befce3 | 2025-06-30 09:07:50 -07:00
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 2965c99c86 | 2025-06-30 08:28:13 -07:00
[Spec Decode] Clean up spec decode example (#20240)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 2062c0723d | 2025-06-30 08:13:50 -07:00
[Spec Decode] Refactor spec decoding into a separate function (#20238)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

li haoyang | 1c50e100a9 | 2025-06-30 22:24:50 +09:00
[Bugfix] fix quark ptpc (#20251)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: Haoyang Li <307790822@qq.com>

Michael Yao | 3ee56e26be | 2025-06-30 11:20:51 +00:00
[Docs] Fix 1-2-3 list in v1/prefix_caching.md (#20243)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>

Jee Jee Li | 8fe7fc8634 | 2025-06-30 18:22:09 +08:00
[Quantization] Improve BitsAndBytesModelLoader (#20242)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Isotr0py | e936e401de | 2025-06-30 10:16:16 +00:00
[Bugfix] Fix processor initialization in transformers 4.53.0 (#20244)
Signed-off-by: Isotr0py <2037008807@qq.com>

noiji | f5dfa07531 | 2025-06-30 18:21:56 +09:00
[Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model (#19598)
Signed-off-by: noiji <>

Reid | 022c58b80f | 2025-06-30 07:53:45 +00:00
[doc] Add Slack and Forum to the top navigation (#20208)
Signed-off-by: reidliu41 <reid201711@gmail.com>

Woosuk Kwon | 19108ef311 | 2025-06-29 20:34:54 -07:00
[Misc] Fix import (#20233)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Chendi.Xue | 5a52f389dd | 2025-06-29 19:46:19 -07:00
[BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert (#20202)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>

redmoe-moutain | 65b1cbb138 | 2025-06-29 19:34:36 -07:00
[Model] support dots1 (#18254)
Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>

Huy Do | 6c9837a761 | 2025-06-29 16:52:34 -07:00
Fix cuda_archs_loose_intersection when handling sm_*a (#20207)
Signed-off-by: Huy Do <huydhn@gmail.com>