Sage Moore
|
7e2ff2620e
|
cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:45:07 +00:00 |
|
Sage Moore
|
1d75a029a9
|
remove cudagraph logic from flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:41:49 +00:00 |
|
Sage Moore
|
17a7ceef27
|
cleanup deepep ll
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:35:21 +00:00 |
|
Sage Moore
|
6e2a3c0841
|
minor changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:29:32 +00:00 |
|
Sage Moore
|
631be12edb
|
refactoring pplx_prepare_finalize.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:16:34 +00:00 |
|
Sage Moore
|
a9d47e8652
|
remove always_microbatch_if_enabled
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:09:33 +00:00 |
|
Sage Moore
|
fc562e22e2
|
cleanup gpu_worker.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:07:59 +00:00 |
|
Sage Moore
|
1ca65412b8
|
cleanup backends/utils.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:07:33 +00:00 |
|
Sage Moore
|
3112714bdc
|
cleanup logger.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:05:38 +00:00 |
|
Sage Moore
|
0c03d154b5
|
cleanup config.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:03:26 +00:00 |
|
Sage Moore
|
9b7edc0343
|
cleanup data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:02:12 +00:00 |
|
Sage Moore
|
be2e1632fd
|
delete basic-ub.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:01:01 +00:00 |
|
Sage Moore
|
ce3ef95c11
|
turn yields on for pplx
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-02 22:34:02 +00:00 |
|
Sage Moore
|
18f7bfb501
|
ubatching fix
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-02 22:22:41 +00:00 |
|
Sage Moore
|
3d833aa759
|
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-02 21:20:21 +00:00 |
|
Sage Moore
|
0e499c4f4d
|
first round of cleanups
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-02 21:11:28 +00:00 |
|
Sage Moore
|
0767d9863f
|
fix data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-02 19:25:59 +00:00 |
|
Sage Moore
|
c0efbbb5de
|
misc changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-02 16:56:30 +00:00 |
|
Lucas Wilkinson
|
f7a3ee0ea1
|
Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-07-02 16:52:19 +00:00 |
|
Sage Moore
|
57d404bbb8
|
misc
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-02 16:37:58 +00:00 |
|
fyuan1316
|
e28533a16f
|
[Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>
|
2025-07-01 01:30:14 +00:00 |
|
Luka Govedič
|
6d42ce8315
|
[CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-07-01 01:03:13 +00:00 |
|
Zhonghua Deng
|
ded1fb635b
|
[Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-06-30 16:45:14 -07:00 |
|
Wentao Ye
|
97d9524fe9
|
[Refactor] Remove useless pdb comment (#20266)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-30 18:15:24 +00:00 |
|
Kyle Sayers
|
d8cf819a9a
|
[Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-06-30 17:26:49 +00:00 |
|
Sage Moore
|
d833982e48
|
random push
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-30 17:08:51 +00:00 |
|
Wentao Ye
|
551ef1631a
|
[Unit Test] Add unit test for deep gemm (#20090)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-30 10:26:42 -06:00 |
|
Woosuk Kwon
|
2863befce3
|
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 09:07:50 -07:00 |
|
Woosuk Kwon
|
2965c99c86
|
[Spec Decode] Clean up spec decode example (#20240)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 08:28:13 -07:00 |
|
Woosuk Kwon
|
2062c0723d
|
[Spec Decode] Refactor spec decoding into a separate function (#20238)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 08:13:50 -07:00 |
|
li haoyang
|
1c50e100a9
|
[Bugfix] fix quark ptpc (#20251)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: Haoyang Li <307790822@qq.com>
|
2025-06-30 22:24:50 +09:00 |
|
Michael Yao
|
3ee56e26be
|
[Docs] Fix 1-2-3 list in v1/prefix_caching.md (#20243)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-06-30 11:20:51 +00:00 |
|
Jee Jee Li
|
8fe7fc8634
|
[Quantization] Improve BitsAndBytesModelLoader (#20242)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-30 18:22:09 +08:00 |
|
Isotr0py
|
e936e401de
|
[Bugfix] Fix processor initialization in transformers 4.53.0 (#20244)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-30 10:16:16 +00:00 |
|
noiji
|
f5dfa07531
|
[Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model (#19598)
Signed-off-by: noiji <>
|
2025-06-30 18:21:56 +09:00 |
|
Reid
|
022c58b80f
|
[doc] Add Slack and Forum to the top navigation (#20208)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-06-30 07:53:45 +00:00 |
|
Woosuk Kwon
|
19108ef311
|
[Misc] Fix import (#20233)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-29 20:34:54 -07:00 |
|
Chendi.Xue
|
5a52f389dd
|
[BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert (#20202)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-06-29 19:46:19 -07:00 |
|
redmoe-moutain
|
65b1cbb138
|
[Model] support dots1 (#18254)
Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>
|
2025-06-29 19:34:36 -07:00 |
|
Huy Do
|
6c9837a761
|
Fix cuda_archs_loose_intersection when handling sm_*a (#20207)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-06-29 16:52:34 -07:00 |
|
Dipika Sikka
|
6f2f53a82d
|
[Quantization] Add compressed-tensors NVFP4 MoE Support (#19990)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
|
2025-06-29 22:05:40 +00:00 |
|
Michael Goin
|
7b1895e6ce
|
[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-29 10:31:37 +08:00 |
|
Wentao Ye
|
4d36693687
|
[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-28 22:06:38 +00:00 |
|
Sage Moore
|
4672c72f44
|
capture works replay does not
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-28 19:14:48 +00:00 |
|
Stan Wozniak
|
daec9dea6e
|
[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
|
2025-06-28 08:16:41 -07:00 |
|
Nicolò Lucchesi
|
daceac57c7
|
[Frontend] Generalize v1/audio/transcriptions endpoint (#20179)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-06-28 08:15:26 -07:00 |
|
Thomas Parnell
|
8615d9776f
|
[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-27 23:00:25 -07:00 |
|
Jiayi Yan
|
7b460c25f9
|
[BugFix] Fix the incorrect func name in the comments. (config.py) (#20185)
|
2025-06-27 22:51:16 -07:00 |
|
Michael Goin
|
f719772281
|
[Bugfix] Properly reject requests with empty list guided_choice (#20195)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-27 22:50:52 -07:00 |
|
Wentao Ye
|
d45417b804
|
fix ci issue distributed 4 gpu test (#20204)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-27 22:50:00 -07:00 |
|