Sage Moore
7cc5a549ad
cleanup some of the should_ubatch logic
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 14:22:53 +00:00
Reid
9854dc9040
[Frontend] improve vllm bench <bench_type> --help display (#20430)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 14:22:16 +00:00
Isotr0py
ff5c60fad8
[Misc] Automatically tag PRs to add new models (#20222)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-03 07:11:03 -07:00
wang.yuqi
6f1229f91d
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-03 13:59:23 +00:00
Jee Jee Li
1819fbda63
[Quantization] Bump to use latest bitsandbytes (#20424)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-03 21:58:46 +08:00
Sage Moore
83caef8bac
cleanups for ubatching.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:50:19 +00:00
Sage Moore
2f3461ad23
cleanup flashmla.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:45:52 +00:00
Sage Moore
7e2ff2620e
cleanup flashmla.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:45:07 +00:00
Sage Moore
1d75a029a9
remove cudagraph logic from flashmla.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:41:49 +00:00
Sage Moore
17a7ceef27
cleanup deepep ll
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:35:21 +00:00
Sage Moore
6e2a3c0841
minor changes
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:29:32 +00:00
Sage Moore
631be12edb
refactoring pplx_prepare_finalize.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:16:34 +00:00
Sage Moore
a9d47e8652
remove always_microbatch_if_enabled
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:09:33 +00:00
Sage Moore
fc562e22e2
cleanup gpu_worker.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:59 +00:00
Sage Moore
1ca65412b8
cleanup backends/utils.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:33 +00:00
Sage Moore
3112714bdc
cleanup logger.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:05:38 +00:00
Sage Moore
0c03d154b5
cleanup config.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:03:26 +00:00
Sage Moore
9b7edc0343
cleanup data_parallel.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:02:12 +00:00
Sage Moore
be2e1632fd
delete basic-ub.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:01:01 +00:00
Li, Jiang
7f0367109e
[CI/Build][CPU] Enable cross compilation in CPU release pipeline (#20423)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-03 05:26:12 -07:00
Ning Xie
fb14d53cf6
[Kernel] refactor cpu worker v0 cache dtype (#20080)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-03 08:39:14 +00:00
Cyrus Leung
b024a42e93
[Core] Move multimodal placeholder from chat utils to model definition (#20355)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-03 08:18:30 +00:00
Michael Yao
cb97f2bfc5
[Docs] Replace two list with tables in intel_gaudi.md (#20414)
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-03 00:48:25 -07:00
Reid
359200f6ac
[doc] fix link (#20417)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 00:21:57 -07:00
Lifans
220aee902a
[Misc] Add rules to label Speculative Decoding Related PRs (#20406)
...
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-07-02 23:56:49 -07:00
Nick Hill
67d25eca05
[Tests] Update online DP tests to verify that requests are balanced (#20157)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-03 14:49:13 +08:00
qscqesze
363528de27
[Feature] Support MiniMax-M1 function calls features (#20297)
...
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
2025-07-03 06:48:27 +00:00
QiliangCui
4ff61ababa
[TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 (#20385)
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-03 06:46:41 +00:00
Li, Jiang
0ec3779df7
[Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-02 20:11:36 -07:00
Chenheli Hua
b616f6a53d
[Misc] Small: Fix video loader return type annotations. (#20389)
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-07-03 03:10:39 +00:00
bnellnm
2e25bb12a8
[Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py (#20381)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-07-03 02:07:43 +00:00
Louie Tsai
9965c47d0d
Enable CPU nightly performance benchmark and its Markdown report (#18444)
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-07-02 17:50:25 -07:00
Nick Hill
059d4cdb49
[BugFix] Fix DP headless mode arg validation (#20398)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-02 17:15:32 -07:00
Tyler Michael Smith
bdb84e26b0
[Bugfix] Fixes for FlashInfer's TORCH_CUDA_ARCH_LIST (#20136)
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-07-02 17:15:11 -07:00
Nicolò Lucchesi
3dd359147d
[Docs] Update EAGLE example (#20375)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-02 17:13:51 -07:00
Sage Moore
ce3ef95c11
turn yields on for pplx
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:34:02 +00:00
Sage Moore
18f7bfb501
ubatching fix
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:22:41 +00:00
Sage Moore
3d833aa759
cleanup
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:20:21 +00:00
Sage Moore
0e499c4f4d
first round of cleanups
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:11:28 +00:00
Sage Moore
0767d9863f
fix data_parallel.py
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 19:25:59 +00:00
Nick Hill
657f2f301a
[DP] Support external DP Load Balancer mode (#19790)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-02 10:21:52 -07:00
Sage Moore
c0efbbb5de
misc changes
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:56:30 +00:00
Lucas Wilkinson
f7a3ee0ea1
Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-02 16:52:19 +00:00
Sage Moore
57d404bbb8
misc
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:37:58 +00:00
vllmellm
a1aafc827a
[ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) (#20254)
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-07-02 16:25:46 +00:00
rongfu.leng
139508a418
[Misc] add handler HF_TOKEN is emptry string (#20369)
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-07-02 09:14:31 -07:00
Nick Hill
d265414dbc
[Minor] Clean up incorrect comment in test (#20382)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-02 09:13:37 -07:00
afeldman-nm
48fb076cbc
[V1] LogitsProcessor programming model (#16728)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-07-02 09:10:42 -07:00
bnellnm
c1909e7e8c
[Kernels] MoE refactor (#19636)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
2025-07-02 06:08:27 -07:00
cronoik-inceptionai
b95877509b
Documentation update tool_calling: mapping back to function from response (#20373)
2025-07-02 05:55:49 -07:00