Reid
|
ef9a2990ae
|
[doc] small fix (#20506)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-04 20:56:39 -07:00 |
|
Reid
|
7e90870491
|
[Misc] Add security warning for development mode endpoints (#20508)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-04 20:52:13 -07:00 |
|
Guy Stone
|
d3f05c9248
|
[Doc] fix mutltimodal_inputs.md gh examples link (#20497)
Signed-off-by: Guy Stone <guys@spotify.com>
|
2025-07-04 16:41:35 -07:00 |
|
Michael Goin
|
c108781c85
|
[CI Bugfix] Fix pre-commit failures on main (#20502)
|
2025-07-04 14:17:30 -07:00 |
|
Duncan Moss
|
3d184b95b8
|
[feat]: CUTLASS block scaled group gemm for SM100 (#19757)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
|
2025-07-04 12:58:04 -06:00 |
|
Thomas Parnell
|
2f35a022e6
|
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-04 17:46:53 +00:00 |
|
Chenheli Hua
|
ffe00ef77a
|
[Misc] Small: Remove global media connector. Each test should have its own test connector object. (#20395)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-07-04 08:15:03 -07:00 |
|
Peter Pan
|
5561681d04
|
[CI] add kvcache-connector dependency definition and add into CI build (#18193)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-04 06:49:18 -07:00 |
|
Cyrus Leung
|
fbd62d8750
|
[Doc] Fix classification table in list of supported models (#20489)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-04 06:08:02 -07:00 |
|
wang.yuqi
|
2e26f9156a
|
[Model][3/N] Automatic conversion of CrossEncoding model (#20168)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-04 05:47:39 -07:00 |
|
sangbumlikeagod
|
9e5452ee34
|
[Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
|
2025-07-04 11:28:07 +00:00 |
|
Michael Goin
|
0e3fe896e2
|
Support Llama 4 for fused_marlin_moe (#20457)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-04 07:55:10 +00:00 |
|
Jee Jee Li
|
1caca5a589
|
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-04 07:40:42 +00:00 |
|
Wentao Ye
|
783921d889
|
[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-04 15:06:24 +08:00 |
|
Aaron Pham
|
4a98edff1f
|
[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-04 15:05:49 +08:00 |
|
Reid
|
a7bab0c9e5
|
[Misc] small update (#20462)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-03 20:33:44 -07:00 |
|
汪志鹏
|
25950dca9b
|
Add ignore consolidated file in mistral example code (#20420)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-07-04 02:55:07 +00:00 |
|
Gabriel Marinho
|
a4113b035c
|
[Platform] Add custom default max tokens (#18557)
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
|
2025-07-04 10:50:17 +08:00 |
|
Michael Goin
|
7e1665b089
|
[Misc] Change warn_for_unimplemented_methods to debug (#20455)
|
2025-07-04 02:35:08 +00:00 |
|
Seiji Eicher
|
8d1096e7db
|
[Bugfix] Register reducer even if transformers_modules not available (#19510)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-07-03 22:08:12 +00:00 |
|
Nicolò Lucchesi
|
8d775dd30a
|
[Misc] Fix Unable to detect current VLLM config. Defaulting to NHD kv cache layout warning (#20400)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-03 14:56:09 -07:00 |
|
bnellnm
|
78fe77534b
|
[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-07-03 14:55:40 -07:00 |
|
Yuxuan Zhang
|
2f2fcb31b8
|
[Misc] Remove _maybe_ignore_quant_config from GLM4.1v (#20432)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Tag: v0.9.2rc1
|
2025-07-03 21:41:13 +00:00 |
|
Sage Moore
|
82ae694de6
|
comments cleanup etc
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 20:47:39 +00:00 |
|
Ning Xie
|
1dba2c4ebe
|
[Misc] adjust for ipv6 for mookcacke url parse (#20107)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-03 20:27:17 +00:00 |
|
Sage Moore
|
10ca263058
|
split some of the ubatching logic out of _run_model
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 20:26:56 +00:00 |
|
Isotr0py
|
71d6de3a26
|
[Misc] Clean up InternVL family config registration (#19992)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-03 20:01:47 +00:00 |
|
Sage Moore
|
908e9f8f54
|
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 19:52:41 +00:00 |
|
Sage Moore
|
06cc133a63
|
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 17:51:08 +00:00 |
|
Sage Moore
|
3a41a3dcff
|
cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 17:23:30 +00:00 |
|
Sage Moore
|
bb0645c644
|
separate ubatch and normal runs
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 17:07:58 +00:00 |
|
Sage Moore
|
510e839429
|
more cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 16:35:52 +00:00 |
|
Sage Moore
|
f7b6e600b8
|
gpu_model_runner cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 16:23:11 +00:00 |
|
Alexei-V-Ivanov-AMD
|
536fd33003
|
[CI] Trimming some failing test groups from AMDPRODUCTION. (#20390)
|
2025-07-03 08:21:31 -07:00 |
|
Reid
|
619b9f5c7e
|
[Frontend] fix duplicate output for bench subcmd (#20446)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-03 08:02:06 -07:00 |
|
Nicolò Lucchesi
|
d1b689c445
|
[Bugfix] Fix flaky test_streaming_response test (#20363)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-03 14:46:24 +00:00 |
|
Sage Moore
|
0056be26f6
|
less ARs
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 14:33:53 +00:00 |
|
Sage Moore
|
7cc5a549ad
|
cleanup some of the should_ubatch logic
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 14:22:53 +00:00 |
|
Reid
|
9854dc9040
|
[Frontend] improve vllm bench <bench_type> --help display (#20430)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-03 14:22:16 +00:00 |
|
Isotr0py
|
ff5c60fad8
|
[Misc] Automatically tag PRs to add new models (#20222)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-03 07:11:03 -07:00 |
|
wang.yuqi
|
6f1229f91d
|
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-03 13:59:23 +00:00 |
|
Jee Jee Li
|
1819fbda63
|
[Quantization] Bump to use latest bitsandbytes (#20424)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-03 21:58:46 +08:00 |
|
Sage Moore
|
83caef8bac
|
cleanups for ubatching.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:50:19 +00:00 |
|
Sage Moore
|
2f3461ad23
|
cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:45:52 +00:00 |
|
Sage Moore
|
7e2ff2620e
|
cleanup flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:45:07 +00:00 |
|
Sage Moore
|
1d75a029a9
|
remove cudagraph logic from flashmla.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:41:49 +00:00 |
|
Sage Moore
|
17a7ceef27
|
cleanup deepep ll
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:35:21 +00:00 |
|
Sage Moore
|
6e2a3c0841
|
minor changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:29:32 +00:00 |
|
Sage Moore
|
631be12edb
|
refactoring pplx_prepare_finalize.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:16:34 +00:00 |
|
Sage Moore
|
a9d47e8652
|
remove always_microbatch_if_enabled
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-07-03 13:09:33 +00:00 |
|