Gregory Shtrasberg
06c0922a69
[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 ( #17870 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-11 15:58:45 +08:00
Kuntai Du
9112155283
[Perf] Use small max_num_batched_tokens for A100 ( #17885 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-05-11 07:53:23 +00:00
Harry Mellor
4b2ed7926a
Improve configs - the rest! ( #17562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-09 15:18:44 -07:00
vllmellm
3c9396a64f
[FEAT][ROCm]: Support AITER MLA on V1 Engine ( #17523 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
2025-05-09 10:42:05 +08:00
Harry Mellor
646a31e51e
Fix and simplify deprecated=True CLI kwarg ( #17781 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-07 16:51:06 +01:00
Jee Jee Li
ba7703e659
[Misc] Remove qlora_adapter_name_or_path ( #17699 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-06 23:10:37 -07:00
Aaron Pham
175bda67a1
[Feat] Add deprecated=True to CLI args ( #17426 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-06 08:11:27 -07:00
Li, Jiang
a6fed02068
[V1][PP] Support PP for MultiprocExecutor ( #14219 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-05-06 07:58:05 -07:00
Michael Goin
d419aa5dc4
[V1] Enable TPU V1 backend by default ( #17673 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-06 06:49:49 -07:00
Cyrus Leung
46fae69cf0
[Misc] V0 fallback for --enable-prompt-embeds ( #17615 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-03 22:59:24 +00:00
Cyrus Leung
887d7af882
[Core] Gate prompt_embeds behind a feature flag ( #17607 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-04 00:19:20 +08:00
Chenyaaang
87baebebd8
[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name ( #17508 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-05-02 21:42:44 -07:00
Harry Mellor
785d75a03b
Automatically tell users that dict args must be valid JSON in CLI ( #17577 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-02 05:24:55 -07:00
Jerry Zhang
109e15a335
Add pt_load_map_location to allow loading to cuda ( #16869 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-05-01 23:23:42 -07:00
Chen Xia
61c299f81f
[Misc]add configurable cuda graph size ( #17201 )
...
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-01 11:04:50 -07:00
Harry Mellor
6768ff4a22
Move the last arguments in arg_utils.py to be in their final groups ( #17531 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-01 10:31:44 -07:00
Chauncey
98060b001d
[Feature][Frontend]: Deprecate --enable-reasoning ( #17452 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-01 06:46:16 -07:00
Harry Mellor
a257d9bccc
Improve configs - ObservabilityConfig ( #17453 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-01 03:52:05 -07:00
Alec
0be6d05b5e
[V1][Metrics] add support for kv event publishing ( #16750 )
...
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-04-30 07:44:45 -07:00
Harry Mellor
13698db634
Improve configs - ModelConfig ( #17130 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-30 10:38:22 +08:00
Harry Mellor
a6977dbd15
Simplify (and fix) passing of guided decoding backend options ( #17008 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 19:02:23 +00:00
Harry Mellor
2ef5d106bb
Improve literal dataclass field conversion to argparse argument ( #17391 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 16:25:08 +00:00
Hyogeun Oh (오효근)
193e78e35d
[Fix] Documentation spacing in compilation config help text ( #17342 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-04-29 00:16:17 -07:00
Cyrus Leung
ebb3930d28
[Misc] Move config fields to MultiModalConfig ( #17343 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-29 06:37:21 +00:00
Harry Mellor
f94886946e
Improve conversion from dataclass configs to argparse arguments ( #17303 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-28 16:22:12 +00:00
Lucas Wilkinson
d8bccde686
[BugFix] Fix vllm_flash_attn install issues ( #17267 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-04-27 17:27:56 -07:00
rasmith
68af5f6c5c
[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary ( #17215 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-04-25 19:55:05 -07:00
Benjamin Chislett
a0e619e62a
[V1][Spec Decode] EAGLE-3 Support ( #16937 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
2025-04-25 15:43:07 -07:00
rasmith
a41351f363
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization ( #15734 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-04-25 00:45:02 -07:00
Harry Mellor
6ca0234478
Move missed SchedulerConfig args into scheduler config group in EngineArgs ( #17131 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 22:48:53 -07:00
Harry Mellor
0fa939e2d1
Improve configs - LoRAConfig + PromptAdapterConfig ( #16980 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 10:29:34 -07:00
Harry Mellor
0a05ed57e6
Simplify TokenizerGroup ( #16790 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 04:43:56 -07:00
Woosuk Kwon
46e678bcff
[Minor] Use larger batch sizes for A100/B100/B200/MI300x ( #17073 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-23 19:18:59 -07:00
Travis Johnson
3cde34a4a4
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar ( #15949 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-04-23 18:34:41 +00:00
Harry Mellor
d059110498
Improve configs - SpeculativeConfig ( #16971 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-22 12:55:36 +00:00
Cyrus Leung
8f7bace7c3
[Doc] Improve documentation for multimodal CLI args ( #16960 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-22 08:35:35 +00:00
wangxiyuan
b9b4746950
[V1] Remove additional_config check ( #16710 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-21 20:45:27 -07:00
Michael Goin
986537f1c3
[V1] V1 FlashInfer Attention ( #16684 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
2025-04-22 00:38:41 +00:00
Harry Mellor
4b07d36891
Improve configs - CacheConfig ( #16835 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-20 12:25:04 +08:00
Harry Mellor
686623c5e7
Fix nullable_kvs fallback ( #16837 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-18 05:58:39 -07:00
Harry Mellor
e78587a64c
Improve-mm-and-pooler-and-decoding-configs ( #16789 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 22:13:32 -07:00
Cyrus Leung
c16fb5dae8
[Doc] Improve help examples for --compilation-config ( #16729 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-17 21:22:34 -07:00
Yihua Cheng
3408e47159
[P/D][V1] KV Connector API V1 ( #15960 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-17 13:22:40 -07:00
Harry Mellor
d27ea94034
Improve configs - TokenizerPoolConfig + DeviceConfig ( #16603 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 11:19:42 +00:00
Harry Mellor
e51929ebca
Improve configs - SchedulerConfig ( #16533 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-14 17:24:16 +08:00
Cyrus Leung
d9fc8cd9da
[V1] Enable multi-input by default ( #15799 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-12 08:52:39 +00:00
Harry Mellor
cd77382ac1
Improve configs - LoadConfig ( #16422 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-11 20:27:27 +00:00
Michael Goin
aa3b3d76e0
Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True ( #16447 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 08:09:52 +00:00
Russell Bryant
9665313c39
[V1] Set structured output backend to auto by default ( #15724 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-10 17:53:26 +00:00
Harry Mellor
0c54fc7273
Improve configs - ParallelConfig ( #16332 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-10 17:34:37 +00:00