Harry Mellor
|
e42389f9d7
|
Transformers backend already supports V1 (#15463)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-25 20:26:16 -07:00 |
|
Lu Fang
|
082ab86f5f
|
[V1] Support long_prefill_token_threshold in v1 scheduler (#15419)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-25 14:22:26 -07:00 |
|
Joe Runde
|
5f063a80bd
|
[bugfix] add supports_v1 platform interface (#15417)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-25 15:00:32 -04:00 |
|
Russell Bryant
|
a09ad90a72
|
[V1] guidance backend for structured output + auto fallback mode (#14779)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
|
2025-03-24 21:02:33 -07:00 |
|
Gregory Shtrasberg
|
8279201ce6
|
[Build] Cython compilation support fix (#14296)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-24 23:37:54 +00:00 |
|
sfbemerk
|
cc8accfd53
|
[Misc] Update guided decoding logs to debug (#15310)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
|
2025-03-24 04:25:20 -07:00 |
|
Lucas Wilkinson
|
dccf535f8e
|
[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-23 15:07:04 -07:00 |
|
shangmingc
|
50c9636d87
|
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-22 19:28:10 -10:00 |
|
hijkzzz
|
0661cfef7a
|
Fix v1 supported oracle for worker-cls and worker-extension-cls (#15324)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 10:23:35 +08:00 |
|
Russell Bryant
|
b877031d80
|
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 14:06:39 -07:00 |
|
Russell Bryant
|
eb63ea1e18
|
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 15:56:17 +00:00 |
|
wwl2755
|
1c2bec0f82
|
[Doc] add load_format items in docs (#14804)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-21 22:36:43 -07:00 |
|
Nick Hill
|
da6ea29f7a
|
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 22:24:10 -07:00 |
|
Isotr0py
|
f8a08cb90d
|
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-21 03:14:19 +00:00 |
|
Cody Yu
|
5df2da5b97
|
[Misc] Better RayExecutor and multiprocessing compatibility (#14705)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-20 19:27:46 -07:00 |
|
Jee Jee Li
|
10f55fe6c5
|
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-20 19:17:12 -07:00 |
|
Woosuk Kwon
|
0c6f5023c3
|
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 17:50:43 -07:00 |
|
Woosuk Kwon
|
2b22290ce0
|
[V1] Add flag to disable cascade attention (#15243)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 15:24:16 -07:00 |
|
maobaolong
|
26dd972adb
|
[FEAT]Support reset prefix cache by specified device (#15003)
|
2025-03-19 10:54:41 -07:00 |
|
Roger Wang
|
dafb4e504a
|
[V1][Bugfix] Fix oracle for device checking (#15104)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-19 18:35:32 +08:00 |
|
Cody Yu
|
4f065f12f5
|
[Misc][V1] Skip device checking if not available (#15061)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-18 19:33:43 -07:00 |
|
Simon Mo
|
3b457143d2
|
[Bugfix] Register serializers for V0 MQ Engine (#15009)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-18 09:14:47 -04:00 |
|
Robert Shaw
|
e41e160263
|
[V1] Guard Against Main Thread Usage (#14972)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-17 13:23:02 -07:00 |
|
Simon Mo
|
89fca671fb
|
[V1] Default MLA to V1 (#14921)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-17 06:54:40 -07:00 |
|
Roger Wang
|
b30c75dda4
|
[V1] Remove V0 fallback for mistral-tokenizer (#14873)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-15 20:21:11 -07:00 |
|
Jun Duan
|
74bc397b0a
|
[Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
|
2025-03-15 06:28:14 -07:00 |
|
Bryan Lu
|
9ed6ee92d6
|
[Bugfix] EAGLE output norm bug (#14464)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-03-15 06:50:33 +00:00 |
|
Robert Shaw
|
d4d93db2c5
|
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-03-14 22:02:20 -07:00 |
|
Joe Runde
|
47532cd9f4
|
[core][V1] pluggable scheduler (#14466)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-12 01:15:15 +00:00 |
|
Cody Yu
|
485afdd3cb
|
[MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils (#14379)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-10 20:42:11 -04:00 |
|
Harry Mellor
|
39be30351f
|
Correct capitalisation: Github -> GitHub (#14561)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 15:53:33 +00:00 |
|
Chauncey
|
460f553a6d
|
[Misc] Add log information for handle_process_request. (#14130)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-10 08:40:50 +00:00 |
|
Yanyi Liu
|
a21076ed3a
|
[Misc] Ensure out-of-tree quantization method recognize by cli args (#14328)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
|
2025-03-09 12:13:31 +00:00 |
|
Aaron Pham
|
0b7f06b447
|
[Misc] add use_tqdm_on_load to reduce logs (#14407)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-08 05:57:46 -08:00 |
|
Harry Mellor
|
47512b3200
|
Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 14:46:15 +08:00 |
|
Aleksandr Malyshev
|
0ca3b8e01c
|
[BUGFIX] Skip tokenization support for throughput benchmark (#12712)
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-03-07 02:51:47 -08:00 |
|
மனோஜ்குமார் பழனிச்சாமி
|
cc10281498
|
[Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
|
2025-03-07 10:40:01 +00:00 |
|
Tyler Michael Smith
|
cc2f9b32c8
|
[Distributed] Add enable_expert_parallel arg (#14305)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-06 18:54:45 +00:00 |
|
youkaichao
|
151b08e0fe
|
[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-07 00:32:46 +08:00 |
|
courage17340
|
caac5c2e59
|
[Bugfix][Core] fix abort_seq_group and memory leak when n>1 (#14326)
Signed-off-by: courage17340 <courage17340@163.com>
|
2025-03-06 23:59:32 +08:00 |
|
Mark McLoughlin
|
c8525f06fc
|
[V0][Metrics] Deprecate some questionable request time metrics (#14135)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-04 15:11:33 +00:00 |
|
Mark McLoughlin
|
2dfdfed8a0
|
[V0][Metrics] Deprecate some KV/prefix cache metrics (#14136)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-03 18:25:46 +00:00 |
|
Mark McLoughlin
|
c41d27156b
|
[V0][Metrics] Remove unimplemented vllm:tokens_total (#14134)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-03 17:50:22 +00:00 |
|
Ce Gao
|
bf33700ecd
|
[v0][structured output] Support reasoning output (#12955)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-02 14:49:42 -05:00 |
|
Thibault Schueller
|
b3f7aaccd0
|
[V1][Minor] Restore V1 compatibility with LLMEngine class (#13090)
|
2025-02-28 00:52:25 -08:00 |
|
Szymon Ożóg
|
7f0be2aa24
|
[Model] Deepseek GGUF support (#13167)
|
2025-02-27 02:08:35 -08:00 |
|
Wallas Henrique
|
4cb6fa0a9c
|
[Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 10:52:34 -08:00 |
|
Joe Runde
|
3f808cc044
|
[Bugfix] Do not crash V0 engine on input errors (#13101)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 19:07:29 +08:00 |
|
Jee Jee Li
|
5157338ed9
|
[Misc] Improve LoRA spelling (#13831)
|
2025-02-25 23:43:01 -08:00 |
|
Eli Boyarski
|
7196a3b1db
|
[Doc] arg_utils.py: fixed a typo (#13785)
|
2025-02-24 18:23:04 -08:00 |
|