shangmingc
|
239b7befdd
|
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-31 09:19:35 -07:00 |
|
youkaichao
|
555aa21905
|
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-31 20:22:34 +08:00 |
|
wwl2755
|
94744ba41a
|
[V1] [Feature] Collective RPC (#15444)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-29 03:39:14 -07:00 |
|
Reid
|
26df46ee59
|
[Misc] cli auto show default value (#15582)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-03-28 22:23:00 +00:00 |
|
Russell Bryant
|
7329ff5468
|
[V1] Support disable_any_whitespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-28 23:46:45 +08:00 |
|
Ce Gao
|
32b14baf8a
|
[Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-28 00:23:30 -07:00 |
|
Jee Jee Li
|
726efc6a32
|
[Quantization][V1] BitsAndBytes support V1 (#15611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-28 10:12:47 +08:00 |
|
Nick Hill
|
15dac210f0
|
[V1] AsyncLLM data parallel (#13923)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-27 16:14:41 -07:00 |
|
Cyrus Leung
|
247181536f
|
[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs (#15620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 17:36:32 +00:00 |
|
Rui Qiao
|
df8d3d1287
|
[Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556)
|
2025-03-27 06:21:07 +00:00 |
|
marko
|
27df5199d9
|
Support SHA256 as hash function in prefix caching (#15297)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-03-26 11:11:28 -07:00 |
|
Alex Brooks
|
1711b929b6
|
[Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
|
2025-03-26 14:28:07 +00:00 |
|
Harry Mellor
|
e42389f9d7
|
Transformers backend already supports V1 (#15463)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-25 20:26:16 -07:00 |
|
Lu Fang
|
082ab86f5f
|
[V1] Support long_prefill_token_threshold in v1 scheduler (#15419)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-25 14:22:26 -07:00 |
|
Joe Runde
|
5f063a80bd
|
[bugfix] add supports_v1 platform interface (#15417)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-25 15:00:32 -04:00 |
|
Russell Bryant
|
a09ad90a72
|
[V1] guidance backend for structured output + auto fallback mode (#14779)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
|
2025-03-24 21:02:33 -07:00 |
|
Gregory Shtrasberg
|
8279201ce6
|
[Build] Cython compilation support fix (#14296)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-24 23:37:54 +00:00 |
|
sfbemerk
|
cc8accfd53
|
[Misc] Update guided decoding logs to debug (#15310)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
|
2025-03-24 04:25:20 -07:00 |
|
Lucas Wilkinson
|
dccf535f8e
|
[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-23 15:07:04 -07:00 |
|
shangmingc
|
50c9636d87
|
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-22 19:28:10 -10:00 |
|
hijkzzz
|
0661cfef7a
|
Fix v1 supported oracle for worker-cls and worker-extension-cls (#15324)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 10:23:35 +08:00 |
|
Russell Bryant
|
b877031d80
|
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 14:06:39 -07:00 |
|
Russell Bryant
|
eb63ea1e18
|
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 15:56:17 +00:00 |
|
wwl2755
|
1c2bec0f82
|
[Doc] add load_format items in docs (#14804)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-21 22:36:43 -07:00 |
|
Nick Hill
|
da6ea29f7a
|
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 22:24:10 -07:00 |
|
Isotr0py
|
f8a08cb90d
|
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-21 03:14:19 +00:00 |
|
Cody Yu
|
5df2da5b97
|
[Misc] Better RayExecutor and multiprocessing compatibility (#14705)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-20 19:27:46 -07:00 |
|
Jee Jee Li
|
10f55fe6c5
|
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-20 19:17:12 -07:00 |
|
Woosuk Kwon
|
0c6f5023c3
|
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 17:50:43 -07:00 |
|
Woosuk Kwon
|
2b22290ce0
|
[V1] Add flag to disable cascade attention (#15243)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-20 15:24:16 -07:00 |
|
maobaolong
|
26dd972adb
|
[FEAT]Support reset prefix cache by specified device (#15003)
|
2025-03-19 10:54:41 -07:00 |
|
Roger Wang
|
dafb4e504a
|
[V1][Bugfix] Fix oracle for device checking (#15104)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-19 18:35:32 +08:00 |
|
Cody Yu
|
4f065f12f5
|
[Misc][V1] Skip device checking if not available (#15061)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-18 19:33:43 -07:00 |
|
Simon Mo
|
3b457143d2
|
[Bugfix] Register serializers for V0 MQ Engine (#15009)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-18 09:14:47 -04:00 |
|
Robert Shaw
|
e41e160263
|
[V1] Guard Against Main Thread Usage (#14972)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-17 13:23:02 -07:00 |
|
Simon Mo
|
89fca671fb
|
[V1] Default MLA to V1 (#14921)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-17 06:54:40 -07:00 |
|
Roger Wang
|
b30c75dda4
|
[V1] Remove V0 fallback for mistral-tokenizer (#14873)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-15 20:21:11 -07:00 |
|
Jun Duan
|
74bc397b0a
|
[Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
|
2025-03-15 06:28:14 -07:00 |
|
Bryan Lu
|
9ed6ee92d6
|
[Bugfix] EAGLE output norm bug (#14464)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-03-15 06:50:33 +00:00 |
|
Robert Shaw
|
d4d93db2c5
|
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-03-14 22:02:20 -07:00 |
|
Joe Runde
|
47532cd9f4
|
[core][V1] pluggable scheduler (#14466)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-12 01:15:15 +00:00 |
|
Cody Yu
|
485afdd3cb
|
[MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils (#14379)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-10 20:42:11 -04:00 |
|
Harry Mellor
|
39be30351f
|
Correct capitalisation: Github -> GitHub (#14561)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 15:53:33 +00:00 |
|
Chauncey
|
460f553a6d
|
[Misc] Add log information for handle_process_request. (#14130)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-10 08:40:50 +00:00 |
|
Yanyi Liu
|
a21076ed3a
|
[Misc] Ensure out-of-tree quantization method is recognized by cli args (#14328)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
|
2025-03-09 12:13:31 +00:00 |
|
Aaron Pham
|
0b7f06b447
|
[Misc] add use_tqdm_on_load to reduce logs (#14407)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-08 05:57:46 -08:00 |
|
Harry Mellor
|
47512b3200
|
Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 14:46:15 +08:00 |
|
Aleksandr Malyshev
|
0ca3b8e01c
|
[BUGFIX] Skip tokenization support for throughput benchmark (#12712)
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-03-07 02:51:47 -08:00 |
|
மனோஜ்குமார் பழனிச்சாமி
|
cc10281498
|
[Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
|
2025-03-07 10:40:01 +00:00 |
|
Tyler Michael Smith
|
cc2f9b32c8
|
[Distributed] Add enable_expert_parallel arg (#14305)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-06 18:54:45 +00:00 |
|