Woosuk Kwon
ada8a47b12
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-13 18:10:05 -07:00
Woosuk Kwon
6b42a56d46
Merge branch 'main' into v1-sched-interface-2
2025-03-13 18:09:36 -07:00
Lucas Wilkinson
d47807ba08
[Attention] Remove slow setattr in MLA ( #14769 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-13 21:31:14 +00:00
afeldman-nm
02fcaa3d0a
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output ( #14624 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
2025-03-13 19:07:34 +00:00
Aaron Pham
8a4a2efc6f
[V1][Core] using cached vocab_size for Structured Outputs ( #14630 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-13 11:39:28 -07:00
Cyrus Leung
8e9ffd37d6
[Misc] Clean up processor tests ( #14771 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 18:25:37 +00:00
Woosuk Kwon
01b3fd0af7
[V1][Minor] Minor enhancements on scheduler ( #14732 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-13 08:53:22 -07:00
Cyrus Leung
f53a0586b9
[Bugfix] Fix prompt format of GLM4V ( #14539 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 11:37:17 +00:00
Isotr0py
b1cc4dfef5
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel ( #14738 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 03:10:02 -07:00
Cyrus Leung
382403921f
[VLM] Support pan-and-scan for Gemma3 multi-modal processor ( #14672 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-13 02:23:12 -07:00
Jee Jee Li
a73122de96
[Bugfix] fix benchmark moe ( #14653 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 16:12:42 +08:00
Jee Jee Li
bd44b812cb
[CI/Build] Delete ultravox LoRA test ( #14730 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 07:57:39 +00:00
Szymon Ożóg
55211b01e8
[Bugfix] Fix chunked prefill for GGUF ( #14666 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-13 07:19:03 +00:00
Woosuk Kwon
a7facf98d9
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 23:44:46 -07:00
Woosuk Kwon
1e7bf7970a
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 23:38:12 -07:00
Woosuk Kwon
24ce0a7638
Add simple scheduler
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 23:25:10 -07:00
Woosuk Kwon
e484ecb947
Add logging
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 22:09:30 -07:00
Woosuk Kwon
da07067215
Add common states
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 22:04:57 -07:00
Kyle Sayers
5d043c1685
[Quant] Bamba SupportsQuant ( #14698 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-13 04:57:05 +00:00
Kyle Sayers
36d1ccb286
[Quant] BartModel SupportsQuant ( #14699 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-13 04:55:59 +00:00
Woosuk Kwon
6cd1b1a18c
Merge branch 'main' into v1-sched-interface-2
2025-03-12 21:45:44 -07:00
Siyuan Liu
1bc3b739c4
[V1][TPU] Add assertion on multi-step-scheduler ( #14707 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-12 21:37:58 -07:00
Mathis Felardos
1bd32bc8dd
[Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config ( #14367 )
...
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-03-12 20:15:20 -07:00
TY-AMD
128bf75283
[BugFix][TritonMLA] Process weights after model loading for GGUF ( #14555 )
...
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
2025-03-12 20:14:36 -07:00
Gregory Shtrasberg
a94a699c3f
[ROCm][FP8] Fix for adjustments needed only for fnuz ( #14689 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-03-12 20:14:04 -07:00
Richard Liu
ab426ec9c0
Add ray[data] as tpu dependency ( #14691 )
...
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-12 20:13:48 -07:00
Joe Runde
165290d357
[bugfix] fixup warning message for plugged schedulers for v1 ( #14700 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-12 20:12:13 -07:00
Woosuk Kwon
8730469cfa
mv to interface
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 18:23:39 -07:00
Woosuk Kwon
5b38e984b3
Merge branch 'main' into v1-sched-interface-2
2025-03-12 18:12:39 -07:00
Kevin H. Luu
ce20124671
[release] Add force remove for TPU logs ( #14697 )
2025-03-12 22:35:18 +00:00
Woosuk Kwon
53be4a8634
[V1] Allow sliding window + prefix caching ( #13069 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 11:21:19 -07:00
Nick Hill
f5d3acd474
[BugFix][V1] Fix parallel sampling finishing/aborts ( #14512 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-12 10:29:48 -07:00
TJian
916836bbfb
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. ( #14664 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-12 09:31:19 -07:00
Sage Moore
d9f83d6206
[ROCm] Enable chunked prefill/paged attention in MLA on ROCm ( #14316 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-12 15:51:20 +00:00
ameyanjarlekar
4a754fcf15
[Bugfix] Missing thumbnail from NVLM-D processor ( #14633 )
...
Signed-off-by: ameyanjarlekar <aanjarlekar@nvidia.com>
2025-03-12 08:50:49 -07:00
Woosuk Kwon
c0c25e25fa
[Model] Add support for Gemma 3 ( #14660 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-12 08:36:33 -07:00
Sage Moore
45f3f3f59e
[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. ( #14629 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-12 08:00:28 -04:00
Li, Jiang
ff47aab056
[CPU] Upgrade CPU backend to torch-2.6 ( #13381 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-12 10:41:13 +00:00
Woosuk Kwon
06e22ba44c
interface
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 00:00:03 -07:00
Woosuk Kwon
8d46d5d11d
Minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-11 23:43:04 -07:00
Woosuk Kwon
f198d7d07a
utils
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-11 23:36:42 -07:00
Woosuk Kwon
0bf6e97493
sched
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-11 23:35:10 -07:00
Woosuk Kwon
6e7209347d
mv
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-11 23:27:38 -07:00
Pavani Majety
debd6bbf09
[Kernel] Add ModelOpt FP4 Checkpoint Support ( #12520 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-03-12 05:13:11 +00:00
Benjamin Chislett
5c538c37b2
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing ( #14645 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-11 22:12:41 -07:00
Szymon Ożóg
e22ee1e7a2
[Kernel] GGUF MoE kernel ( #14613 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-12 03:33:27 +00:00
Isotr0py
e392d85831
[Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization ( #14545 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 20:12:52 -07:00
Aaron Pham
77a318bd01
[V1][Core] Support MistralTokenizer for Structured Output ( #14625 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-12 10:40:09 +08:00
Farzad Abdolhosseini
80e78d02ac
[Model] Extend Ultravox to accept audio longer than 30s ( #13631 )
...
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-03-12 10:27:10 +08:00
Jennifer Zhao
4a42b9f5d6
[Doc] Update benchmarks README ( #14646 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-11 19:23:04 -07:00