4837 Commits

Author SHA1 Message Date
王博伟
512d77d582
Update quickstart.md (#13958) 2025-02-27 16:05:11 +00:00
Szymon Ożóg
7f0be2aa24
[Model] Deepseek GGUF support (#13167) 2025-02-27 02:08:35 -08:00
Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models (#13320) 2025-02-27 02:06:41 -08:00
Michael Goin
788f284b53
Fix test_block_fp8.py test for MoE (#13915)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-27 18:00:00 +08:00
Yang Zheng
4b1d141f49
[PP] Correct cache size check (#13873)
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
2025-02-27 17:47:29 +08:00
Chauncey
10c3b8c1cf
[Misc] fixed 'required' is an invalid argument for positionals (#13948)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-02-27 09:06:49 +00:00
Brayden Zhong
a7f37314b7
[CI/Build] Add examples/ directory to be labelled by mergify (#13944)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-27 08:24:11 +00:00
Mark McLoughlin
cd711c48b2
[V1][Metrics] Handle preemptions (#13169) 2025-02-26 20:04:59 -08:00
Sage Moore
378b3ef6f8
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922) 2025-02-26 20:04:12 -08:00
Rui Qiao
c9944acbf9
[misc] Rename Ray ADAG to Compiled Graph (#13928) 2025-02-26 20:03:28 -08:00
Michael Goin
ca377cf1b9
Use CUDA 12.4 as default for release and nightly wheels (#12098) 2025-02-26 19:06:37 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
a31614e386
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-02-27 10:39:10 +08:00
Lucas Wilkinson
f95903909f
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00
Woosuk Kwon
b382a7f28f
[BugFix] Make FP8 Linear compatible with torch.compile (#13918)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-26 13:48:55 -08:00
Wallas Henrique
4cb6fa0a9c
[Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 10:52:34 -08:00
Chauncey
d08b285adf
[Misc] fixed qwen_vl_utils parameter error (#13906) 2025-02-26 08:31:53 -08:00
Chenyaaang
b27122acc2
[TPU] use torch2.6 with whl package (#13860)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
2025-02-26 08:18:54 -05:00
Cyrus Leung
934bb99c71
[Bugfix] Update expected token counts for Ultravox tests (#13895) 2025-02-26 04:56:50 -08:00
Joe Runde
3f808cc044
[Bugfix] Do not crash V0 engine on input errors (#13101)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 19:07:29 +08:00
Brayden Zhong
ec8a5e5386
[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-26 19:06:47 +08:00
Florian Greinacher
215bf150a6
[Bugfix] Handle None parameters in Mistral function calls. (#13786) 2025-02-26 03:06:21 -08:00
Harry Mellor
0ecdd98031
Add comments on accessing kv_cache and attn_metadata (#13887)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-26 18:41:02 +08:00
Cyrus Leung
7b700ec8c8
[Bugfix] Add test example for Ultravox v0.5 (#13890) 2025-02-26 02:31:43 -08:00
Roger Wang
7ca1da020f
[Misc] Fix input processing for Ultravox (#13871) 2025-02-25 23:56:34 -08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling (#13831) 2025-02-25 23:43:01 -08:00
Seth Kimmel
e206b54331
[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (#13837)
Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>
2025-02-26 14:58:24 +08:00
Sage Moore
1d35662e6d
[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms (#13844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-02-26 14:56:58 +08:00
Albert
e656f638de
[Doc] fix the incorrect module path of tensorize_vllm_model (#13863) 2025-02-25 22:56:19 -08:00
Harry Mellor
145944cb94
Improve pipeline partitioning (#13839) 2025-02-25 18:53:56 -08:00
Henry Tsang
094b7d9496
[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797) 2025-02-25 18:52:03 -08:00
Chenguang Li
e1fe7591f2
[Misc]Code Cleanup (#13859)
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2025-02-26 10:44:30 +08:00
Lily Liu
5629f26df7
[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729) 2025-02-25 18:14:48 -08:00
Rui Qiao
9ba28043b5
[misc] Show driver IP info when Ray fails to allocate driver worker (#13858)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-02-26 09:53:43 +08:00
Harry Mellor
24679788ed
DeepSeek V2/V3/R1 only place lm_head on last pp rank (#13833)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-26 01:24:57 +00:00
Michael Goin
07c4353057
[Model] Support Grok1 (#13795)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-26 01:07:12 +00:00
Harry Mellor
34e3494e70
Fix failing MyGemma2Embedding test (#13820)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-25 12:33:03 -08:00
Liangfu Chen
f75aa72732
[Neuron] Add custom_ops for neuron backend (#13246)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: George Novack <gnovack@amazon.com>
Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>
2025-02-25 11:47:49 -08:00
Chen1022
340e39e387
Fix string parsing error (#13825) 2025-02-25 08:20:29 -08:00
Cyrus Leung
f4133ce4e5
[Bugfix] Revert inspection code in #13743 (#13832)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-26 00:18:50 +08:00
Wen Sun
6522d55b6f
Fix /v1/audio/transcriptions Bad Request Error (#13811) 2025-02-25 06:03:33 -08:00
Isotr0py
6ff518626c
[Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818) 2025-02-25 06:03:02 -08:00
Nichols A. Romero
fa82074167
[Bugfix] Flush TunableOp results before worker processes are destroyed. (#13623)
Signed-off-by: Nichols A. Romero <nick.romero@amd.com>
2025-02-25 11:08:20 +00:00
Junlin Zhou
75e9d49796
[Bugfix] Initialize attention bias on the same device as Query/Key/Value (#13468) 2025-02-25 02:13:09 -08:00
Chen1022
32c3b6bfd1
[Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs (#13724)
Signed-off-by: Chen-0210 <chenjincong11@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-02-25 10:12:19 +00:00
Jee Jee Li
37b6cb4985
[CI/Build] Fix V1 LoRA failure (#13767) 2025-02-25 02:01:15 -08:00
Gregory Shtrasberg
aabeb2688f
[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593) 2025-02-25 00:39:59 -08:00
Jiayi Yao
2f42a4888c
[Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953) 2025-02-25 00:38:42 -08:00
Rui Qiao
3173c3b34e
[misc] Clean up ray compiled graph type hints (#13731) 2025-02-25 00:37:08 -08:00
Shanshan Shen
2d87d7d1ac
[Bugfix] Modify modelscope api usage in transformer_utils (#13807) 2025-02-25 00:36:07 -08:00
Russell Bryant
aab392774b
[Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-25 08:21:25 +00:00