Cyrus Leung
|
a2dd48c386
|
[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-27 19:14:55 +00:00 |
|
dependabot[bot]
|
126f6beeb4
|
Bump azure/setup-helm from 4.2.0 to 4.3.0 (#13742)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2025-02-27 19:04:10 +00:00 |
|
Yang Chen
|
58d1b2aa77
|
[Attention] MLA support for V1 (#13789)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-02-27 13:14:17 -05:00 |
|
Cyrus Leung
|
f1579b229d
|
[VLM] Generalized prompt updates for multi-modal processor (#13964)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-27 17:44:25 +00:00 |
|
Isotr0py
|
7864875879
|
[Bugfix] Fix qwen2.5-vl overflow issue (#13968)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-02-27 17:30:39 +00:00 |
|
Noam Gat
|
1dd422b64a
|
Update LMFE version to v0.10.11 to support new versions of transforme… (#13930)
|
2025-02-27 17:16:12 +00:00 |
|
Rui Qiao
|
06c8f8d885
|
[bugfix] Fix profiling for RayDistributedExecutor (#13945)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-02-28 01:01:21 +08:00 |
|
Harry Mellor
|
5677c9bb3e
|
Deduplicate .pre-commit-config.yaml's exclude (#13967)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-27 16:27:47 +00:00 |
|
王博伟
|
512d77d582
|
Update quickstart.md (#13958)
|
2025-02-27 16:05:11 +00:00 |
|
Szymon Ożóg
|
7f0be2aa24
|
[Model] Deepseek GGUF support (#13167)
|
2025-02-27 02:08:35 -08:00 |
|
Isotr0py
|
edf309ebbe
|
[VLM] Support multimodal inputs for Florence-2 models (#13320)
|
2025-02-27 02:06:41 -08:00 |
|
Michael Goin
|
788f284b53
|
Fix test_block_fp8.py test for MoE (#13915)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-27 18:00:00 +08:00 |
|
Yang Zheng
|
4b1d141f49
|
[PP] Correct cache size check (#13873)
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
|
2025-02-27 17:47:29 +08:00 |
|
Chauncey
|
10c3b8c1cf
|
[Misc] fixed 'required' is an invalid argument for positionals (#13948)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-02-27 09:06:49 +00:00 |
|
Brayden Zhong
|
a7f37314b7
|
[CI/Build] Add examples/ directory to be labelled by mergify (#13944)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-02-27 08:24:11 +00:00 |
|
Mark McLoughlin
|
cd711c48b2
|
[V1][Metrics] Handle preemptions (#13169)
|
2025-02-26 20:04:59 -08:00 |
|
Sage Moore
|
378b3ef6f8
|
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922)
|
2025-02-26 20:04:12 -08:00 |
|
Rui Qiao
|
c9944acbf9
|
[misc] Rename Ray ADAG to Compiled Graph (#13928)
|
2025-02-26 20:03:28 -08:00 |
|
Michael Goin
|
ca377cf1b9
|
Use CUDA 12.4 as default for release and nightly wheels (#12098)
|
2025-02-26 19:06:37 -08:00 |
|
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
a31614e386
|
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
2025-02-27 10:39:10 +08:00 |
|
Lucas Wilkinson
|
f95903909f
|
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-02-27 10:35:08 +08:00 |
|
Woosuk Kwon
|
b382a7f28f
|
[BugFix] Make FP8 Linear compatible with torch.compile (#13918)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-26 13:48:55 -08:00 |
|
Wallas Henrique
|
4cb6fa0a9c
|
[Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 10:52:34 -08:00 |
|
Chauncey
|
d08b285adf
|
[Misc] fixed qwen_vl_utils parameter error (#13906)
|
2025-02-26 08:31:53 -08:00 |
|
Chenyaaang
|
b27122acc2
|
[TPU] use torch2.6 with whl package (#13860)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
|
2025-02-26 08:18:54 -05:00 |
|
Cyrus Leung
|
934bb99c71
|
[Bugfix] Update expected token counts for Ultravox tests (#13895)
|
2025-02-26 04:56:50 -08:00 |
|
Joe Runde
|
3f808cc044
|
[Bugfix] Do not crash V0 engine on input errors (#13101)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 19:07:29 +08:00 |
|
Brayden Zhong
|
ec8a5e5386
|
[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-02-26 19:06:47 +08:00 |
|
Florian Greinacher
|
215bf150a6
|
[Bugfix] Handle None parameters in Mistral function calls. (#13786)
|
2025-02-26 03:06:21 -08:00 |
|
Harry Mellor
|
0ecdd98031
|
Add comments on accessing kv_cache and attn_metadata (#13887)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-26 18:41:02 +08:00 |
|
Cyrus Leung
|
7b700ec8c8
|
[Bugfix] Add test example for Ultravox v0.5 (#13890)
|
2025-02-26 02:31:43 -08:00 |
|
Roger Wang
|
7ca1da020f
|
[Misc] Fix input processing for Ultravox (#13871)
|
2025-02-25 23:56:34 -08:00 |
|
Jee Jee Li
|
5157338ed9
|
[Misc] Improve LoRA spelling (#13831)
|
2025-02-25 23:43:01 -08:00 |
|
Seth Kimmel
|
e206b54331
|
[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (#13837)
Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>
|
2025-02-26 14:58:24 +08:00 |
|
Sage Moore
|
1d35662e6d
|
[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms (#13844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-02-26 14:56:58 +08:00 |
|
Albert
|
e656f638de
|
[Doc] fix the incorrect module path of tensorize_vllm_model (#13863)
|
2025-02-25 22:56:19 -08:00 |
|
Harry Mellor
|
145944cb94
|
Improve pipeline partitioning (#13839)
|
2025-02-25 18:53:56 -08:00 |
|
Henry Tsang
|
094b7d9496
|
[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797)
|
2025-02-25 18:52:03 -08:00 |
|
Chenguang Li
|
e1fe7591f2
|
[Misc]Code Cleanup (#13859)
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
|
2025-02-26 10:44:30 +08:00 |
|
Lily Liu
|
5629f26df7
|
[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)
|
2025-02-25 18:14:48 -08:00 |
|
Rui Qiao
|
9ba28043b5
|
[misc] Show driver IP info when Ray fails to allocate driver worker (#13858)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-02-26 09:53:43 +08:00 |
|
Harry Mellor
|
24679788ed
|
DeepSeek V2/V3/R1 only place lm_head on last pp rank (#13833)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-26 01:24:57 +00:00 |
|
Michael Goin
|
07c4353057
|
[Model] Support Grok1 (#13795)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-26 01:07:12 +00:00 |
|
Harry Mellor
|
34e3494e70
|
Fix failing MyGemma2Embedding test (#13820)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-25 12:33:03 -08:00 |
|
Liangfu Chen
|
f75aa72732
|
[Neuron] Add custom_ops for neuron backend (#13246)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: George Novack <gnovack@amazon.com>
Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>
|
2025-02-25 11:47:49 -08:00 |
|
Chen1022
|
340e39e387
|
Fix string parsing error (#13825)
|
2025-02-25 08:20:29 -08:00 |
|
Cyrus Leung
|
f4133ce4e5
|
[Bugfix] Revert inspection code in #13743 (#13832)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-26 00:18:50 +08:00 |
|
Wen Sun
|
6522d55b6f
|
Fix /v1/audio/transcriptions Bad Request Error (#13811)
|
2025-02-25 06:03:33 -08:00 |
|
Isotr0py
|
6ff518626c
|
[Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818)
|
2025-02-25 06:03:02 -08:00 |
|
Nichols A. Romero
|
fa82074167
|
[Bugfix] Flush TunableOp results before worker processes are destroyed. (#13623)
Signed-off-by: Nichols A. Romero <nick.romero@amd.com>
|
2025-02-25 11:08:20 +00:00 |
|