Tyler Michael Smith
|
1f69c4a892
|
[Model] Support Mamba2 (Codestral Mamba) (#9292)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
|
2025-02-17 20:17:50 +08:00 |
|
Cyrus Leung
|
7b623fca0b
|
[VLM] Check required fields before initializing field config in DictEmbeddingItems (#13380)
|
2025-02-17 01:36:07 -08:00 |
|
Mengqing Cao
|
238dfc8ac3
|
[MISC] tiny fixes (#13378)
|
2025-02-17 00:57:13 -08:00 |
|
Huy Do
|
45186834a0
|
Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-02-17 08:16:32 +00:00 |
|
yankooo
|
f857311d13
|
Fix spelling error in index.md (#13369)
|
2025-02-17 06:53:20 +00:00 |
|
shangmingc
|
46cdd59577
|
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-02-16 19:32:26 -08:00 |
|
Jee Jee Li
|
2010f04c17
|
[V1][Misc] Avoid unnecessary log output (#13289)
|
2025-02-16 19:26:24 -08:00 |
|
Woosuk Kwon
|
69e1d23e1e
|
[V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-16 12:25:29 -08:00 |
|
Isotr0py
|
d67cc21b78
|
[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-02-16 18:55:27 +00:00 |
|
Woosuk Kwon
|
e18227b04a
|
[V1][PP] Cache Intermediate Tensors (#13353)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-16 10:02:27 -08:00 |
|
Woosuk Kwon
|
7b89386553
|
[V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-16 09:39:08 -08:00 |
|
凌
|
da833b0aee
|
[Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325)
|
2025-02-16 16:04:21 +00:00 |
|
Cyrus Leung
|
5d2965b7d7
|
[Bugfix] Fix 2 Node and Spec Decode tests (#13341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-16 22:20:22 +08:00 |
|
youkaichao
|
a0231b7c25
|
[platform] add base class for communicators (#13208)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-16 22:14:22 +08:00 |
|
youkaichao
|
124776ebd5
|
[ci] skip failed tests for flashinfer (#13352)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-16 22:09:15 +08:00 |
|
Roger Wang
|
b7d309860e
|
[V1] Update doc and examples for H2O-VL (#13349)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-02-16 10:35:54 +00:00 |
|
wchen61
|
dc0f7ccf8b
|
[BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187)
Signed-off-by: wchen61 <wchen61@foxmail.com>
|
2025-02-16 08:59:49 +00:00 |
|
Michael Goin
|
d3d547e057
|
[Bugfix] Pin xgrammar to 0.1.11 (#13338)
|
2025-02-15 19:42:25 -08:00 |
|
Kyle Sayers
|
12913d17ba
|
[Quant] Add SupportsQuant to phi3 and clip (#13104)
|
2025-02-15 19:28:33 -08:00 |
|
Lily Liu
|
80f63a3966
|
[V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-02-15 18:05:11 -08:00 |
|
Cyrus Leung
|
367cb8ce8c
|
[Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331)
|
2025-02-15 07:06:23 -08:00 |
|
youkaichao
|
54ed913f34
|
[ci/build] update flashinfer (#13323)
|
2025-02-15 05:33:13 -08:00 |
|
Cody Yu
|
9206b3d7ec
|
[V1][PP] Run engine busy loop with batch queue (#13064)
|
2025-02-15 03:59:01 -08:00 |
|
rasmith
|
ed0de3e4b8
|
[AMD] [Model] DeepSeek tunings (#13199)
|
2025-02-15 03:58:09 -08:00 |
|
Mark McLoughlin
|
2ad1bc7afe
|
[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288)
|
2025-02-15 03:56:19 -08:00 |
|
Isotr0py
|
7fdaaf48ef
|
[Bugfix] Fix qwen2.5-vl image processor (#13286)
|
2025-02-15 03:00:11 -08:00 |
|
Xu Song
|
067fa2255b
|
[Bugfix] Fix search start_index of stop_checker (#13280)
|
2025-02-14 21:39:42 -08:00 |
|
Nick Hill
|
9076325677
|
[BugFix] Don't scan entire cache dir when loading model (#13302)
|
2025-02-14 21:33:31 -08:00 |
|
Tyler Michael Smith
|
97a3d6d995
|
[Bugfix] Massage MLA's usage of flash attn for ROCm (#13310)
|
2025-02-14 21:33:25 -08:00 |
|
Nicolò Lucchesi
|
579d7a63b2
|
[Bugfix][Docs] Fix offline Whisper (#13274)
|
2025-02-14 21:32:37 -08:00 |
|
Sage Moore
|
c9f9d5b397
|
[Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't built on ROCm (#13235)
|
2025-02-14 20:30:42 -08:00 |
|
Woosuk Kwon
|
0c73026844
|
[V1][PP] Fix memory profiling in PP (#13315)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-14 20:17:25 -08:00 |
|
Nick Hill
|
6a854c7a2b
|
[V1][Sampler] Don't apply temp for greedy-only (#13311)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-14 18:10:53 -08:00 |
|
Woosuk Kwon
|
e7eea5a520
|
[V1][CI] Fix failed v1-test because of min_p (#13316)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-14 17:29:51 -08:00 |
|
Aoyu
|
a12934d3ec
|
[V1][Core] min_p sampling support (#13191)
Signed-off-by: Aoyu <aoyuzhan@amazon.com>
Co-authored-by: Aoyu <aoyuzhan@amazon.com>
|
2025-02-14 15:50:05 -08:00 |
|
Joe Runde
|
3bcb8c75da
|
[Core] Reduce TTFT with concurrent partial prefills (#10235)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-02-14 15:36:07 -08:00 |
|
Michael Goin
|
5e5c8e091e
|
[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-14 12:53:42 -08:00 |
|
Yu-Zhou
|
c9e2d644e7
|
[Hardware][Gaudi][Bugfix] Fix error for guided decoding (#12317)
|
2025-02-14 04:36:49 -08:00 |
|
Russell Bryant
|
7734e9a291
|
[Core] choice-based structured output with xgrammar (#12632)
|
2025-02-14 04:36:05 -08:00 |
|
Lu Fang
|
6224a9f620
|
Support logit_bias in v1 Sampler (#13079)
|
2025-02-14 04:34:59 -08:00 |
|
Nick Hill
|
085b7b2d6c
|
[V1] Simplify GPUModelRunner._update_states check (#13265)
|
2025-02-14 04:33:43 -08:00 |
|
Cyrus Leung
|
4da1f667e9
|
[VLM] Keep track of whether prompt replacements have been applied (#13215)
|
2025-02-14 04:20:46 -08:00 |
|
Jun Duan
|
556ef7f714
|
[Misc] Log time consumption of sleep and wake-up (#13115)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
|
2025-02-14 20:10:21 +08:00 |
|
Xu Song
|
83481ceb49
|
[Bugfix] Fix missing parentheses (#13263)
|
2025-02-14 01:07:10 -08:00 |
|
Pooya Davoodi
|
185cc19f92
|
[Frontend] Optionally remove memory buffer used for uploading to URLs in run_batch (#12927)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
|
2025-02-14 08:22:42 +00:00 |
|
Alexander Matveev
|
45f90bcbba
|
[WIP] TPU V1 Support Refactored (#13049)
|
2025-02-14 00:21:53 -08:00 |
|
Kero Liang
|
b0ccfc565a
|
[Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126)
|
2025-02-13 22:39:20 -08:00 |
|
Sage Moore
|
ba59b78a9c
|
[ROCm][V1] Add initial ROCm support to V1 (#12790)
|
2025-02-13 22:21:50 -08:00 |
|
Varun Sundar Rabindranath
|
cbc40128eb
|
[V1] LoRA - Enable Serving Usecase (#12883)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-02-14 14:21:12 +08:00 |
|
Michael Goin
|
f0b2da72a8
|
Expand MLA to support most types of quantization (#13181)
|
2025-02-13 22:19:22 -08:00 |
|