Yannick Schnider
|
423330263b
|
[Feature] Pluggable platform-specific scheduler (#13161)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
|
2025-02-19 17:16:38 +08:00 |
|
Nick Hill
|
caf7ff4456
|
[V1][Core] Generic mechanism for handling engine utility (#13060)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-19 17:09:22 +08:00 |
|
Lucia Fang
|
f525c0be8b
|
[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-02-19 17:06:23 +08:00 |
|
Alex Brooks
|
983a40a8bb
|
[Bugfix] Fix Positive Feature Layers in Llava Models (#13514)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
|
2025-02-19 08:50:07 +00:00 |
|
Zhe Zhang
|
fdc5df6f54
|
use device param in load_model method (#13037)
|
2025-02-19 16:05:02 +08:00 |
|
Kevin H. Luu
|
3b05cd4555
|
[perf-benchmark] Fix ECR path for premerge benchmark (#13512)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-19 07:56:11 +00:00 |
|
Kevin H. Luu
|
d5d214ac7f
|
[1/n][CI] Load models in CI from S3 instead of HF (#13205)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-19 07:34:59 +00:00 |
|
Roger Wang
|
fd84857f64
|
[Doc] Add clarification note regarding paligemma (#13511)
|
2025-02-18 22:24:03 -08:00 |
|
Divakar Verma
|
8aada19dfc
|
[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503)
|
2025-02-18 22:23:24 -08:00 |
|
Kevin H. Luu
|
9aa95b0e6a
|
[perf-benchmark] Allow premerge ECR (#13509)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-19 05:13:41 +00:00 |
|
Yu-Zhou
|
d0a7a2769d
|
[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139)
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-02-18 19:40:19 -08:00 |
|
Harry Mellor
|
00b69c2d27
|
[Misc] Remove dangling references to --use-v2-block-manager (#13492)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-19 03:37:26 +00:00 |
|
Woosuk Kwon
|
4c82229898
|
[V1][Spec Decode] Optimize N-gram matching with Numba (#13365)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-18 13:19:58 -08:00 |
|
Woosuk Kwon
|
c8d70e2437
|
Pin Ray version to 2.40.0 (#13490)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-18 12:50:31 -08:00 |
|
Nick Hill
|
30172b4947
|
[V1] Optimize handling of sampling metadata and req_ids list (#13244)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-18 12:15:33 -08:00 |
|
Murali Andoorveedu
|
a4d577b379
|
[V1][Tests] Adding additional testing for multimodal models to V1 (#13308)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
|
2025-02-18 09:53:14 -08:00 |
|
youkaichao
|
7b203b7694
|
[misc] fix debugging code (#13487)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-18 09:37:11 -08:00 |
|
Woosuk Kwon
|
4fb8142a0e
|
[V1][PP] Enable true PP with Ray executor (#13472)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-18 09:15:32 -08:00 |
|
Daniele
|
a02c86b4dd
|
[CI/Build] migrate static project metadata from setup.py to pyproject.toml (#8772)
|
2025-02-18 08:02:49 -08:00 |
|
Liangfu Chen
|
3809458456
|
[Bugfix] Fix invalid rotary embedding unit test (#13431)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-02-18 11:52:03 +00:00 |
|
zifeitong
|
d3231cb436
|
[Bugfix] Handle content type with optional parameters (#13383)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
|
2025-02-18 11:29:13 +00:00 |
|
Cyrus Leung
|
435b502a6e
|
[ROCm] Make amdsmi import optional for other platforms (#13460)
|
2025-02-18 03:15:56 -08:00 |
|
Isotr0py
|
29fc5772c4
|
[Bugfix] Remove noisy error logging during local model loading (#13458)
|
2025-02-18 03:15:48 -08:00 |
|
Harry Mellor
|
2358ca527b
|
[Doc]: Improve feature tables (#13224)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-18 18:52:39 +08:00 |
|
Isotr0py
|
8cf97f8661
|
[Bugfix] Fix failing transformers dynamic module resolving with spawn multiproc method (#13403)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-02-18 10:25:53 +00:00 |
|
Yuan Tang
|
e2603fefb8
|
[Bugfix] Ensure LoRA path from the request can be included in err msg (#13450)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-02-18 16:19:15 +08:00 |
|
Michael Goin
|
b53d79983c
|
Add outlines fallback when JSON schema has enum (#13449)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-18 06:49:41 +00:00 |
|
Woosuk Kwon
|
9915912f7f
|
[V1][PP] Fix & Pin Ray version in requirements-cuda.txt (#13436)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-17 21:58:06 -08:00 |
|
Kyle Sayers
|
d1b649f1ef
|
[Quant] Aria SupportsQuant (#13416)
|
2025-02-17 21:51:09 -08:00 |
|
youkaichao
|
ac19b519ed
|
[core] fix sleep mode in pytorch 2.6 (#13456)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-18 13:48:10 +08:00 |
|
Yuan Tang
|
a1074b3efe
|
[Bugfix] Only print out chat template when supplied (#13444)
|
2025-02-17 21:43:31 -08:00 |
|
Kyle Sayers
|
00294e1bc6
|
[Quant] Arctic SupportsQuant (#13366)
|
2025-02-17 21:35:09 -08:00 |
|
Kyle Sayers
|
88787bce1d
|
[Quant] Molmo SupportsQuant (#13336)
|
2025-02-17 21:34:47 -08:00 |
|
youkaichao
|
932b51cedd
|
[v1] fix parallel config rank (#13445)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-18 12:33:45 +08:00 |
|
Divakar Verma
|
7c7adf81fc
|
[ROCm] fix get_device_name for rocm (#13438)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-02-18 04:07:12 +00:00 |
|
Isotr0py
|
67ef8f666a
|
[Model] Enable quantization support for transformers backend (#12960)
|
2025-02-17 19:52:47 -08:00 |
|
Harry Mellor
|
efbe854448
|
[Misc] Remove dangling references to SamplingType.BEAM (#13402)
|
2025-02-17 19:52:35 -08:00 |
|
Tyler Michael Smith
|
b3942e157e
|
[Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue (#13425)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-02-18 00:32:48 +00:00 |
|
Woosuk Kwon
|
cd4a72a28d
|
[V1][Spec decode] Move drafter to model runner (#13363)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-17 15:40:12 -08:00 |
|
Cody Yu
|
6ac485a953
|
[V1][PP] Fix intermediate tensor values (#13417)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-02-17 13:37:45 -08:00 |
|
Woosuk Kwon
|
4c21ce9eba
|
[V1] Get input tokens from scheduler (#13339)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-17 11:01:07 -08:00 |
|
r.4ntix
|
ce77eb9410
|
[Bugfix] Fix VLLM_USE_MODELSCOPE issue (#13384)
|
2025-02-17 14:22:01 +00:00 |
|
Yan Ma
|
30513d1cb6
|
[Bugfix] fix xpu communicator (#13368)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-02-17 20:59:18 +08:00 |
|
Tyler Michael Smith
|
1f69c4a892
|
[Model] Support Mamba2 (Codestral Mamba) (#9292)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
|
2025-02-17 20:17:50 +08:00 |
|
Cyrus Leung
|
7b623fca0b
|
[VLM] Check required fields before initializing field config in DictEmbeddingItems (#13380)
|
2025-02-17 01:36:07 -08:00 |
|
Mengqing Cao
|
238dfc8ac3
|
[MISC] tiny fixes (#13378)
|
2025-02-17 00:57:13 -08:00 |
|
Huy Do
|
45186834a0
|
Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-02-17 08:16:32 +00:00 |
|
yankooo
|
f857311d13
|
Fix spelling error in index.md (#13369)
|
2025-02-17 06:53:20 +00:00 |
|
shangmingc
|
46cdd59577
|
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-02-16 19:32:26 -08:00 |
|
Jee Jee Li
|
2010f04c17
|
[V1][Misc] Avoid unnecessary log output (#13289)
|
2025-02-16 19:26:24 -08:00 |
|