Kaixi Hou
|
27a09dc52c
|
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)
|
2025-02-20 22:01:48 -08:00 |
|
Edwin Hernandez
|
981f3c831e
|
[Misc] Adding script to setup ray for multi-node vllm deployments (#12913)
|
2025-02-20 21:16:40 -08:00 |
|
Kante Yin
|
44c33f01f3
|
Add llmaz as another integration (#13643)
Signed-off-by: kerthcet <kerthcet@gmail.com>
|
2025-02-21 03:52:40 +00:00 |
|
Lingfan Yu
|
33170081f1
|
[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth (#13245)
Signed-off-by: Lingfan Yu <lingfany@amazon.com>
|
2025-02-20 17:45:45 -08:00 |
|
Michael Goin
|
71face8540
|
[Bugfix] Fix max_num_batched_tokens for MLA (#13620)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-20 17:45:20 -08:00 |
|
Joe Runde
|
bfbc0b32c6
|
[Frontend] Add backend-specific options for guided decoding (#13505)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-20 15:07:58 -05:00 |
|
ajayvohra2005
|
6a417b8600
|
fix neuron performance issue (#13589)
|
2025-02-20 10:59:36 -08:00 |
|
Woosuk Kwon
|
d3ea50113c
|
[V1][Minor] Print KV cache size in token counts (#13596)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-20 09:24:31 -08:00 |
|
Harry Mellor
|
34aad515c8
|
Update pre-commit's isort version to remove warnings (#13614)
|
2025-02-20 08:00:14 -08:00 |
|
chenxiaobing
|
ed6e9075d3
|
[Bugfix] Fix deepseekv3 grouped topk error (#13474)
Signed-off-by: Chen-XiaoBing <chenxb002@whu.edu.cn>
Tag: v0.7.3
|
2025-02-20 06:47:01 -08:00 |
|
Harry Mellor
|
992e5c3d34
|
Merge similar examples in offline_inference into single basic example (#12737)
|
2025-02-20 04:53:51 -08:00 |
|
Varun Sundar Rabindranath
|
b69692a2d8
|
[Kernel] LoRA - Refactor sgmv kernels (#13110)
|
2025-02-20 07:28:06 -05:00 |
|
Kevin H. Luu
|
a64a84433d
|
[2/n][ci] S3: Use full model path (#13564)
Signed-off-by: <>
|
2025-02-20 01:20:15 -08:00 |
|
Kevin H. Luu
|
aa1e62d0db
|
[ci] Fix spec decode test (#13600)
|
2025-02-20 16:56:00 +08:00 |
|
Michael Goin
|
497bc83124
|
[CI/Build] Use uv in the Dockerfile (#13566)
|
2025-02-19 23:05:44 -08:00 |
|
Yuan Tang
|
3738e6fa80
|
[API Server] Add port number range validation (#13506)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-02-20 15:05:13 +08:00 |
|
Gregory Shtrasberg
|
0023cd2b9d
|
[ROCm] MI300A compile targets deprecation (#13560)
|
2025-02-19 23:05:00 -08:00 |
|
燃
|
041e294716
|
[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533)
|
2025-02-19 23:04:30 -08:00 |
|
Alex Brooks
|
9621667874
|
[Misc] Warn if the vLLM version can't be retrieved (#13501)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
|
2025-02-20 06:24:48 +00:00 |
|
Simon Mo
|
8c755c3b6d
|
[bugfix] spec decode worker get tp group only when initialized (#13578)
|
2025-02-20 04:46:28 +00:00 |
|
youkaichao
|
ba81163997
|
[core] add sleep and wake up endpoint and v1 support (#12987)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
|
2025-02-20 12:41:17 +08:00 |
|
Divakar Verma
|
0d243f2a54
|
[ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-02-20 04:01:02 +00:00 |
|
Kevin H. Luu
|
88f6ba3281
|
[ci] Add AWS creds for AMD (#13572)
|
2025-02-20 03:56:06 +00:00 |
|
Jee Jee Li
|
512368e34a
|
[Misc] Qwen2.5 VL support LoRA (#13261)
|
2025-02-19 18:37:55 -08:00 |
|
Kevin H. Luu
|
473f51cfd9
|
[3/n][CI] Load Quantization test models with S3 (#13570)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-20 10:12:30 +08:00 |
|
Nick Hill
|
a4c402a756
|
[BugFix] Avoid error traceback in logs when V1 LLM terminates (#13565)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-20 00:49:01 +00:00 |
|
Isotr0py
|
550d97eb58
|
[Misc] Avoid calling unnecessary hf_list_repo_files for local model path (#13348)
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-02-19 18:57:48 +00:00 |
|
Cody Yu
|
fbbe1fbac6
|
[MISC] Logging the message about Ray teardown (#13502)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
|
2025-02-19 09:40:50 -08:00 |
|
Wilson Wu
|
01c184b8f3
|
Fix copyright year to auto get current year (#13561)
|
2025-02-19 16:55:34 +00:00 |
|
youkaichao
|
ad5a35c21b
|
[doc] clarify multi-node serving doc (#13558)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-19 22:32:17 +08:00 |
|
shangmingc
|
5ae9f26a5a
|
[Bugfix] Fix device ordinal for multi-node spec decode (#13269)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-02-19 22:13:15 +08:00 |
|
Cyrus Leung
|
377d10bd14
|
[VLM][Bugfix] Pass processor kwargs properly on init (#13516)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-19 13:13:50 +00:00 |
|
youkaichao
|
52ce14d31f
|
[doc] clarify profiling is only for developers (#13554)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-19 20:55:58 +08:00 |
|
Daniele
|
81dabf24a8
|
[CI/Build] force writing version file (#13544)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
|
2025-02-19 18:48:03 +08:00 |
|
Yannick Schnider
|
423330263b
|
[Feature] Pluggable platform-specific scheduler (#13161)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
|
2025-02-19 17:16:38 +08:00 |
|
Nick Hill
|
caf7ff4456
|
[V1][Core] Generic mechanism for handling engine utility (#13060)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-19 17:09:22 +08:00 |
|
Lucia Fang
|
f525c0be8b
|
[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-02-19 17:06:23 +08:00 |
|
Alex Brooks
|
983a40a8bb
|
[Bugfix] Fix Positive Feature Layers in Llava Models (#13514)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
|
2025-02-19 08:50:07 +00:00 |
|
Zhe Zhang
|
fdc5df6f54
|
use device param in load_model method (#13037)
|
2025-02-19 16:05:02 +08:00 |
|
Kevin H. Luu
|
3b05cd4555
|
[perf-benchmark] Fix ECR path for premerge benchmark (#13512)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-19 07:56:11 +00:00 |
|
Kevin H. Luu
|
d5d214ac7f
|
[1/n][CI] Load models in CI from S3 instead of HF (#13205)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-19 07:34:59 +00:00 |
|
Roger Wang
|
fd84857f64
|
[Doc] Add clarification note regarding paligemma (#13511)
|
2025-02-18 22:24:03 -08:00 |
|
Divakar Verma
|
8aada19dfc
|
[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503)
|
2025-02-18 22:23:24 -08:00 |
|
Kevin H. Luu
|
9aa95b0e6a
|
[perf-benchmark] Allow premerge ECR (#13509)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-19 05:13:41 +00:00 |
|
Yu-Zhou
|
d0a7a2769d
|
[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139)
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-02-18 19:40:19 -08:00 |
|
Harry Mellor
|
00b69c2d27
|
[Misc] Remove dangling references to --use-v2-block-manager (#13492)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-19 03:37:26 +00:00 |
|
Woosuk Kwon
|
4c82229898
|
[V1][Spec Decode] Optimize N-gram matching with Numba (#13365)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-18 13:19:58 -08:00 |
|
Woosuk Kwon
|
c8d70e2437
|
Pin Ray version to 2.40.0 (#13490)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-18 12:50:31 -08:00 |
|
Nick Hill
|
30172b4947
|
[V1] Optimize handling of sampling metadata and req_ids list (#13244)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-18 12:15:33 -08:00 |
|
Murali Andoorveedu
|
a4d577b379
|
[V1][Tests] Adding additional testing for multimodal models to V1 (#13308)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
|
2025-02-18 09:53:14 -08:00 |
|