Isotr0py
ada7c780d5
[Misc] Fix yapf linting tools etc not running on pre-commit ( #13695 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-22 13:10:43 +08:00
Lucas Wilkinson
288cc6c234
[Attention] MLA with chunked prefill ( #12639 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-21 15:30:12 -08:00
John Zheng
900edbfa48
fix typo of grafana dashboard, with correct datasource ( #13668 )
...
Signed-off-by: John Zheng <john.zheng@hp.com>
2025-02-21 18:21:05 +00:00
Isotr0py
b2c3fc5d65
[Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation ( #13586 )
2025-02-20 22:24:17 -08:00
leoneo
839b27c6cc
[Kernel]Add streamK for block-quantized CUTLASS kernels ( #12978 )
2025-02-20 22:14:24 -08:00
Kevin H. Luu
34ad27fe83
[ci] Fix metrics test model path ( #13635 )
2025-02-20 22:12:10 -08:00
Gabriel Marinho
1c3c975766
[FEATURE] Enables /score endpoint for embedding models ( #12846 )
2025-02-20 22:09:47 -08:00
Szymon Ożóg
1cdc88614a
Missing comment explaining VDR variable in GGUF kernels ( #13290 )
2025-02-20 22:06:54 -08:00
Nick Hill
31aa045c11
[V1][Sampler] Avoid an operation during temperature application ( #13587 )
2025-02-20 22:05:56 -08:00
Roger Wang
a30c093502
[Bugfix] Add mm_processor_kwargs to chat-related protocols ( #13644 )
2025-02-20 22:04:33 -08:00
Harry Mellor
c7b07a95a6
Use pre-commit to update requirements-test.txt ( #13617 )
2025-02-20 22:03:27 -08:00
Kaixi Hou
27a09dc52c
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant ( #13632 )
2025-02-20 22:01:48 -08:00
Edwin Hernandez
981f3c831e
[Misc] Adding script to setup ray for multi-node vllm deployments ( #12913 )
2025-02-20 21:16:40 -08:00
Kante Yin
44c33f01f3
Add llmaz as another integration ( #13643 )
...
Signed-off-by: kerthcet <kerthcet@gmail.com>
2025-02-21 03:52:40 +00:00
Lingfan Yu
33170081f1
[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth ( #13245 )
...
Signed-off-by: Lingfan Yu <lingfany@amazon.com>
2025-02-20 17:45:45 -08:00
Michael Goin
71face8540
[Bugfix] Fix max_num_batched_tokens for MLA ( #13620 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-20 17:45:20 -08:00
Joe Runde
bfbc0b32c6
[Frontend] Add backend-specific options for guided decoding ( #13505 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-20 15:07:58 -05:00
ajayvohra2005
6a417b8600
fix neuron performance issue ( #13589 )
2025-02-20 10:59:36 -08:00
Woosuk Kwon
d3ea50113c
[V1][Minor] Print KV cache size in token counts ( #13596 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-20 09:24:31 -08:00
Harry Mellor
34aad515c8
Update pre-commit's isort version to remove warnings ( #13614 )
2025-02-20 08:00:14 -08:00
chenxiaobing
ed6e9075d3
[Bugfix] Fix deepseekv3 grouped topk error ( #13474 )
...
Signed-off-by: Chen-XiaoBing <chenxb002@whu.edu.cn>
v0.7.3
2025-02-20 06:47:01 -08:00
Harry Mellor
992e5c3d34
Merge similar examples in offline_inference into single basic example ( #12737 )
2025-02-20 04:53:51 -08:00
Varun Sundar Rabindranath
b69692a2d8
[Kernel] LoRA - Refactor sgmv kernels ( #13110 )
2025-02-20 07:28:06 -05:00
Kevin H. Luu
a64a84433d
[2/n][ci] S3: Use full model path ( #13564 )
...
Signed-off-by: <>
2025-02-20 01:20:15 -08:00
Kevin H. Luu
aa1e62d0db
[ci] Fix spec decode test ( #13600 )
2025-02-20 16:56:00 +08:00
Michael Goin
497bc83124
[CI/Build] Use uv in the Dockerfile ( #13566 )
2025-02-19 23:05:44 -08:00
Yuan Tang
3738e6fa80
[API Server] Add port number range validation ( #13506 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-20 15:05:13 +08:00
Gregory Shtrasberg
0023cd2b9d
[ROCm] MI300A compile targets deprecation ( #13560 )
2025-02-19 23:05:00 -08:00
燃
041e294716
[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL ( #13533 )
2025-02-19 23:04:30 -08:00
Alex Brooks
9621667874
[Misc] Warn if the vLLM version can't be retrieved ( #13501 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-20 06:24:48 +00:00
Simon Mo
8c755c3b6d
[bugfix] spec decode worker get tp group only when initialized ( #13578 )
2025-02-20 04:46:28 +00:00
youkaichao
ba81163997
[core] add sleep and wake up endpoint and v1 support ( #12987 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
2025-02-20 12:41:17 +08:00
Divakar Verma
0d243f2a54
[ROCm][MoE] mi300 mixtral8x7B perf for specific BS ( #13577 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-02-20 04:01:02 +00:00
Kevin H. Luu
88f6ba3281
[ci] Add AWS creds for AMD ( #13572 )
2025-02-20 03:56:06 +00:00
Jee Jee Li
512368e34a
[Misc] Qwen2.5 VL support LoRA ( #13261 )
2025-02-19 18:37:55 -08:00
Kevin H. Luu
473f51cfd9
[3/n][CI] Load Quantization test models with S3 ( #13570 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-20 10:12:30 +08:00
Nick Hill
a4c402a756
[BugFix] Avoid error traceback in logs when V1 LLM terminates ( #13565 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-20 00:49:01 +00:00
Isotr0py
550d97eb58
[Misc] Avoid calling unnecessary hf_list_repo_files for local model path ( #13348 )
...
Signed-off-by: isotr0py <2037008807@qq.com>
2025-02-19 18:57:48 +00:00
Cody Yu
fbbe1fbac6
[MISC] Logging the message about Ray teardown ( #13502 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
2025-02-19 09:40:50 -08:00
Wilson Wu
01c184b8f3
Fix copyright year to auto get current year ( #13561 )
2025-02-19 16:55:34 +00:00
youkaichao
ad5a35c21b
[doc] clarify multi-node serving doc ( #13558 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-19 22:32:17 +08:00
shangmingc
5ae9f26a5a
[Bugfix] Fix device ordinal for multi-node spec decode ( #13269 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-19 22:13:15 +08:00
Cyrus Leung
377d10bd14
[VLM][Bugfix] Pass processor kwargs properly on init ( #13516 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-19 13:13:50 +00:00
youkaichao
52ce14d31f
[doc] clarify profiling is only for developers ( #13554 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-19 20:55:58 +08:00
Daniele
81dabf24a8
[CI/Build] force writing version file ( #13544 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
2025-02-19 18:48:03 +08:00
Yannick Schnider
423330263b
[Feature] Pluggable platform-specific scheduler ( #13161 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
2025-02-19 17:16:38 +08:00
Nick Hill
caf7ff4456
[V1][Core] Generic mechanism for handling engine utility ( #13060 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-19 17:09:22 +08:00
Lucia Fang
f525c0be8b
[Model][Speculative Decoding] DeepSeek MTP spec decode ( #12755 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-02-19 17:06:23 +08:00
Alex Brooks
983a40a8bb
[Bugfix] Fix Positive Feature Layers in Llava Models ( #13514 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-19 08:50:07 +00:00
Zhe Zhang
fdc5df6f54
use device param in load_model method ( #13037 )
2025-02-19 16:05:02 +08:00