Simon Mo
|
1068556b2c
|
[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (#18579)
|
2025-05-23 07:43:58 -07:00 |
|
Reid
|
2cd1fa4556
|
[Misc] add Haystack integration (#18601)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-23 06:21:19 -07:00 |
|
Harry Mellor
|
d4c2919760
|
Include private attributes in API documentation (#18614)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 06:18:31 -07:00 |
|
Tristan Leclercq
|
6220f3c6b0
|
[Bugfix] Fix transformers model impl ignored for mixtral quant (#18602)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
|
2025-05-23 05:54:13 -07:00 |
|
Harry Mellor
|
52fb23f47e
|
Fix examples with code blocks in docs (#18609)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 05:53:44 -07:00 |
|
Cyrus Leung
|
6dd51c7ef1
|
[CI/Build] Fix V1 flag being set in entrypoints tests (#18598)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-23 05:51:53 -07:00 |
|
Harry Mellor
|
2edb533af2
|
Replace {func} with mkdocs style links (#18610)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 05:51:38 -07:00 |
|
Hyogeun Oh (오효근)
|
38a95cb4a8
|
[Doc] Fix indent of contributing to vllm (#18611)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-23 05:50:07 -07:00 |
|
Ning Xie
|
cd821ea5d2
|
[CI] fix kv_cache_type argument (#18594)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-23 04:49:18 -07:00 |
|
Kay Yan
|
7ab056c273
|
[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt (#18542)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-05-23 04:38:42 -07:00 |
|
Harry Mellor
|
6526e05111
|
Add myself as docs code owner (#18605)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 04:08:31 -07:00 |
|
Madeesh Kannan
|
e493e48524
|
[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (#17731)
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-23 03:38:23 -07:00 |
|
Mengqing Cao
|
4ce64e2df4
|
[Bugfix][Model] Fix baichuan model loader for tp (#18597)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-05-23 02:39:05 -07:00 |
|
Cyrus Leung
|
fbb13a2c15
|
Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034)" (#18600)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-23 02:18:22 -07:00 |
|
Harry Mellor
|
a1fe24d961
|
Migrate docs from Sphinx to MkDocs (#18145)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 02:09:53 -07:00 |
|
Yuqi Zhang
|
d0bc2f810b
|
[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430)
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Co-authored-by: Yuqi Zhang <yuqizhang@google.com>
|
2025-05-23 01:41:37 -07:00 |
|
Chauncey
|
b046cf792d
|
[Feature][V1]: supports cached_tokens in response usage (#18149)
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-05-23 01:41:03 -07:00 |
|
Michael Goin
|
54af915949
|
[Doc] Update quickstart and install for cu128 using --torch-backend=auto (#18505)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-23 08:36:37 +00:00 |
|
cascade
|
71ea614d4a
|
[Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-23 01:03:34 -07:00 |
|
RonaldBXu
|
4c611348a7
|
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034)
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
|
2025-05-23 00:37:18 -07:00 |
|
Ning Xie
|
60cad94b86
|
[Hardware] correct method signatures for HPU,ROCm,XPU (#18551)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-22 22:31:59 -07:00 |
|
Shanshan Shen
|
9c1baa5bc6
|
[Misc] Replace cuda hard code with current_platform (#16983)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-05-23 04:38:50 +00:00 |
|
Teruaki Ishizaki
|
4be2255c81
|
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291)
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>
|
2025-05-23 12:30:47 +08:00 |
|
aws-elaineyz
|
ed5d408255
|
[Neuron] Remove bypass on EAGLEConfig and add a test (#18514)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-22 21:26:32 -07:00 |
|
Lucas Wilkinson
|
2dc3b8b0a2
|
wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 03:32:25 +00:00 |
|
Lucas Wilkinson
|
18bf91e6a8
|
wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-23 03:31:49 +00:00 |
|
Benjamin Chislett
|
583507d130
|
[Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-05-22 20:17:39 -07:00 |
|
lkchen
|
e44d8ce8c7
|
[Bugfix] Set KVTransferConfig.engine_id in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-05-23 02:54:42 +00:00 |
|
Nick Hill
|
93ecb8139c
|
[BugFix] Increase TP execute_model timeout (#18558)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-23 10:22:11 +08:00 |
|
CYJiang
|
fae453f8ce
|
[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (#18482)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-05-23 10:15:32 +08:00 |
|
Harry Mellor
|
4b0da7b60e
|
Enable hybrid attention models for Transformers backend (#18494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 10:12:08 +08:00 |
|
Mark McLoughlin
|
c6b636f9fb
|
[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-23 02:05:44 +00:00 |
|
Chenheli Hua
|
04eb88dc80
|
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-05-23 01:59:18 +00:00 |
|
rasmith
|
46791e1b4b
|
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-05-22 18:45:35 -07:00 |
|
Sanger Steel
|
c32e249a23
|
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-05-22 18:44:18 -07:00 |
|
Kai Wu
|
c91fe7b1b9
|
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917)
Signed-off-by: Kai Wu <kaiwu@meta.com>
|
2025-05-22 16:44:08 -07:00 |
|
Ekagra Ranjan
|
a04720bc36
|
[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (#18290)
|
2025-05-22 15:17:33 -07:00 |
|
Lucas Wilkinson
|
00f526f55b
|
separate gpu wait
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 21:52:27 +00:00 |
|
Lucas Wilkinson
|
a8439e2fd4
|
dp working no yields
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 21:49:14 +00:00 |
|
lkchen
|
7b9d832c80
|
[Tool] Add NIXL installation script (#18172)
Signed-off-by: Linkun <github@lkchen.net>
|
2025-05-22 14:33:16 -07:00 |
|
Sage Moore
|
2a7f25fbe2
|
fix hang
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
9c60a6299d
|
tp1 working multistream tp > 1 broken
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
2259b47951
|
use vllm current_stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
04f11d97a0
|
working but only on the same stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
ffb740ae95
|
manually manage stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:36 +00:00 |
|
Sage Moore
|
020269c4c5
|
added multithreading support
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-05-22 20:51:36 +00:00 |
|
Lucas Wilkinson
|
9ccfd094ff
|
fix dummy mode
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Lucas Wilkinson
|
f93bdd3151
|
support more args in dp example
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Lucas Wilkinson
|
df8f889f37
|
support MLA
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|
Lucas Wilkinson
|
37c9babaa0
|
enable naive microbatching
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-05-22 20:51:35 +00:00 |
|