Chengji Yao
|
0d49483ea9
|
[TPU] fix kv cache dtype in model runner (#19244)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-06 16:20:16 +08:00 |
|
Jinghui Zhang
|
90b78ec5f9
|
[v1][P/D] Fix a edge case in kv cache schedule (#19182)
Co-authored-by: jinghui <jinghui@fb.com>
|
2025-06-05 23:32:55 -07:00 |
|
Aaron Pham
|
91a2ef98ea
|
[Chore] update CODEOWNERS (#19247)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-06-06 06:09:43 +00:00 |
|
Xu Song
|
3da2313d78
|
Support allowed_token_ids in ChatCompletionRequest (#19143)
Signed-off-by: Xu Song <xusong.vip@gmail.com>
|
2025-06-06 05:06:48 +00:00 |
|
Chengji Yao
|
b61dc5f972
|
[TPU] update torch_xla pin (#19231)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-06 04:27:38 +00:00 |
|
Chen Zhang
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
|
Benjamin Chislett
|
3465b87ef8
|
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-06-05 19:10:08 -07:00 |
|
Jerry Zhang
|
c8134bea15
|
Fix AOPerModuleConfig name changes (#18869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-06-05 18:51:32 -07:00 |
|
Luis Vega
|
cb6d572e85
|
[Model] NemotronH support (#18863)
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
|
2025-06-05 21:29:28 +00:00 |
|
Michael Goin
|
87360308b7
|
[V1] Use FlashInfer by default on Blackwell GPUs (#19118)
|
2025-06-05 15:40:39 -04:00 |
|
Dipika Sikka
|
aa49f14832
|
[Quantization] Skip Fp4 Test for compressed-tensors (#19217)
|
2025-06-05 18:21:53 +00:00 |
|
Nicolò Lucchesi
|
9ef9173cfa
|
[P/D][NixlConnector] Enable FlashInfer backend (#19090)
|
2025-06-05 17:10:15 +00:00 |
|
Povilas Kanapickas
|
85e2b7bb13
|
[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226)
Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
|
2025-06-05 16:53:08 +00:00 |
|
Chiyue Wei
|
61059bee40
|
[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
|
2025-06-05 09:48:26 -07:00 |
|
Xu Wenqing
|
ec89524f50
|
Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205)
|
2025-06-05 16:38:54 +00:00 |
|
Patrick von Platen
|
f20f9f063b
|
[mistral_common] Add v11 tokenizer (#19193)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-06-05 08:27:41 -07:00 |
|
Guillaume Calmettes
|
9bc8bb07cf
|
[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-06-05 12:59:28 +00:00 |
|
Reid
|
1aeb925f34
|
[Frontend] improve vllm run-batch --help display (#19187)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-05 11:16:25 +00:00 |
|
22quinn
|
188a4590d8
|
[Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-05 11:14:32 +00:00 |
|
vllmellm
|
18093084be
|
[Misc] Remove unnecessary fallback to prefill-decode attention (#19138)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-06-05 16:08:26 +08:00 |
|
Simon Mo
|
da40380214
|
[Build] Annotate wheel and container path for release workflow (#19162)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-04 23:24:56 -07:00 |
|
Chauncey
|
8fc57501d3
|
[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-05 06:24:24 +00:00 |
|
Woosuk Kwon
|
af7fc84fd2
|
[BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-05 13:41:25 +08:00 |
|
Huy Do
|
0678b52251
|
Handle non-serializable objects when dumping benchmark results (#19114)
|
2025-06-04 22:40:04 -07:00 |
|
Yang Wang
|
25b918eee6
|
[Torch Nightly]add missing dependency (#18770)
Signed-off-by: Yang Wang <elainewy@meta.com>
|
2025-06-04 21:56:12 -07:00 |
|
Michael Goin
|
a408820f2f
|
[Bugfix] Fix port handling in make_zmq_path (#19117)
|
2025-06-04 21:00:59 -06:00 |
|
Robert Shaw
|
c56ed8bb0e
|
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-05 02:07:32 +00:00 |
|
Reid
|
78dcf56cb3
|
[doc] small fix (#19167)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-05 09:13:50 +08:00 |
|
Nicolò Lucchesi
|
b2fac67130
|
[P/D] Heterogeneous TP (#18833)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-06-04 23:25:34 +00:00 |
|
CYJiang
|
23027e2daf
|
[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (#18817)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-06-04 15:37:25 -07:00 |
|
Varun Sundar Rabindranath
|
c3fd4d669a
|
[Kernel] Integrate batched/masked deepgemm kernel (#19111)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-04 21:59:18 +00:00 |
|
Kebe
|
ef3f98b59f
|
[Bugfix] fix v1 cpu worker fails on macOS (#19121)
|
2025-06-04 20:17:38 +00:00 |
|
Siyuan Liu
|
7ee2590478
|
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 16:13:43 -04:00 |
|
Michael Goin
|
53a5a0ce30
|
[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-04 10:46:28 -07:00 |
|
Tyler Michael Smith
|
d459fae0a2
|
[Bugfix][EP+DP] Fix internode check (#19112)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
|
2025-06-04 23:39:23 +08:00 |
|
jmswen
|
c8dcc15921
|
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-04 08:26:47 -07:00 |
|
Cyrus Leung
|
8f4ffbd373
|
[Doc] Update V1 Guide for embedding models (#19141)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-04 22:57:55 +08:00 |
|
Lain
|
5f2cd251d2
|
Sm100 blockwise fp8 swap ab (#18564)
|
2025-06-04 07:48:45 -07:00 |
|
Xu Wenqing
|
02658c2dfe
|
Add DeepSeek-R1-0528 function call chat template (#18874)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
|
2025-06-04 13:24:18 +00:00 |
|
Cyrus Leung
|
01dc9a76db
|
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-04 04:49:20 -07:00 |
|
wang.yuqi
|
35cf32df30
|
Improve the output precision of embedding models (#19092)
|
2025-06-04 11:48:57 +00:00 |
|
Isotr0py
|
8711bc5e68
|
[Misc] Add packages for benchmark as extra dependency (#19089)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-04 04:18:48 -07:00 |
|
Seiji Eicher
|
2669a0d7b5
|
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-06-04 17:10:45 +08:00 |
|
Siyuan Liu
|
8e972d9c44
|
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 01:43:00 -07:00 |
|
汪志鹏
|
3336c8cfbe
|
Fix #19130 (#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-04 01:42:06 -07:00 |
|
Woosuk Kwon
|
b124e1085b
|
[Bugfix] Fix FA3 full cuda graph correctness (#19106)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-03 23:10:15 -07:00 |
|
Kaixi Hou
|
41aa578428
|
[NVIDIA] Add Cutlass MLA backend (#17625)
|
2025-06-03 21:40:26 -07:00 |
|
Calvin Chen
|
8d646c2e53
|
[Cleanup][v1]:remote guided-decoding-backend for example (#19059)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-06-04 04:23:26 +00:00 |
|
Vadim Gimpelson
|
5d6d1adf15
|
[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)
|
2025-06-03 21:13:01 -07:00 |
|
Lukas Geiger
|
1409ef9134
|
[Core] Cast multimodal input in hf processor (#18862)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-03 20:24:56 -07:00 |
|