Harry Mellor
|
5a0905ba2a
|
Replace misc issues with link to forum (#15226)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-20 23:18:20 +08:00 |
|
Richard Liu
|
a8f12a63fd
|
Fix env vars for running Ray distributed backend on GKE (#15166)
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-20 14:59:33 +00:00 |
|
Harry Mellor
|
69ae2380c6
|
Add user forum to README (#15220)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-20 22:39:51 +08:00 |
|
Cyrus Leung
|
27261e40a6
|
[Bugfix] Multi-video inference on LLaVA-Onevision (#15082)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-20 14:10:45 +00:00 |
|
Quang-Linh LE
|
e3f813c33b
|
[macOS] Upgrade pytorch to 2.6.0 (#15129)
|
2025-03-20 01:22:40 -07:00 |
|
Wang Ran (汪然)
|
c607a2652b
|
Fixing Imprecise Type Annotations (#15192)
|
2025-03-20 01:19:55 -07:00 |
|
Kevin H. Luu
|
3d45e3d749
|
[release] Tag vllm-cpu with latest upon new version released (#15193)
|
2025-03-20 01:19:10 -07:00 |
|
billishyahao
|
742369d35a
|
[Frontend][Bugfix] support prefill decode disaggregation on deepseek (#14824)
Signed-off-by: billishyahao <bill.he@amd.com>
Co-authored-by: Zhai Feiyue <80079571+ZhaiFeiyue@users.noreply.github.com>
|
2025-03-20 00:00:33 -07:00 |
|
Wang Ran (汪然)
|
bfe2fe0af4
|
typo: Update config.py (#15189)
|
2025-03-19 23:31:21 -07:00 |
|
Matt Ritter
|
a8652f4f0f
|
Enable CUDA graph support for llama 3.2 vision (#14917)
Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>
|
2025-03-19 23:29:16 -07:00 |
|
Cyrus Leung
|
2f726b241e
|
[Doc] Update README.md (#15187)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-20 13:25:58 +08:00 |
|
Mickaël Seznec
|
a597a57595
|
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-03-20 01:14:20 -04:00 |
|
Chauncey
|
ae65f3e237
|
[Misc]fixed disable these http request logs (#14754)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-19 21:53:40 -07:00 |
|
Roger Wang
|
34868b106a
|
[Doc] Update Mistral Small 3.1/Pixtral example (#15184)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-20 04:46:06 +00:00 |
|
Russell Bryant
|
1f16b7fe74
|
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-19 21:33:51 -07:00 |
|
Jennifer Zhao
|
b88be22165
|
[Benchmark] Allow oversample request in benchmark dataset (#15170)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
|
2025-03-20 12:32:58 +08:00 |
|
Nicolò Lucchesi
|
d8c6d7d6b5
|
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-19 21:00:39 -07:00 |
|
Wang, Yi
|
40828ce5fe
|
fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… (#14673)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
|
2025-03-19 20:56:16 -07:00 |
|
Cyrus Leung
|
ffa443afed
|
[Bugfix] Fix embedding assignment for InternVL-based models (#15086)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-20 03:40:13 +00:00 |
|
Jovan Sardinha
|
70e500cad9
|
Fix broken tests (#14713)
Signed-off-by: JovanSardinha <jovan.sardinha@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-03-20 02:06:49 +00:00 |
|
Rui Qiao
|
4cb1c05c9e
|
[Doc] Clarify run vllm only on one node in distributed inference (#15148)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-03-20 09:55:59 +08:00 |
|
Nick Hill
|
c47aafa37c
|
[BugFix] Lazily import XgrammarBackend to avoid early cuda init (#15171)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 01:30:43 +00:00 |
|
Alexander Matveev
|
cfbca8a2f2
|
[V1] TPU - Tensor parallel MP support (#15059)
|
2025-03-20 00:55:18 +00:00 |
|
Simon Mo
|
0fe5609874
|
[Docs] Announce Ollama and Singapore Meetups (#15161)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-19 16:18:04 -07:00 |
|
Nick Hill
|
22d33baca2
|
[FrontEnd][Perf] merge_async_iterators fast-path for single-prompt requests (#15150)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-19 21:04:41 +00:00 |
|
iefgnoix
|
b0e96aaebb
|
[V1][TPU] Change kv cache shape. (#15145)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-03-19 12:16:42 -07:00 |
|
Wang Ran (汪然)
|
8310e0b59b
|
simple bugfix: Update stats.py (#15139)
|
2025-03-19 18:26:27 +00:00 |
|
maobaolong
|
26dd972adb
|
[FEAT]Support reset prefix cache by specified device (#15003)
|
2025-03-19 10:54:41 -07:00 |
|
Murali Andoorveedu
|
61c7a1b856
|
[V1] Minor V1 async engine test refactor (#15075)
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
Tag: v0.8.1
|
2025-03-19 10:37:17 -07:00 |
|
Alessandro Sangiorgi
|
374ee287d8
|
[Frontend] Remove custom_cache_manager (#13791)
Signed-off-by: fulvius31 <asangior@redhat.com>
|
2025-03-20 00:13:50 +08:00 |
|
Kero Liang
|
a4d83661d7
|
[Misc] Update the "the first vLLM China Meetup" slides link to point to the first page (#15134)
Signed-off-by: imkero <kerorek@outlook.com>
|
2025-03-19 15:07:39 +00:00 |
|
Jan Kaniecki
|
8363cd093d
|
[Bugfix] Adjust mllama to regional compilation (#15112)
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
|
2025-03-19 07:57:25 -07:00 |
|
Aaron Pham
|
6c5a3195db
|
[Misc][Benchmark] Add support for different tokenizer_mode (#15040)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-19 14:56:50 +00:00 |
|
Marc-Alexandre Côté
|
073d1ed354
|
[Doc] Update tip info on using latest transformers when creating a custom Dockerfile (#15070)
|
2025-03-19 13:33:40 +00:00 |
|
Cyrus Leung
|
3d446433ec
|
[Bugfix] Fix size calculation of processing cache (#15114)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 05:53:19 -07:00 |
|
Cyrus Leung
|
1fe0fd12d3
|
[Misc] Avoid unnecessary HF do_rescale warning when passing dummy data (#15107)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 03:42:31 -07:00 |
|
Roger Wang
|
dafb4e504a
|
[V1][Bugfix] Fix oracle for device checking (#15104)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-19 18:35:32 +08:00 |
|
Kunshang Ji
|
68cf1601d3
|
[CI][Intel GPU] update XPU dockerfile and CI script (#15109)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-03-19 01:29:25 -07:00 |
|
Cyrus Leung
|
61f412187d
|
[Bugfix] Re-enable Gemma3 for V1 (#14980)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-18 23:58:22 -07:00 |
|
Woosuk Kwon
|
05ccd0aa35
|
[V1] Ensure using int64 for sampled token ids (#15065)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 23:52:19 -07:00 |
|
Cyrus Leung
|
f690372b68
|
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 13:49:33 +08:00 |
|
Brayden Zhong
|
8b3e94a357
|
[Model] Remove duplicated message check in Mistral chat completion request (#15069)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-03-19 05:09:32 +00:00 |
|
Julien Denize
|
437f9162d0
|
[Model] Pixtral: Remove layer instantiation duplication (#15053)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-03-19 10:34:03 +08:00 |
|
Cody Yu
|
4f065f12f5
|
[Misc][V1] Skip device checking if not available (#15061)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-18 19:33:43 -07:00 |
|
Jennifer Zhao
|
228b768db6
|
[Doc] Minor v1_user_guide update (#15064)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-03-18 16:10:45 -07:00 |
|
Chujie Zheng
|
027827cc1d
|
fix long dtype in topk sampling (#15049)
|
2025-03-18 15:57:31 -07:00 |
|
Alexander Matveev
|
72a8639b68
|
[V1] TPU - CI/CD use smaller model (#15054)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-18 21:39:21 +00:00 |
|
Woosuk Kwon
|
99abb8b650
|
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 14:31:54 -07:00 |
|
Russell Bryant
|
3a1e648158
|
[V1] Refactor Structured Output for multiple backends (#14694)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-18 19:49:15 +00:00 |
|
Jee Jee Li
|
46c759c165
|
[Bugfix] Fix LoRA extra vocab size (#15047)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-18 09:40:29 -07:00 |
|