Richard Zou
|
ea2236bf95
|
Add option to use torch._inductor.standalone_compile (#17057)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-09 12:59:04 -07:00 |
|
Harry Mellor
|
7d4aedae7c
|
Handle error when str passed to /v1/audio/transcriptions (#17909)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-09 19:23:59 +00:00 |
|
Michael Goin
|
22481fbfa3
|
Update CT WNA16MarlinMoE integration (#16666)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-09 13:19:45 -04:00 |
|
Isotr0py
|
5c4c08f6f1
|
[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-09 17:16:12 +00:00 |
|
Rui Qiao
|
c44c384b1c
|
[Misc] Add references in ray_serve_deepseek example (#17907)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-05-09 16:59:36 +00:00 |
|
Michael Goin
|
85b72cb7b1
|
Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910)
|
2025-05-09 08:58:18 -07:00 |
|
Cyrus Leung
|
6e5595ca39
|
[CI/Build] Automatically retry flaky tests (#17856)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-09 09:55:17 -06:00 |
|
Chen Zhang
|
200da9a517
|
[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-09 15:25:34 +00:00 |
|
qli88
|
9f64e93415
|
[BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
|
2025-05-09 08:59:36 -06:00 |
|
Reid
|
ec61ea20a8
|
[Misc] add dify integration (#17895)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-09 03:42:39 -07:00 |
|
Harry Mellor
|
c6798baa9c
|
Change top_k to be disabled with 0 (still accept -1 for now) (#17773)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-09 10:01:49 +00:00 |
|
inkcherry
|
5b2dcbf0b8
|
Fix Whisper crash caused by invalid ``max_num_batched_tokens`` config (#17853)
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
|
2025-05-09 09:16:26 +00:00 |
|
Isotr0py
|
6e4a93e3f7
|
[Bugfix][CPU] Fix broken AVX2 CPU TP support (#17252)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-09 08:55:14 +00:00 |
|
vllmellm
|
217db4baa6
|
[Bugfix][ROCm] Fix AITER MLA V1 (#17880)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-09 08:38:21 +00:00 |
|
Yan Ma
|
ff8c400502
|
[Doc] remove visible token in doc (#17884)
Signed-off-by: yan <yanma1@habana.ai>
|
2025-05-09 01:21:31 -07:00 |
|
Michael Yao
|
89a0315f4c
|
[Doc] Update several links in reasoning_outputs.md (#17846)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-05-09 01:20:55 -07:00 |
|
Simon Mo
|
3d1e387652
|
[Docs] Add Slides from NYC Meetup (#17879)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-05-08 21:46:54 -07:00 |
|
Ning Xie
|
d310e6de98
|
[BUGFIX]: return fast when request requires prompt logprobs (#17251)
|
2025-05-08 21:25:41 -07:00 |
|
Lucas Wilkinson
|
5e6f939484
|
[Attention] MLA move rotary embedding to cuda-graph region (#17668)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-09 11:14:42 +08:00 |
|
Shanshan Shen
|
760e3ecc8f
|
[V1][Structured Output] Update llguidance (>= 0.7.11) to avoid AttributeError (no StructTag) (#17839)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-05-08 20:14:18 -07:00 |
|
vllmellm
|
3c9396a64f
|
[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
|
2025-05-09 10:42:05 +08:00 |
|
Shu Wang
|
376786fac1
|
Add cutlass support for blackwell fp8 blockwise gemm (#14383)
Signed-off-by: Shu Wang <shuw@nvidia.com>
|
2025-05-08 15:09:55 -07:00 |
|
Michael Goin
|
4f605a6de5
|
Fix noisy warning for uncalibrated q_scale/p_scale (#17414)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-08 15:56:59 -04:00 |
|
Michael Goin
|
8342e3abd1
|
[CI] Prune down lm-eval small tests (#17012)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-08 19:00:26 +00:00 |
|
yarongmu-google
|
a83a0f92b5
|
[Test] Attempt all TPU V1 tests, even if some of them fail. (#17334)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-05-08 17:20:54 +00:00 |
|
Russell Bryant
|
226a4272cf
|
[V1] Improve VLLM_ALLOW_INSECURE_SERIALIZATION logging (#17860)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-08 16:57:35 +00:00 |
|
Russell Bryant
|
ec54d73c31
|
[CI] Fix test_collective_rpc (#17858)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-08 16:47:12 +00:00 |
|
Jee Jee Li
|
a944f8ede7
|
[Misc] Delete LoRA-related redundancy code (#17841)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-08 06:02:21 -07:00 |
|
Cyrus Leung
|
015815fe01
|
[Bugfix] use_fast failing to be propagated to Qwen2-VL image processor (#17838)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-08 05:39:21 -07:00 |
|
Harry Mellor
|
e4ca6e3a99
|
Fix transient dependency error in docs build (#17848)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-08 03:42:03 -07:00 |
|
Reid
|
53d0cb7423
|
[Misc] add chatbox integration (#17828)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-08 10:05:26 +00:00 |
|
Lu Fang
|
f50dcb7c21
|
[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)
|
2025-05-08 03:05:10 -07:00 |
|
Cyrus Leung
|
a1e19b635d
|
[Doc] Fix a typo in the file name (#17836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-08 18:04:18 +08:00 |
|
fxmarty-amd
|
bb239a730f
|
[Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612)
Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
|
2025-05-08 02:53:53 -07:00 |
|
Jevin Jiang
|
a463555dee
|
[TPU] Fix the test_sampler (#17820)
|
2025-05-08 05:51:33 -04:00 |
|
Rick Yuan
|
ca04b97c93
|
[Bugfix] Fix tool call template validation for Mistral models (#17644)
Signed-off-by: Rick Yuan <yuan821120@gmail.com>
Signed-off-by: RIck Yuan <yuan821120@gmail.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
|
2025-05-08 09:47:19 +00:00 |
|
xsank
|
0a9bbaa104
|
[Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763)
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>
|
2025-05-08 07:50:22 +00:00 |
|
Qiong Zhou Huang
|
39956efb3f
|
[Bugfix] Fix bad words for Mistral models (#17753)
Signed-off-by: Qiong Zhou Huang <qiong@phonic.co>
|
2025-05-07 23:32:10 -07:00 |
|
Ximingwang-09
|
597051e56f
|
[Qwen3]add qwen3-235b-bf16 fused moe config on A100 (#17715)
|
2025-05-07 23:09:32 -07:00 |
|
Cyrus Leung
|
96722aa81d
|
[Frontend] Chat template fallbacks for multimodal models (#17805)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-07 23:05:54 -07:00 |
|
Agata Dobrzyniewicz
|
843b222723
|
[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU (#17648)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
|
2025-05-07 22:37:03 -07:00 |
|
Akash kaothalkar
|
e515668edf
|
[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER (#17153)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-05-07 22:35:03 -07:00 |
|
Hashem Hashemi
|
5a499e70d5
|
[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: charlifu <charlifu@amd.com>
|
2025-05-07 22:34:49 -07:00 |
|
Russell Bryant
|
6930a41116
|
[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var (#17490)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-05-08 13:34:02 +08:00 |
|
Harry Mellor
|
998eea4a0e
|
Only log non-default CLI args for online serving (#17803)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-07 22:33:29 -07:00 |
|
Mikhail Podvitskii
|
c747d84576
|
[Installation] OpenTelemetry version update (#17771)
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
|
2025-05-07 22:32:49 -07:00 |
|
Vadim Markovtsev
|
b2da14a05a
|
Improve exception reporting in MP engine (#17800)
Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>
|
2025-05-08 05:32:39 +00:00 |
|
Chanh Nguyen
|
7ea2adb802
|
[Core] Support full cuda graph in v1 (#16072)
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
|
2025-05-07 22:30:15 -07:00 |
|
Nick Hill
|
3d13ca0e24
|
[BugFix] Fix --disable-log-stats in V1 server mode (#17600)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-08 04:08:15 +00:00 |
|
Harry Mellor
|
66ab3b13c9
|
Don't call the venv vllm (#17810)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-08 04:06:39 +00:00 |
|