Harry Mellor
|
1e36c8687e
|
[Deprecation] Remove nullable_kvs (#20969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-15 17:21:50 +00:00 |
|
Harry Mellor
|
5bac61362b
|
Configure Gemini (#20971)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-15 09:37:05 -07:00 |
|
Harry Mellor
|
313ae8c16a
|
[Deprecation] Remove everything scheduled for removal in v0.10.0 (#20979)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-15 15:57:53 +00:00 |
|
Cyrus Leung
|
c847e34b39
|
[CI/Build] Fix wrong path in Transformers Nightly Models Test (#20994)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-15 08:53:16 -07:00 |
|
Patrick von Platen
|
e7e3e6d263
|
Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-15 07:35:30 -07:00 |
|
Christian Pinto
|
4ffd963fa0
|
[v1][core] Support for attention free models (#20811)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-07-15 14:20:01 +00:00 |
|
Harry Mellor
|
56fe4bedd6
|
[Deprecation] Remove TokenizerPoolConfig (#20968)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-15 14:00:50 +00:00 |
|
Rui Qiao
|
d91278181d
|
[doc] Add more details for Ray-based DP (#20948)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-15 05:37:12 -07:00 |
|
Li Wang
|
20149d84d9
|
[MISC] Add init files for python package (#20908)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-07-15 12:16:33 +00:00 |
|
Thomas Parnell
|
3534c39a20
|
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-15 04:04:35 -07:00 |
|
Yifei Teng
|
c586b55667
|
[TPU] Optimize kv cache update kernel (#20415)
Signed-off-by: Yifei Teng <tengyifei88@gmail.com>
|
2025-07-15 03:56:43 -07:00 |
|
Ricardo Decal
|
33d560001e
|
[Docs] Improve documentation for ray cluster launcher helper script (#20602)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-15 03:55:45 -07:00 |
|
kourosh hakhamaneshi
|
f148c44c6a
|
[frontend] Refactor CLI Args for a better modular integration (#20206)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-07-15 02:23:42 -07:00 |
|
Ricardo Decal
|
235bfd5dfe
|
[Docs] Improve documentation for RLHF example (#20598)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-15 01:54:10 -07:00 |
|
Reid
|
68d28e37b0
|
[frontend] Add --help=page option for paginated help output (#20961)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-15 00:42:00 -07:00 |
|
Ilya Markov
|
37a7d5d74a
|
[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-15 06:57:40 +00:00 |
|
Woosuk Kwon
|
d4d309409f
|
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-14 23:01:46 -07:00 |
|
Jennifer He
|
85bd6599e4
|
[Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534)
Signed-off-by: Jennifer He <islandhe@gmail.com>
Signed-off-by: <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
|
2025-07-15 13:34:24 +08:00 |
|
Boyuan Feng
|
91b3d190ae
|
[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-07-15 13:02:17 +08:00 |
|
Isotr0py
|
fc017915f5
|
[Doc] Clearer mistral3 and pixtral model support description (#20926)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 21:56:53 -07:00 |
|
Pavani Majety
|
9ad0a4588b
|
[Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer (#20934)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-07-15 03:27:50 +00:00 |
|
Ruheena Suhani Shaik
|
016b8d1b7f
|
Enabled BnB NF4 inference on Gaudi (#20172)
Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
|
2025-07-14 20:26:08 -07:00 |
|
Nicolò Lucchesi
|
80305c1b24
|
[CI] Fix flaky test_streaming_response test (#20913)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-14 20:15:15 -07:00 |
|
Reid
|
37e2ecace2
|
feat: add image zoom to improve image viewing experience (#20763)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-14 20:14:23 -07:00 |
|
Ricardo Decal
|
054c8657e3
|
[Docs] Add Kuberay to deployment integrations (#20592)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-14 20:13:55 -07:00 |
|
XiongfeiWei
|
d4170fad39
|
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-07-15 03:06:33 +00:00 |
|
Michael Goin
|
946aadb4a0
|
[CI/Build] Split Entrypoints Test into LLM and API Server (#20945)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 02:44:18 +00:00 |
|
Michael Goin
|
bcdfb2a330
|
[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 01:42:17 +00:00 |
|
Richard Zou
|
ba8c300018
|
[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-15 01:26:18 +00:00 |
|
Alexander Matveev
|
8cdc371217
|
SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-07-15 01:06:38 +00:00 |
|
Yong Hoon Shin
|
61e20828da
|
Fall back if flashinfer comm module not found (#20936)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-14 23:11:18 +00:00 |
|
Kuntai Du
|
55e1c66da5
|
[Docs] remove outdated performance benchmark (#20935)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2025-07-14 22:14:17 +00:00 |
|
Thomas Parnell
|
86f3ac21ce
|
Fix overflow indexing in causal_conv1d kernel (#20938)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-14 21:43:07 +00:00 |
|
Nicolò Lucchesi
|
149f2435a5
|
[Misc] Relax translations tests (#20856)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-14 20:08:36 +00:00 |
|
Varun Sundar Rabindranath
|
c0569dbc82
|
[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-14 19:47:16 +00:00 |
|
Michael Goin
|
8bb43b9c9e
|
Add benchmark dataset for mlperf llama tasks (#20338)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-14 19:10:07 +00:00 |
|
Tyler Michael Smith
|
559756214b
|
Change default model to Qwen3-0.6B (#20335)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-14 16:54:52 +00:00 |
|
Isotr0py
|
6d0cf239c6
|
[CI/Build] Add Transformers nightly tests in CI (#20924)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 16:33:17 +00:00 |
|
Isotr0py
|
3fc964433a
|
[Misc] Clean up Aimv2 config registration in Ovis config (#20921)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 15:36:43 +00:00 |
|
Lu Fang
|
0caf61c08a
|
[CI] Update codeowner for compilation code (#20929)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-14 08:33:19 -07:00 |
|
Richard Zou
|
667624659b
|
[CI] cc folks on changes to vllm/compilation (#20925)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-14 07:52:17 -07:00 |
|
ant-yy
|
38efa28278
|
[Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
|
2025-07-14 22:10:32 +08:00 |
|
Cyrus Leung
|
e8cc53af5e
|
[Misc] Log the reason for falling back to FlexAttention (#20699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-14 04:16:51 -07:00 |
|
Chauncey
|
a4851cfe68
|
[Bugfix]: Fix messy code when using logprobs (#20910)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-14 11:06:45 +00:00 |
|
Reid
|
9887e8ec50
|
[Misc] Remove unused function (#20909)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-14 10:48:55 +00:00 |
|
22quinn
|
f326ab9c88
|
[Bugfix] Bump up mistral_common to support v13 tokenizer (#20905)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-14 10:45:03 +00:00 |
|
Cyrus Leung
|
dcf2a5e208
|
[CI/Build] Fix OOM issue in Jina-VL test (#20907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-14 10:32:35 +00:00 |
|
wangxiyuan
|
1e9438e0b0
|
[MISC] Move bind_kv_cache to worker module (#20900)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-07-14 09:40:00 +00:00 |
|
Aaron Pham
|
697ef765ee
|
[Refactor][V1] Move outlines utils for V1 imports (#20878)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-07-14 00:58:35 -07:00 |
|
Jee Jee Li
|
a99b9f7dee
|
[Quantization] add BNB for MixtralForCausalLM (#20893)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-14 07:34:34 +00:00 |
|