Kourosh Hakhamaneshi
|
f148c44c6a
|
[frontend] Refactor CLI Args for a better modular integration (#20206)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-07-15 02:23:42 -07:00 |
|
Ricardo Decal
|
235bfd5dfe
|
[Docs] Improve documentation for RLHF example (#20598)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-15 01:54:10 -07:00 |
|
Reid
|
68d28e37b0
|
[frontend] Add --help=page option for paginated help output (#20961)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-15 00:42:00 -07:00 |
|
Ilya Markov
|
37a7d5d74a
|
[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-15 06:57:40 +00:00 |
|
Woosuk Kwon
|
d4d309409f
|
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-14 23:01:46 -07:00 |
|
Jennifer He
|
85bd6599e4
|
[Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534)
Signed-off-by: Jennifer He <islandhe@gmail.com>
Signed-off-by: <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
|
2025-07-15 13:34:24 +08:00 |
|
Boyuan Feng
|
91b3d190ae
|
[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-07-15 13:02:17 +08:00 |
|
Isotr0py
|
fc017915f5
|
[Doc] Clearer mistral3 and pixtral model support description (#20926)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 21:56:53 -07:00 |
|
Pavani Majety
|
9ad0a4588b
|
[Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer (#20934)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-07-15 03:27:50 +00:00 |
|
Ruheena Suhani Shaik
|
016b8d1b7f
|
Enabled BnB NF4 inference on Gaudi (#20172)
Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
|
2025-07-14 20:26:08 -07:00 |
|
Nicolò Lucchesi
|
80305c1b24
|
[CI] Fix flaky test_streaming_response test (#20913)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-14 20:15:15 -07:00 |
|
Reid
|
37e2ecace2
|
feat: add image zoom to improve image viewing experience (#20763)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-14 20:14:23 -07:00 |
|
Ricardo Decal
|
054c8657e3
|
[Docs] Add Kuberay to deployment integrations (#20592)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-14 20:13:55 -07:00 |
|
XiongfeiWei
|
d4170fad39
|
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-07-15 03:06:33 +00:00 |
|
Michael Goin
|
946aadb4a0
|
[CI/Build] Split Entrypoints Test into LLM and API Server (#20945)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 02:44:18 +00:00 |
|
Michael Goin
|
bcdfb2a330
|
[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 01:42:17 +00:00 |
|
Richard Zou
|
ba8c300018
|
[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-15 01:26:18 +00:00 |
|
Alexander Matveev
|
8cdc371217
|
SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-07-15 01:06:38 +00:00 |
|
Yong Hoon Shin
|
61e20828da
|
Fall back if flashinfer comm module not found (#20936)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-14 23:11:18 +00:00 |
|
Kuntai Du
|
55e1c66da5
|
[Docs] remove outdated performance benchmark (#20935)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2025-07-14 22:14:17 +00:00 |
|
Thomas Parnell
|
86f3ac21ce
|
Fix overflow indexing in causal_conv1d kernel (#20938)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-14 21:43:07 +00:00 |
|
Nicolò Lucchesi
|
149f2435a5
|
[Misc] Relax translations tests (#20856)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-14 20:08:36 +00:00 |
|
Varun Sundar Rabindranath
|
c0569dbc82
|
[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-14 19:47:16 +00:00 |
|
Michael Goin
|
8bb43b9c9e
|
Add benchmark dataset for mlperf llama tasks (#20338)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-14 19:10:07 +00:00 |
|
Tyler Michael Smith
|
559756214b
|
Change default model to Qwen3-0.6B (#20335)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-14 16:54:52 +00:00 |
|
Isotr0py
|
6d0cf239c6
|
[CI/Build] Add Transformers nightly tests in CI (#20924)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 16:33:17 +00:00 |
|
Isotr0py
|
3fc964433a
|
[Misc] Clean up Aimv2 config registration in Ovis config (#20921)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 15:36:43 +00:00 |
|
Lu Fang
|
0caf61c08a
|
[CI] Update codeowner for compilation code (#20929)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-14 08:33:19 -07:00 |
|
Richard Zou
|
667624659b
|
[CI] cc folks on changes to vllm/compilation (#20925)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-14 07:52:17 -07:00 |
|
ant-yy
|
38efa28278
|
[Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
|
2025-07-14 22:10:32 +08:00 |
|
Cyrus Leung
|
e8cc53af5e
|
[Misc] Log the reason for falling back to FlexAttention (#20699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-14 04:16:51 -07:00 |
|
Chauncey
|
a4851cfe68
|
[Bugfix]: Fix messy code when using logprobs (#20910)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-14 11:06:45 +00:00 |
|
Reid
|
9887e8ec50
|
[Misc] Remove unused function (#20909)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-14 10:48:55 +00:00 |
|
22quinn
|
f326ab9c88
|
[Bugfix] Bump up mistral_common to support v13 tokenizer (#20905)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-14 10:45:03 +00:00 |
|
Cyrus Leung
|
dcf2a5e208
|
[CI/Build] Fix OOM issue in Jina-VL test (#20907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-14 10:32:35 +00:00 |
|
wangxiyuan
|
1e9438e0b0
|
[MISC] Move bind_kv_cache to worker module (#20900)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-07-14 09:40:00 +00:00 |
|
Aaron Pham
|
697ef765ee
|
[Refactor][V1] Move outlines utils for V1 imports (#20878)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-07-14 00:58:35 -07:00 |
|
Jee Jee Li
|
a99b9f7dee
|
[Quantization] add BNB for MixtralForCausalLM (#20893)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-14 07:34:34 +00:00 |
|
TJian
|
c488b928a7
|
[ROCm] [Bugfix] [Critical]: Fix mamba compilation bug (#20883)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-07-14 15:23:28 +08:00 |
|
Reid
|
2c7fa47161
|
Fix: Add missing EOFError handling in CLI complete command (#20896)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-14 07:09:57 +00:00 |
|
Daniel song
|
88fc8a97e3
|
Removing redundant python version check (#20888)
Signed-off-by: Dannyso05 <dansong1177@gmail.com>
|
2025-07-14 06:15:05 +00:00 |
|
Maroon Ayoub
|
66f6fbd393
|
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
|
2025-07-14 02:45:31 +00:00 |
|
22quinn
|
8632e831ba
|
[Core] Add update_config RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-14 00:49:18 +00:00 |
|
nopperl
|
4bbfc36b16
|
[V1] Hybrid allocator without prefix caching (#20661)
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
|
2025-07-13 16:55:14 +00:00 |
|
TJian
|
80d38b8ac8
|
[V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs (#20880)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-07-13 15:19:32 +00:00 |
|
Liuchenlong
|
211b6a6113
|
[Bugfix] fix define of RerankDocument (#20877)
Signed-off-by: liuchenlong <liuchenlong@xiaohongshu.com>
Co-authored-by: liuchenlong <liuchenlong@xiaohongshu.com>
|
2025-07-13 14:32:40 +00:00 |
|
Wang Siyuan
|
247102f07f
|
[Bugfix] Fix: add patch_rope_scaling after hf override (#20857)
Signed-off-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
Signed-off-by: Wang Siyuan <sywang0227@gmail.com>
|
2025-07-13 00:13:25 -07:00 |
|
Minkyu Kim
|
bd4c1e6fdb
|
Support for LlamaForSequenceClassification (#20807)
Signed-off-by: thechaos16 <thechaos16@gmail.com>
|
2025-07-13 00:09:34 -07:00 |
|
QiliangCui
|
99b4f080d8
|
Re-enable google/gemma-3-1b-it accuracy test. (#20866)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-12 21:48:56 -07:00 |
|
Nicolò Lucchesi
|
020f58abcd
|
[Core] Support multiple tasks per model (#20771)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-12 19:40:11 -07:00 |
|