Ricardo Decal
3ed94f9d0a
[Docs] Enhance Anyscale documentation, add quickstart links for vLLM (#21018)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-15 19:46:56 -07:00

Reid
fa839565f2
[Misc] Refactor: Improve argument handling for conda command (#20481)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-15 19:43:19 -07:00

Brayden Zhong
75a99b98bf
[Chore] Remove outdated transformers check (#20989)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-07-15 19:42:40 -07:00

Chauncey
b5c3b68359
[Misc] bump xgrammar version to v0.1.21 (#20992)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 19:42:16 -07:00

Thomas Parnell
6cbc4d4bea
[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture (#20923)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 19:19:10 -07:00

Michael Goin
153c6f1e61
[Frontend] Remove print left in FrontendArgs.add_cli_args (#21004)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-15 19:18:41 -07:00

Chauncey
34cda778a0
[Frontend] OpenAI Responses API supports input image (#20975)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 18:59:36 -06:00

Elfie Guo
30800b01c2
[Nvidia] Integrate SM100 cudnn prefill API to MLA prefill (#20411)
Signed-off-by: Elfie Guo <elfieg@nvidia.com>
Co-authored-by: Elfie Guo <eflieg@nvidia.com>
2025-07-15 17:56:45 -07:00

Chen LI
10be209493
[Bug Fix] get_distributed_init_method should get the ip from get_ip i… (#20889)
Signed-off-by: Chen Li <lcpingping@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-07-15 21:23:52 +00:00

Marko Rosenmueller
19c863068b
[Frontend] Support cache_salt in /v1/completions and /v1/responses (#20981)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
2025-07-15 21:01:04 +00:00

Tuan, Hoang-Trong
f29fd8a7f8
[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 (#20838)
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
2025-07-15 16:08:26 -04:00

Gregory Shtrasberg
ed10f3cea1
[ROCm] warpSize is being made non constexpr in ROCm 7.0 (#20330)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-15 14:01:44 -04:00

Harry Mellor
b637e9dcb8
Add full serve CLI reference back to docs (#20978)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:42:30 +00:00

Harry Mellor
1e36c8687e
[Deprecation] Remove nullable_kvs (#20969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:21:50 +00:00

Harry Mellor
5bac61362b
Configure Gemini (#20971)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 09:37:05 -07:00

Harry Mellor
313ae8c16a
[Deprecation] Remove everything scheduled for removal in v0.10.0 (#20979)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 15:57:53 +00:00

Cyrus Leung
c847e34b39
[CI/Build] Fix wrong path in Transformers Nightly Models Test (#20994)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-15 08:53:16 -07:00

Patrick von Platen
e7e3e6d263
Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-15 07:35:30 -07:00

Christian Pinto
4ffd963fa0
[v1][core] Support for attention free models (#20811)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2025-07-15 14:20:01 +00:00

Harry Mellor
56fe4bedd6
[Deprecation] Remove TokenizerPoolConfig (#20968)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 14:00:50 +00:00

Rui Qiao
d91278181d
[doc] Add more details for Ray-based DP (#20948)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-15 05:37:12 -07:00

Li Wang
20149d84d9
[MISC] Add init files for python package (#20908)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-15 12:16:33 +00:00

Thomas Parnell
3534c39a20
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 04:04:35 -07:00

Yifei Teng
c586b55667
[TPU] Optimize kv cache update kernel (#20415)
Signed-off-by: Yifei Teng <tengyifei88@gmail.com>
2025-07-15 03:56:43 -07:00

Ricardo Decal
33d560001e
[Docs] Improve documentation for ray cluster launcher helper script (#20602)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-15 03:55:45 -07:00

kourosh hakhamaneshi
f148c44c6a
[frontend] Refactor CLI Args for a better modular integration (#20206)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-15 02:23:42 -07:00

Ricardo Decal
235bfd5dfe
[Docs] Improve documentation for RLHF example (#20598)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-15 01:54:10 -07:00

Reid
68d28e37b0
[frontend] Add --help=page option for paginated help output (#20961)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-15 00:42:00 -07:00

Ilya Markov
37a7d5d74a
[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-15 06:57:40 +00:00

Woosuk Kwon
d4d309409f
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-14 23:01:46 -07:00

Jennifer He
85bd6599e4
[Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534)
Signed-off-by: Jennifer He <islandhe@gmail.com>
Signed-off-by: <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
2025-07-15 13:34:24 +08:00

Boyuan Feng
91b3d190ae
[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-07-15 13:02:17 +08:00

Isotr0py
fc017915f5
[Doc] Clearer mistral3 and pixtral model support description (#20926)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-14 21:56:53 -07:00

Pavani Majety
9ad0a4588b
[Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer (#20934)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-07-15 03:27:50 +00:00

Ruheena Suhani Shaik
016b8d1b7f
Enabled BnB NF4 inference on Gaudi (#20172)
Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
2025-07-14 20:26:08 -07:00

Nicolò Lucchesi
80305c1b24
[CI] Fix flaky test_streaming_response test (#20913)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:15:15 -07:00

Reid
37e2ecace2
feat: add image zoom to improve image viewing experience (#20763)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-14 20:14:23 -07:00

Ricardo Decal
054c8657e3
[Docs] Add Kuberay to deployment integrations (#20592)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-14 20:13:55 -07:00

XiongfeiWei
d4170fad39
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00

Michael Goin
946aadb4a0
[CI/Build] Split Entrypoints Test into LLM and API Server (#20945)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-15 02:44:18 +00:00

Michael Goin
bcdfb2a330
[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-15 01:42:17 +00:00

Richard Zou
ba8c300018
[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-15 01:26:18 +00:00

Alexander Matveev
8cdc371217
SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-07-15 01:06:38 +00:00

Yong Hoon Shin
61e20828da
Fall back if flashinfer comm module not found (#20936)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-14 23:11:18 +00:00

Kuntai Du
55e1c66da5
[Docs] remove outdated performance benchmark (#20935)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-07-14 22:14:17 +00:00

Thomas Parnell
86f3ac21ce
Fix overflow indexing in causal_conv1d kernel (#20938)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-14 21:43:07 +00:00

Nicolò Lucchesi
149f2435a5
[Misc] Relax translations tests (#20856)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:08:36 +00:00

Varun Sundar Rabindranath
c0569dbc82
[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-14 19:47:16 +00:00

Michael Goin
8bb43b9c9e
Add benchmark dataset for mlperf llama tasks (#20338)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-14 19:10:07 +00:00

Tyler Michael Smith
559756214b
Change default model to Qwen3-0.6B (#20335)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-14 16:54:52 +00:00