Isotr0py
|
cc826a202b
|
[Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs (#22867)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-16 00:44:50 -07:00 |
|
Jee Jee Li
|
6d3da472bc
|
[Misc] Add --save-dir option to benchmark_moe (#23020)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-16 07:26:10 +00:00 |
|
Andrew Sansom
|
78863f8c5c
|
[BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors (#22962)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-08-16 06:25:10 +00:00 |
|
Lucas Wilkinson
|
5157827cfc
|
[Build] Env var to disable sccache (#22968)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-16 05:36:27 +00:00 |
|
Kunshang Ji
|
7caec10e7b
|
[XPU]avoid circular import during XPU init (#23017)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-08-16 05:16:34 +00:00 |
|
Grace Ho
|
1f83e7d849
|
[misc] nsys profile output kernel classifier and visualizer (#22971)
Signed-off-by: Grace Ho <grho@nvidia.com>
|
2025-08-16 02:52:51 +00:00 |
|
Calvin Chen
|
e4e37ded56
|
[V1] support min_tokens for detokener (#22014)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-08-16 02:28:10 +00:00 |
|
Nick Hill
|
f6b5040590
|
[Frontend] Avoid list copies in serving_chat.py (#22947)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-16 02:06:30 +00:00 |
|
Benjamin Chislett
|
fbd88728b3
|
[Bugfix] Fix DeepSeek MTP (#22934)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-08-16 01:25:06 +00:00 |
|
Nicolò Lucchesi
|
070da660c1
|
[Kernel] Simplify get_kv_cache_layout and cache use_trtllm_attention env-dependent bit (#22735)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-16 00:14:08 +00:00 |
|
Nick Hill
|
ad0297d113
|
[Misc] Support passing multiple request ids at once to AsyncLLM.abort() (#22944)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-15 17:00:36 -07:00 |
|
Yichen Yan
|
236b864e4f
|
[BugFix] Make run_once thread-safe (#22978)
Signed-off-by: <wenji.yyc@alibaba-inc.com>
Signed-off-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
|
2025-08-15 16:56:17 -07:00 |
|
Yong Hoon Shin
|
3e2f7985a2
|
Support multiple attention groups for KV sharing (#22672)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-15 16:54:10 -07:00 |
|
Or Ozeri
|
c280066f9d
|
[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-08-15 16:52:52 -07:00 |
|
Nick Hill
|
b9dc9d2607
|
[BugFix] Handle case where async utility call is cancelled (#22996)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>
|
2025-08-15 17:38:42 -06:00 |
|
rishitdholakia13
|
1fc375dc05
|
[Structured Outputs] [Bug] Fix misalignment in apply_grammar_bitmask causing unintended masking and NaN logits (#22963)
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
|
2025-08-15 23:25:05 +00:00 |
|
Eli Uriegas
|
76144adf76
|
ci: Add CUDA + arm64 release builds (#21201)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
|
2025-08-15 23:16:23 +00:00 |
|
Thomas Parnell
|
f5d412bafb
|
[BugFix] Fix regression caused by mamba state dtype PR (#22998)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-15 22:55:26 +00:00 |
|
Lucas Wilkinson
|
177e55e3bd
|
[Attention] FA3 Attention Sinks Perf Boost (#22478)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-15 17:41:07 -04:00 |
|
eigen
|
1723ef1aae
|
minor: zero workspace buffer init for flashinfer trtllm-gen attn (#22603)
|
2025-08-15 21:38:10 +00:00 |
|
Seiji Eicher
|
00d6cba0cf
|
Add PrefixRepetitionRandomDataset to vllm bench serve datasets (#20638)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-08-15 14:09:23 -07:00 |
|
shixianc
|
7f89ed248f
|
[Fix] enable swap_ab for pplx problem size computation (#22991)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
|
2025-08-15 14:02:12 -07:00 |
|
Michael Goin
|
8a87cd27d9
|
[CI] Speed up Whisper tests by reusing server (#22859)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-15 16:56:31 -04:00 |
|
Michael Goin
|
a344a1a7da
|
Use regex in convert-results-json-to-markdown.py (#22989)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-15 20:54:20 +00:00 |
|
nvjullin
|
79899b63f6
|
[Bugfix] Added more env vars to hash (#22449)
Signed-off-by: Julien Lin <jullin@nvidia.com>
|
2025-08-15 20:08:37 +00:00 |
|
Zebing Lin
|
6e670778cd
|
[Core] direct indexing on self.block_table_np in compute_slot_mapping (#22940)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-08-15 12:12:12 -07:00 |
|
Wentao Ye
|
df5afa82e5
|
[Log] Debug Once for Randomizing dummy data for DP Rank (#22860)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-15 11:51:50 -07:00 |
|
Chih-Chieh Yang
|
6cd69f51bf
|
[Model] Granite-4 support loading quantized checkpoint (#22925)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-08-15 18:47:56 +00:00 |
|
bnellnm
|
8ad7285ea2
|
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-15 14:46:00 -04:00 |
|
Shanshan Shen
|
48b01fd4d4
|
[Structured Output] Make the output of structured output example more complete (#22481)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-08-15 18:29:25 +00:00 |
|
Chenheli Hua
|
993d3d122b
|
[Benchmarks] Include image data when ShareGPT4V dataset is used. (#22955)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-08-15 18:23:06 +00:00 |
|
JartX
|
68af77e51c
|
[FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches (#22896)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2025-08-15 17:42:49 +00:00 |
|
sstamenk
|
6b04039a72
|
[BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 (#22369)
Signed-off-by: sstamenk <sstamenk@amd.com>
|
2025-08-15 17:17:31 +00:00 |
|
Woosuk Kwon
|
1c859a1387
|
[V0 Deprecation] Remove advance_step (#22969)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-15 08:22:31 -07:00 |
|
fhl2000
|
74f441f4b5
|
[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-08-15 10:01:39 -04:00 |
|
Csrayz
|
a0632a3e03
|
[Frontend] Expose do_log_stats interval to env (#22905)
Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-15 13:00:20 +00:00 |
|
Harry Mellor
|
e8b40c7fa2
|
[CI] Remove duplicated docs build from buildkite (#22924)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-15 05:58:06 -07:00 |
|
Jee Jee Li
|
48f4636927
|
[Misc] Ignore ep_kernels_workspace (#22807)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-15 05:58:03 -07:00 |
|
Thomas Parnell
|
75531a6c13
|
[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-15 12:57:06 +00:00 |
|
Staszek Paśko
|
22341b996e
|
Improve multimodal hasher performance for re-used Image prompts (#22825)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
|
2025-08-15 12:32:56 +00:00 |
|
Roger Wang
|
49252cf59e
|
[MM] Allow skipping memory profiling for multimodal models. (#22950)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-15 11:41:38 +00:00 |
|
Jinzhen Lin
|
3e6dd40016
|
[Bugfix] fix cuda 12.6 and 11.8 build (#22952)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-08-15 10:10:22 +00:00 |
|
Sayandip Dutta
|
aa300c438d
|
[Bugfix] Unquote file uri before reading image (#22912)
Signed-off-by: Sayandip Dutta <sayandip199309@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-15 09:28:00 +00:00 |
|
amirai21
|
fe91ce9591
|
[V1] - Split Prefill and Decode for Mamba1 models (#22653)
Signed-off-by: amirk <amirk@ai21.com>
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com>
|
2025-08-15 08:59:52 +00:00 |
|
wang.yuqi
|
5406ebf5c9
|
[CI] Pooling models mteb test uses enforce_eager (#22878)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-15 01:16:15 -07:00 |
|
frankie
|
b2c06509e5
|
[P/D]Provide bucket algorithm rate limiter for proxy_server (#22643)
Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
Signed-off-by: frankie <wangyongsheng686@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kuntai Du <kuntai@uchicago.edu>
|
2025-08-15 07:01:48 +00:00 |
|
TJian
|
b2f6c247a9
|
Revert "[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module." (#22956)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-08-15 06:39:19 +00:00 |
|
Asaf Joseph Gardin
|
3d232dbd19
|
[Mamba] - refactor: Renamed mamba_attn to mamba2_attn (#22818)
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
|
2025-08-15 06:38:05 +00:00 |
|
Wentao Ye
|
5c3fbfe46b
|
[Feature] Full Cuda Graph Support for Cutlass MLA and 6% E2E Throughput Improvement (#22763)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-15 06:27:30 +00:00 |
|
amirkl94
|
b4cef5e6c7
|
refactor: Change scaling factors calculation for flashinfer FusedMoE (#22812)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-15 06:19:31 +00:00 |
|