Jinzhen Lin
a258ad8bcc
[Bugfix] fix qwen3 moe fp8 accuracy issue ( #23031 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
2025-08-16 17:41:23 -07:00
afeldman-nm
bf7f470b22
[V1] Logits processors extensibility ( #19912 )
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-16 12:59:17 -07:00
Michael Goin
4fc722eca4
[Kernel/Quant] Remove AQLM ( #22943 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-16 19:38:21 +00:00
Michael Goin
3253ae765e
[Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions ( #23028 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-16 18:33:08 +00:00
Michael Goin
000cceca8c
[Bugfix gpt-oss] Fix float32 convert for flashinfer sink support ( #23016 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-16 11:16:00 -07:00
Woonggi Min
68373d3126
[Frontend] Added support for HermesToolParser for models without special tokens ( #16890 )
Signed-off-by: minpeter <kali2005611@gmail.com>
2025-08-16 17:38:42 +00:00
Maximilien de Bayser
52ce1420e9
Fix handling of max_num_batched_tokens for pooling tasks ( #23004 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-16 17:36:30 +00:00
汪志鹏
829bbd7882
[New Model]mBART model ( #22883 )
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-08-16 12:16:58 +00:00
Cyrus Leung
4dff91c93d
[Refactor] Allow optional MultiModalKwargsItem in IPC ( #23022 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-16 11:30:49 +00:00
Seiji Eicher
de9cb61763
Add docs for PrefixRepetitionDataset + enable usage with vllm bench throughput ( #23012 )
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-16 10:21:20 +00:00
Isotr0py
2dbccce8a6
[CI][Bugfix] Skip Ovis2 generation test because of broken remote code ( #22954 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-16 09:44:19 +00:00
Chengji Yao
933f45334a
[Core] Make cudagraph check cuda platform only ( #23005 )
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-08-16 07:46:00 +00:00
Isotr0py
cc826a202b
[Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs ( #22867 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-16 00:44:50 -07:00
Jee Jee Li
6d3da472bc
[Misc] Add --save-dir option to benchmark_moe ( #23020 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-16 07:26:10 +00:00
Andrew Sansom
78863f8c5c
[BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors ( #22962 )
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-08-16 06:25:10 +00:00
Lucas Wilkinson
5157827cfc
[Build] Env var to disable sccache ( #22968 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-16 05:36:27 +00:00
Kunshang Ji
7caec10e7b
[XPU]avoid circular import during XPU init ( #23017 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-16 05:16:34 +00:00
Grace Ho
1f83e7d849
[misc] nsys profile output kernel classifier and visualizer ( #22971 )
Signed-off-by: Grace Ho <grho@nvidia.com>
2025-08-16 02:52:51 +00:00
Calvin Chen
e4e37ded56
[V1] support min_tokens for detokener ( #22014 )
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-08-16 02:28:10 +00:00
Nick Hill
f6b5040590
[Frontend] Avoid list copies in serving_chat.py ( #22947 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-16 02:06:30 +00:00
Benjamin Chislett
fbd88728b3
[Bugfix] Fix DeepSeek MTP ( #22934 )
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-08-16 01:25:06 +00:00
Nicolò Lucchesi
070da660c1
[Kernel] Simplify get_kv_cache_layout and cache use_trtllm_attention env-dependent bit ( #22735 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-16 00:14:08 +00:00
Nick Hill
ad0297d113
[Misc] Support passing multiple request ids at once to AsyncLLM.abort() ( #22944 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-15 17:00:36 -07:00
Yichen Yan
236b864e4f
[BugFix] Make run_once thread-safe ( #22978 )
Signed-off-by: <wenji.yyc@alibaba-inc.com>
Signed-off-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
2025-08-15 16:56:17 -07:00
Yong Hoon Shin
3e2f7985a2
Support multiple attention groups for KV sharing ( #22672 )
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-15 16:54:10 -07:00
Or Ozeri
c280066f9d
[v1] Move block_hashes from KVCacheManager to Request.block_hashes ( #19728 )
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-08-15 16:52:52 -07:00
Nick Hill
b9dc9d2607
[BugFix] Handle case where async utility call is cancelled ( #22996 )
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>
2025-08-15 17:38:42 -06:00
rishitdholakia13
1fc375dc05
[Structured Outputs] [Bug] Fix misalignment in apply_grammar_bitmask causing unintended masking and NaN logits ( #22963 )
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
2025-08-15 23:25:05 +00:00
Eli Uriegas
76144adf76
ci: Add CUDA + arm64 release builds ( #21201 )
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
2025-08-15 23:16:23 +00:00
Thomas Parnell
f5d412bafb
[BugFix] Fix regression caused by mamba state dtype PR ( #22998 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-15 22:55:26 +00:00
Lucas Wilkinson
177e55e3bd
[Attention] FA3 Attention Sinks Perf Boost ( #22478 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-15 17:41:07 -04:00
eigen
1723ef1aae
minor: zero workspace buffer init for flashinfer trtllm-gen attn ( #22603 )
2025-08-15 21:38:10 +00:00
Seiji Eicher
00d6cba0cf
Add PrefixRepetitionRandomDataset to vllm bench serve datasets ( #20638 )
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-08-15 14:09:23 -07:00
shixianc
7f89ed248f
[Fix] enable swap_ab for pplx problem size computation ( #22991 )
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-08-15 14:02:12 -07:00
Michael Goin
8a87cd27d9
[CI] Speed up Whisper tests by reusing server ( #22859 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-15 16:56:31 -04:00
Michael Goin
a344a1a7da
Use regex in convert-results-json-to-markdown.py ( #22989 )
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 20:54:20 +00:00
nvjullin
79899b63f6
[Bugfix] Added more env vars to hash ( #22449 )
Signed-off-by: Julien Lin <jullin@nvidia.com>
2025-08-15 20:08:37 +00:00
Zebing Lin
6e670778cd
[Core] direct indexing on self.block_table_np in compute_slot_mapping ( #22940 )
Signed-off-by: linzebing <linzebing1995@gmail.com>
2025-08-15 12:12:12 -07:00
Wentao Ye
df5afa82e5
[Log] Debug Once for Randomizing dummy data for DP Rank ( #22860 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-15 11:51:50 -07:00
Chih-Chieh Yang
6cd69f51bf
[Model] Granite-4 support loading quantized checkpoint ( #22925 )
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-08-15 18:47:56 +00:00
bnellnm
8ad7285ea2
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. ( #22035 )
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 14:46:00 -04:00
Shanshan Shen
48b01fd4d4
[Structured Output] Make the output of structured output example more complete ( #22481 )
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-08-15 18:29:25 +00:00
Chenheli Hua
993d3d122b
[Benchmarks] Include image data when ShareGPT4V dataset is used. ( #22955 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-08-15 18:23:06 +00:00
JartX
68af77e51c
[FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches ( #22896 )
Signed-off-by: JartX <sagformas@epdcenter.es>
2025-08-15 17:42:49 +00:00
sstamenk
6b04039a72
[BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 ( #22369 )
Signed-off-by: sstamenk <sstamenk@amd.com>
2025-08-15 17:17:31 +00:00
Woosuk Kwon
1c859a1387
[V0 Deprecation] Remove advance_step ( #22969 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-15 08:22:31 -07:00
fhl2000
74f441f4b5
[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer ( #20059 )
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-08-15 10:01:39 -04:00
Csrayz
a0632a3e03
[Frontend] Expose do_log_stats interval to env ( #22905 )
Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 13:00:20 +00:00
Harry Mellor
e8b40c7fa2
[CI] Remove duplicated docs build from buildkite ( #22924 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 05:58:06 -07:00
Jee Jee Li
48f4636927
[Misc] Ignore ep_kernels_workspace ( #22807 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-15 05:58:03 -07:00