Xerxes
e41bf15cd0
[Chore]: qwen3-moe-type-hints-mistake (#19860)
...
Co-authored-by: xinnan.hou <hxn02029096@alibaba-inc.com>
2025-06-19 21:43:07 -07:00
Brayden Zhong
5aa4a015ce
[Benchmark] Fix Value of type "SampleRequest" is not indexable (#18032)
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-06-19 21:28:55 -07:00
Elaine Zhao
b6bad3d186
[CI][Neuron] Fail and exit on first error (#19622)
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-20 12:27:51 +08:00
Isotr0py
ee9a1531aa
[CI/Build][Bugfix] Fix deadlock on v1 engine test CI (#19872)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-20 09:51:07 +08:00
Robert Shaw
10d82f9ac5
[Benchmark][Bugfix] Fix Dataset Length Calculation (#19868)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-06-19 18:30:41 -07:00
xzbdmw
ea10dd9d9e
[Frontend] early return chat format resolution when specified (#19735)
2025-06-19 18:49:59 +00:00
Alex Brooks
ead2110297
[Core][Bugfix] Fix Online MM Beam Search (#19688)
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-06-19 17:18:07 +00:00
Li, Jiang
01220ce89a
[CI][CPU] Improve dummy Triton interfaces and fix the CPU CI (#19838)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-19 15:46:09 +00:00
22quinn
6f68c49220
[Doc] Update V1 user guide for embedding models (#19842)
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-19 09:43:27 +00:00
Alexei-V-Ivanov-AMD
4719460644
Fixing Chunked Prefill Test. (#19762)
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-06-19 01:36:16 -07:00
NekoMimiUnagi
466166dcfd
[Frontend] Add optional token-level progress bar to LLM.beam_search (#19301)
...
Signed-off-by: Ruosen Li <rxl190028@utdallas.edu>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-71-179.ec2.internal>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-19 03:21:41 -04:00
Zuxin
1d0ae26c85
Add xLAM tool parser support (#17148)
2025-06-19 14:26:41 +08:00
Isotr0py
6021999573
[Minor] Allow redirecting model path for HfRunner in test (#19795)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-18 23:04:10 -07:00
Ning Xie
c7b370c603
raise exception for pin_lora (#19809)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-18 22:57:35 -07:00
zsolt-borbely-htec
aa20d10a91
[Misc] [ROCm] Prevent surplus tensor reshape (#19803)
...
Signed-off-by: Zsolt Borbely <zsolt.borbely@htecgroup.com>
2025-06-19 13:57:16 +08:00
TJian
2de12be428
[ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 (#18990)
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-06-18 22:56:31 -07:00
Yu-Hang "Maxin" Tang
83ca9ae47b
Mark invariant normalizer in Gemma as non-persistent (#19788)
...
Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
2025-06-18 22:56:03 -07:00
kourosh hakhamaneshi
e2148dc5ea
[Bugfix] Add check_health to v1 async client. (#19821)
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-06-18 21:47:01 -07:00
Lu Fang
b1098b4072
[Bugfix] Fix the linter (#19826)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 21:44:41 -07:00
Maximilien de Bayser
799397ee4f
Support embedding models in V1 (#16188)
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Jee Jee Li
4959915089
[Quantization] Modify the logic of BNB double quantization (#19742)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-19 03:52:09 +00:00
Lu Fang
8d1e89d946
[Misc][ROCm] Enforce no unused variable in ROCm C++ files (#19796)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 20:25:15 -07:00
Michael Goin
36239f79dd
Fix FA2 fallback for Blackwell V1 (#19781)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-19 09:53:55 +08:00
afeldman-nm
dfada85eee
[Frontend] Expose custom args in OpenAI APIs (#16862)
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-18 17:41:11 -07:00
Richard Zou
ed33349738
[BugFix] Fix use_cudagraph=False (#19612)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-06-19 08:23:12 +08:00
Woosuk Kwon
d49adea1f9
[Multimodal] Use fast processor for Qwen2/2.5-VL (#19789)
2025-06-18 15:49:40 -07:00
Russell Bryant
14fdd21d39
[Core] More fixes to MultiModalEmbeddings type handling (#19715)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 22:48:29 +00:00
QiliangCui
04fefe7c9a
[TPU] Update torch-xla version to include paged attention tuned block change (#19813)
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-06-18 22:41:13 +00:00
Lukas Geiger
3b523e38d9
[Core] Do not copy array during hashing (#19484)
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-18 15:36:55 -07:00
afeldman-nm
16c16301c8
Disable "Forbid direct 'import triton'" check for vllm/triton_utils/importing.py in an extensible way (#19783)
...
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-06-18 15:08:00 -07:00
Nathan Weinberg
9206d0ff01
docs: fix Slack bulletpoint in README (#19811)
...
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-06-18 20:47:08 +00:00
Chen Zhang
a89209b78d
[v1] Support mamba2 (#19327)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-18 20:34:15 +00:00
Russell Bryant
ffacb222cb
[Docs] Add Huzaifa Sidhpurwala to vuln mgmt team doc (#19808)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 20:22:28 +00:00
Chauncey
12575cfa7a
[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully (#19725)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-18 10:26:16 -07:00
Zzz9990
8b6e1d639c
[Hardware][AMD] integrate aiter chunked prefill into vllm (#18596)
...
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
2025-06-18 08:46:51 -07:00
Lu Fang
735a9de71f
[Qwen] Add tagging rule for Qwen related PRs (#19799)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 14:26:43 +00:00
wangxiyuan
257ab95439
[Platform] Allow platform use V1 Engine by default (#19792)
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-18 13:03:36 +00:00
Reid
cca91a7a10
[doc] fix the incorrect label (#19787)
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-18 10:30:58 +00:00
Woosuk Kwon
f04d604567
[Minor] Zero-initialize attn output buffer (#19784)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-18 06:59:27 +00:00
afeldman-nm
19a53b2783
[V1] Decouple GPU and TPU InputBatch (#19778)
...
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-06-18 06:38:13 +00:00
Zhonghua Deng
eccdc8318c
[V1][P/D] An native implementation of xPyD based on P2P NCCL (#18242)
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-18 06:32:36 +00:00
Russell Bryant
5f52a84685
[V1] Add API docs for EncoderCacheManager (#19294)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 13:37:01 +08:00
lkchen
d4629dc43f
[Misc] Add __str__ for RequestStatus (#19780)
...
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-06-18 03:03:01 +00:00
Ning Xie
6e9cc73f67
[MISC] correct DeviceConfig device field static type analysis (#19699)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:50 -07:00
Ning Xie
c53711bd63
[MISC] correct copy_blocks src_to_dists param type (#19696)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:06 -07:00
Chenyaaang
dac8cc49f4
[TPU] Update torch version to include paged attention kernel change (#19706)
...
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-17 22:24:49 +00:00
Charlie Fu
a44b1c951d
[Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
...
Signed-off-by: charlifu <charlifu@amd.com>
2025-06-17 17:03:06 -04:00
Michael Goin
b447624ee3
[Bugfix] Fix faulty triton importing logic when using Ray for DP (#19734)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 20:59:29 +00:00
Jiayi Yao
cda92307c1
[Misc] Update lmcache connector with the latest connector apis (#19441)
...
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-06-17 19:57:54 +00:00
Michael Goin
bf57ccc5c2
Remove sm120 arch from sm100 cutlass kernel arch list (#19716)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:39 -07:00