Benji Beck
e75f342261
Migrate InternVLImagePixelInputs (in nemotron_vl.py) to TensorSchema ( #22023 )
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-19 13:48:26 +08:00
Nikhil Suryawanshi
78dba404ad
[Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes ( #22725 )
...
Signed-off-by: Nikhil Suryawanshi <suryawanshin74@gmail.com>
2025-08-19 04:40:37 +00:00
Chengji Yao
e9d6a3db69
[TPU] make ptxla not imported when using tpu_commons ( #23081 )
...
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
2025-08-19 11:46:42 +08:00
Xiao
a4454e9401
chore: disable enable_cpp_symbolic_shape_guards ( #23048 )
...
Signed-off-by: Xiao Liu <xiszishu@gmail.com>
2025-08-18 23:08:05 -04:00
Woosuk Kwon
14006840ea
[V0 Deprecation] Remove V0 FlashInfer attention backend ( #22776 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 19:54:16 -07:00
Robert Shaw
6603288736
[CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests ( #22871 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 17:39:01 -07:00
Thomas Parnell
95e3095136
[Misc] Add @tdoublep as a maintainer of hybrid model and Triton-attention related code ( #23122 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-19 08:31:38 +08:00
Woosuk Kwon
c9b38be8aa
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT ( #23041 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 17:20:38 -07:00
Woosuk Kwon
0dd3f4f5ab
[Misc] Minor refactoring for prepare_inputs ( #23116 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 16:58:05 -07:00
Xiang Xu
498259ccce
Install tpu_info==0.4.0 to fix core dump for TPU ( #23135 )
2025-08-18 16:23:33 -07:00
Michael Goin
6d25e3fd6e
Use Blackwell FlashInfer MXFP4 MoE by default if available ( #23008 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-18 15:25:49 -07:00
Breno Baldas Skuk
ac6eb49de3
fix: OpenAI SDK compat (ResponseTextConfig) ( #23126 )
...
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
Signed-off-by: Breno Baldas Skuk <breno.skuk@hcompany.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-18 15:22:59 -07:00
Michael Goin
bf756321c7
[CI Bugfix] Pin openai<1.100 to unblock CI ( #23118 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-18 12:14:01 -07:00
Raushan Turganbay
0e3bb543f0
[Bugfix] Support compile for Transformers multimodal ( #23095 )
...
Signed-off-by: raushan <raushan@huggingface.co>
2025-08-18 13:35:48 +00:00
杨朱 · Kiki
569aefd134
chore: remove unnecessary patch_padding_side for the chatglm model ( #23090 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-18 12:32:13 +00:00
Cyrus Leung
d3f71f1224
[Refactor] Get prompt updates earlier ( #23097 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-18 12:31:53 +00:00
Ning Xie
5a30bd10d8
[Bugfix] fix IntermediateTensors equal method ( #23027 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-18 02:58:11 -07:00
Cyrus Leung
27e8d1ea3e
[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs ( #23053 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-18 09:52:00 +00:00
Kunshang Ji
5c79b0d648
[XPU][CI]add xpu env vars in CI scripts ( #22946 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-18 09:47:03 +00:00
Kunshang Ji
5f5664b3e4
[XPU] Fix compile size for xpu ( #23069 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-18 00:04:08 -07:00
Roger Wang
89657a557c
[Misc] Fix backward compatibility from #23030 ( #23070 )
...
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-17 23:33:29 -07:00
Ning Xie
08d5f7113a
[Misc] refactor function name ( #23029 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-17 22:16:21 -07:00
Andy Lo
b2fd0b81e0
[Bugfix][CI] Machete kernels: deterministic ordering for more cache hits ( #23055 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-08-17 22:10:26 -07:00
double7
9f1c642254
[Bugfix] fix Qwen2.5-Omni processor output mapping ( #23058 )
...
Signed-off-by: double7 <33449816+DoubleVII@users.noreply.github.com>
Co-authored-by: 杨森 <yangsen.double7@bytedance.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-17 22:09:11 -07:00
Ning Xie
7be3a59d8e
[Misc] enhance static type hint ( #23059 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-17 22:09:08 -07:00
Woosuk Kwon
8ea0c2753a
[Misc] Minor code cleanup for _get_prompt_logprobs_dict ( #23064 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 18:16:03 -07:00
Simon Mo
0fc8fa751a
fix: gptq marlin weight loading failure ( #23066 )
v0.10.1rc1
2025-08-17 15:56:07 -07:00
Calvin Chen
21e39436c8
[XPU] fix xpu to set cudagraph batch sizes ( #23044 )
...
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
2025-08-17 21:45:42 +00:00
Woosuk Kwon
6d243efeda
[Misc] Convert use_structured_output property into constant ( #23060 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 12:41:38 -07:00
Woosuk Kwon
c55bc1db26
[Misc] Remove dead return ( #23061 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 10:36:46 -07:00
Lucas Wilkinson
292084e72a
[BugFix] Fix for IMA in FA3 varlen combine ( #22967 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-17 08:52:04 -07:00
Kevinzz
16bff144be
[Misc] fix typo in the multimodal doc ( #23051 )
2025-08-17 01:56:20 -07:00
947132885
fe0411fc6f
[Bugfix] should use stack instead of concat ( #22972 )
...
Signed-off-by: 947132885 <947132885@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-17 08:46:36 +00:00
Jee Jee Li
4d4061b6e7
[Kernel] Add cuda kernel for gpt_oss activation ( #22951 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-17 05:03:24 +00:00
Ning Xie
87f48623a5
[Misc] method name typo fix ( #23042 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-16 21:49:14 -07:00
Cyrus Leung
5c32143b9d
[Refactor] Defer tensor data construction in MultiModalKwargs ( #23030 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-16 21:05:50 -07:00
Michael Goin
94096a47c9
[UX] Separate marlin moe config logic from triton moe ( #23006 )
2025-08-16 22:16:42 -04:00
Jinzhen Lin
a258ad8bcc
[Bugfix] fix qwen3 moe fp8 accuracy issue ( #23031 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
2025-08-16 17:41:23 -07:00
afeldman-nm
bf7f470b22
[V1] Logits processors extensibility ( #19912 )
...
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-16 12:59:17 -07:00
Michael Goin
4fc722eca4
[Kernel/Quant] Remove AQLM ( #22943 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-16 19:38:21 +00:00
Michael Goin
3253ae765e
[Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions ( #23028 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-16 18:33:08 +00:00
Michael Goin
000cceca8c
[Bugfix gpt-oss] Fix float32 convert for flashinfer sink support ( #23016 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-16 11:16:00 -07:00
Woonggi Min
68373d3126
[Frontend] Added support for HermesToolParser for models without special tokens ( #16890 )
...
Signed-off-by: minpeter <kali2005611@gmail.com>
2025-08-16 17:38:42 +00:00
Maximilien de Bayser
52ce1420e9
Fix handling of max_num_batched_tokens for pooling tasks ( #23004 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-16 17:36:30 +00:00
汪志鹏
829bbd7882
[New Model]mBART model ( #22883 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-08-16 12:16:58 +00:00
Cyrus Leung
4dff91c93d
[Refactor] Allow optional MultiModalKwargsItem in IPC ( #23022 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-16 11:30:49 +00:00
Seiji Eicher
de9cb61763
Add docs for PrefixRepetitionDataset + enable usage with vllm bench throughput ( #23012 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-16 10:21:20 +00:00
Isotr0py
2dbccce8a6
[CI][Bugfix] Skip Ovis2 generation test because of broken remote code ( #22954 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-16 09:44:19 +00:00
Chengji Yao
933f45334a
[Core] Make cudagraph check cuda platform only ( #23005 )
...
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-08-16 07:46:00 +00:00
Isotr0py
cc826a202b
[Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs ( #22867 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-16 00:44:50 -07:00