Cyrus Leung
9e6bcda3ac
[mypy] Enable type checking for more directories ( #29674 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 08:39:27 -08:00
Harry Mellor
9eec282cb5
Guard FlashInfer sampler using the same check as FlashInfer attention backend ( #29415 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 08:34:48 -08:00
Cyrus Leung
0808eb813b
[Misc] Remove yapf directives ( #29675 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 15:07:23 +00:00
Mingyuan Ma
460d8bbf2d
Remove upstream fa checks ( #29471 )
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-28 05:52:42 -08:00
Li, Jiang
e2f56c309d
[CPU] Update torch 2.9.1 for CPU backend ( #29664 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-28 13:37:54 +00:00
HappyAmazonian
f8151b66fa
Revert "Supress verbose logs from model_hosting_container_standards (… ( #29335 )
Signed-off-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 05:29:05 -08:00
Cyrus Leung
1168768a2d
[Optimization] Early return for _apply_matches and _iter_placeholders ( #29668 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 13:26:47 +00:00
Nick Hill
8e7a891602
[BugFix] Fix spec decoding max_tokens scheduling perf issue ( #29542 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-28 20:52:23 +08:00
Cyrus Leung
953d9c820b
[mypy] Pass type checking for vllm/utils and vllm/v1/pool ( #29666 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 20:40:47 +08:00
Cyrus Leung
33b06a6f24
[Misc] Remove redundant attention var constants ( #29650 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 04:35:19 -08:00
Wilson Wu
5c2b5cb422
[Docs] Add SPLADE and Ultravox models to supported models documentation ( #29659 )
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-28 01:29:28 -09:00
杰兮
3cb32e5d6e
[Rocm] Set VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS default is disabled ( #28985 )
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-11-28 02:08:42 -08:00
Cyrus Leung
ccbdf51bd5
[Doc] Reorganize benchmark docs ( #29658 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 17:19:25 +08:00
Filipp Fisin
5f5521bd5d
Fix parameter order in GPT-OSS weight loading function for non-MXFP4 weights ( #29506 )
Signed-off-by: Filipp Fisin <48059208+qGentry@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 00:45:10 -08:00
Julien Denize
b2c1d294fa
[BUGFIX] MistralTokenizer._call__ adds an invalid EOS token ( #29607 )
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 16:44:47 +08:00
maang-h
cc0f2a0e19
[Doc] Improve abnormal information string ( #29655 )
Signed-off-by: maang <maang_h@163.com>
2025-11-28 00:12:20 -08:00
rongfu.leng
480598958e
[Feature][Bench] Add pareto visualization ( #29477 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-11-27 23:53:20 -08:00
Cyrus Leung
b34e8775a3
Revert "[CPU]Update CPU PyTorch to 2.9.0 ( #29589 )" ( #29647 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 22:43:18 -08:00
wang.yuqi
f4b76056ee
Improve enable chunked_prefill & prefix_caching logic. ( #26623 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-27 22:05:48 -08:00
EanWang211123
37b15e97e8
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl ( #29594 )
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-27 22:05:45 -08:00
maang-h
c7ba1f6bc7
[BugFix] Fix ValueError in NewRequestData repr methods ( #29392 )
Signed-off-by: maang <maang_h@163.com>
2025-11-28 13:42:30 +08:00
Wilson Wu
18523b87f6
[Docs] Update supported models for Olmo 3 in tool calling documentation ( #29411 )
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
2025-11-28 02:53:55 +00:00
Xin Yang
745a3bae1a
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 ( #28971 )
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-28 10:48:28 +08:00
scydas
35657bcd7a
[CPU]Update CPU PyTorch to 2.9.0 ( #29589 )
Signed-off-by: scyda <scyda@outlook.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-11-28 09:34:33 +08:00
Lucas Wilkinson
be493e0b3c
[BugFix] Fix new nightly failures ( #29578 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-27 13:45:38 -08:00
Woosuk Kwon
ae0ce1be27
[Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutput ( #29623 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-27 12:38:53 -08:00
Andrii Skliar
a5345bf49d
[BugFix] Fix plan API Mismatch when using latest FlashInfer ( #29426 )
Signed-off-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
Co-authored-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
2025-11-27 11:34:59 -08:00
Nicolò Lucchesi
e5a621b724
[CI] Add batched audios Whisper test ( #29308 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-27 19:31:52 +00:00
Isotr0py
38658ec6f3
[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU ( #29614 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-27 19:17:37 +00:00
Cyrus Leung
a24ea5414b
[Deprecation] Advance deprecation status ( #29617 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 19:04:58 +00:00
Cyrus Leung
ea228b4491
[Misc] Remove unused code from protocol.py ( #29616 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 18:39:59 +00:00
果冻虾仁
d45269b378
add skip_reading_prefix_cache in repr for PoolingParams ( #29620 )
2025-11-27 09:21:00 -08:00
Cyrus Leung
ee9841daa9
[Bugfix] Fix doc build on main ( #29619 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 09:08:08 -08:00
Injae Ryou
0840abdd24
[BugFix] Optional tokenizer argument when loading GGUF models ( #29582 )
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-27 16:53:10 +00:00
Harry Mellor
e1f262337b
Update Transformers pin in CI to 4.57.3 ( #29418 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 08:42:14 -08:00
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports ( #29540 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-27 11:19:09 -05:00
Mathis Felardos
cd007a53b4
[bugfix] avoid NIXL_ERR_REMOTE_DISCONNECT in nixl_connector when Prefill dies ( #28120 )
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-11-27 15:32:38 +00:00
Didier Durand
66d3d5422c
[Doc]: fixing typos in diverse files ( #29492 )
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-27 07:15:50 -08:00
Ryan Rock
bab438ff3e
[CI/Build] Skip ray tests on ROCm ( #29556 )
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-11-27 07:01:37 -08:00
Li, Jiang
882851dc81
[CI/Build][Bugfix] Fix auto label issues for CPU ( #29610 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-27 14:51:26 +00:00
Jee Jee Li
2f5f9acd55
[LoRA] Continue optimizing MoE LoRA weight loading ( #29322 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-27 05:56:28 -08:00
Roger Wang
cf348c8d27
[Bugfix] Fix HunyuanVL XD-RoPE ( #29593 )
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored by: grider-transwithai <grider@transwith.ai>
2025-11-27 12:36:24 +00:00
Li, Jiang
a5abd1d384
[CI] Auto label CPU related issues ( #29602 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-27 11:33:19 +00:00
Cyrus Leung
e6d4f3c254
[Bugfix] Fix pre-commit ( #29601 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 02:23:06 -08:00
maang-h
51906c8c55
[Docs] Improve priority parameter documentation ( #29572 )
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-27 02:09:24 -08:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): Set up -O infrastructure ( #26847 )
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 01:55:58 -08:00
Cyrus Leung
00d3310d2d
[Bugfix] Update Ultravox compatibility ( #29588 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 01:36:18 -08:00
Woosuk Kwon
da3222f371
[Model Runner V2] Implement multi-step Eagle with CUDA graph ( #29559 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-27 00:09:41 -08:00
Micah Williamson
43c5792592
[ROCm][CI] Fix test_cpu_offloading for ROCm ( #29548 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-11-27 07:54:44 +00:00
Johnny Yang
3ecabd06ee
Fix tpu-inference platform path ( #29554 )
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 23:25:21 -08:00