Cyrus Leung
|
417a164af6
|
[Misc] Remove unused encoder-decoder error strings (#25374)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-22 11:04:32 +00:00 |
|
Yizhou
|
b6f01bd9a7
|
refactor: abstract graph mode support into platform interface (#25161)
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-09-22 10:22:29 +00:00 |
|
Nicolò Lucchesi
|
4cf71cc88a
|
[TPU] Deprecate xm.mark_step in favor of `torch_xla.sync` (#25254)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-22 10:12:57 +00:00 |
|
Nicolò Lucchesi
|
a66d131381
|
[TPU][Bugfix][CI] Fix broken tests/build dependency (#25255)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-22 09:55:04 +00:00 |
|
Eldar Kurtić
|
21467f9a1c
|
Enable Eagle3 speculative decoding for GPT-OSS model (#25246)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2025-09-22 08:50:39 +00:00 |
|
Cyrus Leung
|
f92d952632
|
[V0 Deprecation] Remove MultiModalPlaceholderMap (#25366)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-22 08:49:19 +00:00 |
|
Cyrus Leung
|
6d0b827cbd
|
[V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-22 13:58:26 +08:00 |
|
WeiQing Chen
|
0eecb31663
|
[Bugfix] Fix hermes tool parser handling of non-string argument types (#22002)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-09-22 11:35:39 +08:00 |
|
WeiQing Chen
|
793be8d057
|
[Docs] GSM8K Accuracy Evaluation doc update (#25360)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-09-22 02:49:13 +00:00 |
|
Roger Wang
|
7b57a433da
|
[Model] Support Dots OCR (#24645)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: yinz-aizip <yinz@aizip.ai>
|
2025-09-22 02:24:40 +00:00 |
|
Deboleina
|
5aeb925452
|
Multimodal - audio tests (#25285)
Signed-off-by: Debolina Roy <debroy@redhat.com>
|
2025-09-22 07:07:11 +08:00 |
|
Yang Liu
|
04d3752329
|
[Bugfix][V0 Deprecation][CI] use async mock and await for async method (#25325)
Signed-off-by: Yang <lymailforjob@gmail.com>
|
2025-09-22 07:06:16 +08:00 |
|
Woosuk Kwon
|
bc6e542d9f
|
Remove V0 attention backends (#25351)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-21 16:03:28 -07:00 |
|
Isotr0py
|
af7dfb0d1a
|
[Perf] Further optimization for Qwen3-VL fast_pos_embed_interpolate (#25347)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-21 20:12:45 +00:00 |
|
Woosuk Kwon
|
1c3ffdbecc
|
[V0 Deprecation] Remove V0 sampling metadata (#25345)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-21 10:37:11 -07:00 |
|
Rahul Tuli
|
c438b2951c
|
feat: Enable engine-level arguments with speculators models (#25250)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2025-09-21 11:04:45 -06:00 |
|
Woosuk Kwon
|
0ff8ebb2d7
|
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-21 08:52:32 -07:00 |
|
Woosuk Kwon
|
26e673fe93
|
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-21 08:52:15 -07:00 |
|
Cyrus Leung
|
65a5910ce3
|
[Optimization] Cache chat template result when processor fails to be loaded (#25341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-21 19:41:02 +08:00 |
|
Simon Danielsson
|
9aea7373ff
|
[Bugfix] Typos in error message for missing model config file (#25339)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
|
2025-09-21 04:36:47 -07:00 |
|
Roger Wang
|
30d08911f7
|
[MM][Perf] Minor Optimization on Qwen3-VL fast_pos_embed_interpolate (#25337)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-21 11:05:20 +00:00 |
|
Isotr0py
|
cf56cf78b4
|
[V1] Add sliding window support to Flex Attention backend (#24089)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-21 05:08:07 +00:00 |
|
Woosuk Kwon
|
7ed82d1974
|
[V0 Deprecation] Remove V0 MP executor (#25329)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 21:26:35 -07:00 |
|
Woosuk Kwon
|
12dbd834cf
|
[V0 Deprecation] Remove from_seq_group methods (#25330)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 21:10:48 -07:00 |
|
Wenlong Wang
|
035fd2bd2c
|
[Multi Modal][Performance] Fused Q,K's apply_rope in more models (#25005)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-21 03:55:10 +00:00 |
|
Woosuk Kwon
|
1cd885bd54
|
[V0 Deprecation] Remove V0 model runner base & simplify worker base (#25328)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 20:49:09 -07:00 |
|
Huamin Li
|
62b38dc832
|
[Doc] improve test-pipeline.yaml documentation (#25305)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-09-20 20:29:12 -07:00 |
|
Woosuk Kwon
|
c99db8c8dd
|
[V0 Deprecation] Remove V0 core (#25321)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 19:58:26 -07:00 |
|
Woosuk Kwon
|
72dd1595b4
|
[CI] Skip tests failing on main (#25326)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 19:57:46 -07:00 |
|
Woosuk Kwon
|
572ddf83ce
|
[Chore] Remove unused sampler in models (#25324)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 19:53:20 -07:00 |
|
Woosuk Kwon
|
86647d1cd0
|
[V0 Deprecation] Remove V0 Output Processor (#25320)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 17:57:20 -07:00 |
|
Woosuk Kwon
|
52c2a8d4ad
|
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 17:56:30 -07:00 |
|
Michael Yao
|
367a480bd3
|
[Docs] Fix warnings in vllm/profiler and vllm/transformers_utils (#25220)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-09-20 16:39:47 -07:00 |
|
Cyrus Leung
|
bef180f009
|
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 17:50:58 +00:00 |
|
lirong
|
d88918e4c2
|
[Core] Enable sharded state loader for V1 engine and enhance test coverage (#25308)
Signed-off-by: pengdrumli <pengdrumli@tencent.com>
|
2025-09-20 21:15:22 +08:00 |
|
Isotr0py
|
3c713a9711
|
[Model] Cleanup InternViT's data parallel implementation (#25306)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-20 05:46:24 -07:00 |
|
Manoel Marques
|
bf8b26cad1
|
Generate _ModelInfo properties file when loading to improve loading speed (#23558)
Signed-off-by: Manoel Marques <manoel.marques@ibm.com>
Signed-off-by: Manoel Marques <manoelmrqs@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-20 11:51:13 +00:00 |
|
Wenlong Wang
|
032d661d27
|
[Docs] Fix warnings in mkdocs build (continued) (#25042)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-20 11:45:18 +00:00 |
|
Michael Goin
|
e08a3a3fdb
|
[CI Failure] Disable FlashInfer RoPE to unblock CI (#25299)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-20 08:16:56 +00:00 |
|
Cyrus Leung
|
3d9a1d2de5
|
[V1] Support LLM.apply_model (#18465)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 07:14:35 +00:00 |
|
Roger Wang
|
be874c0201
|
[Bugfix] Fix Qwen3-VL-MoE weight loading for EP (#25300)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-20 00:04:05 -07:00 |
|
Chen Zhang
|
9607d5eb44
|
[Hybrid Allocator] Support full attention with different hidden size (#25101)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-19 23:43:59 -07:00 |
|
Cyrus Leung
|
c60e6137f0
|
[Optimization] Avoid repeated model architecture conversion for pooling models (#25261)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 13:30:22 +08:00 |
|
Chauncey
|
f91480b2d4
|
[Bugfix] fix tool call arguments is empty (#25223)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: xin.li <xin.li@daocloud.io>
|
2025-09-20 13:29:54 +08:00 |
|
Chendi.Xue
|
6c5f82e5aa
|
[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
|
2025-09-20 04:41:23 +00:00 |
|
Nick Hill
|
b7f186bbb3
|
[BugFix] Exclude self when checking for port collision (#25286)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-20 12:28:31 +08:00 |
|
JartX
|
3642909617
|
[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) (#25268)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2025-09-20 11:18:13 +08:00 |
|
Harry Mellor
|
c308501cb6
|
Improve weight loading for encoder models in Transformers backend (#25289)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-20 03:11:03 +00:00 |
|
Nick Hill
|
535d80056b
|
[Misc] Support more collective_rpc return types (#25294)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-20 02:02:38 +00:00 |
|
Nick Hill
|
a25ade5d47
|
[BugFix] Ensure appropriate guards in destructors (#25284)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-20 09:06:34 +08:00 |
|