11815 Commits

Author SHA1 Message Date
Shengqi Chen
36db0a35e4
[CI] Renovation of nightly wheel build & generation (#29690)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-01 21:25:39 +08:00
Marcin Ostrowski
5cfa967efa
[Bugfix] TypeError: 'NoneType' object is not callable (#29414)
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
2025-12-01 13:16:44 +00:00
Isotr0py
b95db244ee
[v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-01 13:12:51 +00:00
Zhengxu Chen
ad9d656bfa
[multimodal][test] Reduce memory utilization for test_siglip to avoid OOM (#29504)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-01 20:41:48 +08:00
Fanli Lin
f37e8938d2
[XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-12-01 12:00:52 +00:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration (#29767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-01 11:34:58 +00:00
Mickaël Seznec
86e178f7c4
[crashfix] Eagle + multimodal can crash on mm cache miss (#29750)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-12-01 17:29:33 +08:00
daniel-salib
014ece97c7
[Frontend] Add tool filtering support to ToolServer (#29224)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-01 08:03:57 +00:00
wang.yuqi
62de4f4257
[Frontend] Resettle pooling entrypoints (#29634)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-01 15:30:43 +08:00
Huamin Li
83805a6078
[CI] Skip paddleocr_vl for transformer 4.57.3 (#29758)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-12-01 04:38:06 +00:00
Yifei Zhang
1ab8fc8197
Make PyTorch profiler gzip and CUDA time dump configurable (#29568)
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
2025-12-01 04:30:46 +00:00
Shu Wang
f72a817bdf
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-30 16:05:32 -08:00
Woosuk Kwon
ec38a7368d
[Model Runner V2] Use packed mask for prompt bin counts (#29756)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-30 14:15:42 -08:00
Xingyu Liu
21c2627934
[Misc]Remove redundant hidden_size property in ModelConfig (#29749)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-30 17:14:23 +00:00
Omer Ullman Argov
39d28108f4
[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)
2025-11-30 11:02:40 -05:00
Harry Mellor
cd719de5cb
Fix RoPE failures in Transformers nightly (#29700)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-30 14:29:32 +00:00
Pleaplusone
8c363ed666
[ROCm][Attention] Sliding window support for AiterFlashAttentionBackend (#29234)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-30 11:31:50 +00:00
Cyrus Leung
64bc09ba27
[Core] Enable inputs_embeds_size separate from hidden_size (#29741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 17:31:12 +08:00
Isotr0py
47539cfd3e
[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-30 09:15:01 +00:00
Cyrus Leung
2afcec4dec
[Misc] Update TokenizerLike interface and move get_cached_tokenizer (#29730)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 14:59:47 +08:00
9381b5cde0
[Doc]: Fix typo in fused_moe layer (#29731)
Signed-off-by: BowTen <bowten@qq.com>
2025-11-29 22:29:13 -08:00
Vensen
66b5840287
[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783)
Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-11-30 14:24:25 +08:00
Huamin Li
82c795d6f2
Fix AttributeError about _use_fi_prefill (#29734)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-30 06:04:55 +00:00
Isotr0py
e1464c3a08
[Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-30 06:04:28 +00:00
Xin Yang
a491b0911b
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-30 10:37:25 +08:00
Jee Jee Li
b9d0504a36
[Bugfix] Revert test_tokenization.py (#29729)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-29 16:35:15 +00:00
Jinzhen Lin
1656ad3704
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
2025-11-29 07:19:33 -08:00
Cyrus Leung
fa59fe417f
[Chore] Move detokenizer_utils to vllm/tokenizers (#29727)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 06:25:17 -08:00
Cyrus Leung
fe3398fab2
[Chore] Enable passing tokenizer=None into MM processor (#29724)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 06:25:10 -08:00
Chukwuma Nwaugha
ad7f714d62
hfrunner.classify should return list[list[float]] not list[str] (#29671)
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
2025-11-29 13:57:00 +00:00
dublc
f4341f45d3
[Doc]: fix code block rendering (#29728)
Signed-off-by: dublc <jdublc0x@gmail.com>
2025-11-29 13:46:48 +00:00
Cyrus Leung
34a984274e
[Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 04:02:21 -08:00
Woosuk Kwon
f223ed4181
[Model Runner V2] Fuse penalties and temperature into single kernel (#29720)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-29 02:29:16 -08:00
Didier Durand
04a797cd0e
[Doc]: fixing typos in various files. (#29717)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-29 01:15:39 -08:00
Woosuk Kwon
6afc0ffaf6
[Model Runner V2] Add sample/ directory and reorganize files (#29719)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-29 00:41:01 -08:00
Jee Jee Li
39e63dec7c
[LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 22:52:58 -08:00
Woosuk Kwon
4a80ad0a25
[Model Runner V2] Don't use UVA buffer for prefill_len (#29713)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-28 20:27:16 -08:00
Angela Yi
4b17ce6815
Add gpu memory wait before test_async_tp (#28893)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-28 20:19:05 -08:00
Lucas Wilkinson
e23f665d83
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-28 20:19:01 -08:00
Woosuk Kwon
ca1b1e7296
[Model Runner V2] Refactor prefill token preparation (#29712)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-28 19:49:17 -08:00
Tsukasa OI
762a4a6ca9
[Frontend] Perform offline path replacement to tokenizer (#29706)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-11-28 18:32:08 -08:00
Cyrus Leung
b2c50eda50
[Bugfix] Fix wrong mock attribute (#29704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 10:30:41 +08:00
Woosuk Kwon
1dcafb3dea
[Model Runner V2] Support penalties using bin counts (#29703)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-28 17:53:17 -08:00
Andreas Karatzas
ea3370b428
[ROCm][Bugfix] Patch for the Multi-Modal Processor Test group (#29702)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-11-29 01:31:44 +00:00
Mert Unsal
c625d7b1c6
[Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-28 16:10:39 -08:00
Zhengxu Chen
6173682b6e
[compile] Include enable_sleep_mode into caching factors. (#29696)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-11-29 07:58:38 +08:00
Augusto Yao
9726e64530
bugfix: correct attn output with base 2 or e (#28840)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
2025-11-29 07:52:12 +08:00
Huamin Li
3fd1fb0b60
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-28 15:26:52 -08:00
Jiangyun Zhu
a51f4186f2
[Bugfix] fix dots.llm1.inst (#29687)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-28 15:25:26 -08:00
Cyrus Leung
7675ba30de
[Misc] Remove redundant ClassRegistry (#29681)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-28 15:24:47 -08:00