Shengqi Chen
36db0a35e4
[CI] Renovation of nightly wheel build & generation ( #29690 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-01 21:25:39 +08:00
Marcin Ostrowski
5cfa967efa
[Bugfix] TypeError: 'NoneType' object is not callable ( #29414 )
...
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
2025-12-01 13:16:44 +00:00
Isotr0py
b95db244ee
[v1] Add real sliding window calculation to FlexAttention direct BlockMask building ( #26015 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-01 13:12:51 +00:00
Zhengxu Chen
ad9d656bfa
[multimodal][test] Reduce memory utilization for test_siglip to avoid OOM ( #29504 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-01 20:41:48 +08:00
Fanli Lin
f37e8938d2
[XPU] Fix AWQ skipped layer detection in IPEX quantization ( #29774 )
...
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-12-01 12:00:52 +00:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration ( #29767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-01 11:34:58 +00:00
Mickaël Seznec
86e178f7c4
[crashfix] Eagle + multimodal can crash on mm cache miss ( #29750 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-12-01 17:29:33 +08:00
daniel-salib
014ece97c7
[Frontend] Add tool filtering support to ToolServer ( #29224 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-01 08:03:57 +00:00
wang.yuqi
62de4f4257
[Frontend] Resettle pooling entrypoints ( #29634 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-01 15:30:43 +08:00
Huamin Li
83805a6078
[CI] Skip paddleocr_vl for transformer 4.57.3 ( #29758 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-12-01 04:38:06 +00:00
Yifei Zhang
1ab8fc8197
Make PyTorch profiler gzip and CUDA time dump configurable ( #29568 )
...
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
2025-12-01 04:30:46 +00:00
Shu Wang
f72a817bdf
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch ( #27141 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-30 16:05:32 -08:00
Woosuk Kwon
ec38a7368d
[Model Runner V2] Use packed mask for prompt bin counts ( #29756 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-30 14:15:42 -08:00
Xingyu Liu
21c2627934
[Misc]Remove redundant hidden_size property in ModelConfig ( #29749 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-30 17:14:23 +00:00
Omer Ullman Argov
39d28108f4
[Feat] Support non-gated activations in NVFP4 modelopt path ( #29004 )
2025-11-30 11:02:40 -05:00
Harry Mellor
cd719de5cb
Fix RoPE failures in Transformers nightly ( #29700 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-30 14:29:32 +00:00
Pleaplusone
8c363ed666
[ROCm][Attention] Sliding window support for AiterFlashAttentionBackend ( #29234 )
...
Signed-off-by: ganyi <ygan@amd.com>
2025-11-30 11:31:50 +00:00
Cyrus Leung
64bc09ba27
[Core] Enable inputs_embeds_size separate from hidden_size ( #29741 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 17:31:12 +08:00
Isotr0py
47539cfd3e
[Bugfix] Fix mismatched nvfp4 gemm output shape ( #29742 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-30 09:15:01 +00:00
Cyrus Leung
2afcec4dec
[Misc] Update TokenizerLike interface and move get_cached_tokenizer ( #29730 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-30 14:59:47 +08:00
朝
9381b5cde0
[Doc]: Fix typo in fused_moe layer ( #29731 )
...
Signed-off-by: BowTen <bowten@qq.com>
2025-11-29 22:29:13 -08:00
Vensen
66b5840287
[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output ( #28783 )
...
Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-11-30 14:24:25 +08:00
Huamin Li
82c795d6f2
Fix AttributeError about _use_fi_prefill ( #29734 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-30 06:04:55 +00:00
Isotr0py
e1464c3a08
[Quantization] Enable compressed-tensors AWQ for Turing GPU ( #29732 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-30 06:04:28 +00:00
Xin Yang
a491b0911b
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 ( #29708 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-30 10:37:25 +08:00
Jee Jee Li
b9d0504a36
[Bugfix] Revert test_tokenization.py ( #29729 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-29 16:35:15 +00:00
Jinzhen Lin
1656ad3704
[Kernel][Quantization] add w4a8 support for marlin kernel ( #24722 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
2025-11-29 07:19:33 -08:00
Cyrus Leung
fa59fe417f
[Chore] Move detokenizer_utils to vllm/tokenizers ( #29727 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 06:25:17 -08:00
Cyrus Leung
fe3398fab2
[Chore] Enable passing tokenizer=None into MM processor ( #29724 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 06:25:10 -08:00
Chukwuma Nwaugha
ad7f714d62
hfrunner.classify should return list[list[float]] not list[str] ( #29671 )
...
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
2025-11-29 13:57:00 +00:00
dublc
f4341f45d3
[Doc]: fix code block rendering ( #29728 )
...
Signed-off-by: dublc <jdublc0x@gmail.com>
2025-11-29 13:46:48 +00:00
Cyrus Leung
34a984274e
[Misc] Refactor tokenizer interface ( #29693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 04:02:21 -08:00
Woosuk Kwon
f223ed4181
[Model Runner V2] Fuse penalties and temperature into single kernel ( #29720 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-29 02:29:16 -08:00
Didier Durand
04a797cd0e
[Doc]: fixing typos in various files. ( #29717 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-29 01:15:39 -08:00
Woosuk Kwon
6afc0ffaf6
[Model Runner V2] Add sample/ directory and reorganize files ( #29719 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-29 00:41:01 -08:00
Jee Jee Li
39e63dec7c
[LoRA] Cleanup LoRA unused code ( #29611 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 22:52:58 -08:00
Woosuk Kwon
4a80ad0a25
[Model Runner V2] Don't use UVA buffer for prefill_len ( #29713 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-28 20:27:16 -08:00
Angela Yi
4b17ce6815
Add gpu memory wait before test_async_tp ( #28893 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-28 20:19:05 -08:00
Lucas Wilkinson
e23f665d83
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable ( #29698 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-28 20:19:01 -08:00
Woosuk Kwon
ca1b1e7296
[Model Runner V2] Refactor prefill token preparation ( #29712 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-28 19:49:17 -08:00
Tsukasa OI
762a4a6ca9
[Frontend] Perform offline path replacement to tokenizer ( #29706 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-11-28 18:32:08 -08:00
Cyrus Leung
b2c50eda50
[Bugfix] Fix wrong mock attribute ( #29704 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 10:30:41 +08:00
Woosuk Kwon
1dcafb3dea
[Model Runner V2] Support penalties using bin counts ( #29703 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-28 17:53:17 -08:00
Andreas Karatzas
ea3370b428
[ROCm][Bugfix] Patch for the Multi-Modal Processor Test group ( #29702 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-11-29 01:31:44 +00:00
Mert Unsal
c625d7b1c6
[Bugfix] Fix O(n²) multimodal string prompt processing ( #29667 )
...
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-28 16:10:39 -08:00
Zhengxu Chen
6173682b6e
[compile] Include enable_sleep_mode into caching factors. ( #29696 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-11-29 07:58:38 +08:00
Augusto Yao
9726e64530
bugfix: correct attn output with base 2 or e ( #28840 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
2025-11-29 07:52:12 +08:00
Huamin Li
3fd1fb0b60
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 ( #28971 )" ( #29697 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-28 15:26:52 -08:00
Jiangyun Zhu
a51f4186f2
[Bugfix] fix dots.llm1.inst ( #29687 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-28 15:25:26 -08:00
Cyrus Leung
7675ba30de
[Misc] Remove redundant ClassRegistry ( #29681 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-28 15:24:47 -08:00