Cyrus Leung
0ca2393b47
[CI/Build] Increase pooling tolerance to pass CI ( #22844 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-13 18:52:48 -04:00
Jialin Ouyang
31a500c86f
[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP ( #22437 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-08-13 14:44:06 -07:00
Isotr0py
df0e0f023e
[CI/Build] Skip gpt_big model test because of broken HF model ( #22848 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-13 20:36:28 +00:00
Cyrus Leung
b4b78d6317
[CI/Build] Fix param mismatch in test_eagle_correctness ( #22847 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 10:55:25 -07:00
Nicolò Lucchesi
12817a8ac7
[CI] Fix tests/v1/e2e/test_kv_sharing_fast_prefill.py import on test ( #22815 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-13 10:35:50 -07:00
Cyrus Leung
c9232d41f4
[CI/Build] Update VLM common tests ( #22841 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 10:03:05 -07:00
Cyrus Leung
19b927e52d
[Core] Use individual MM items in P0/P1 cache and model runner ( #22570 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 07:18:07 -07:00
Nicolò Lucchesi
6b794c756c
[Nixl][CI] Fix tests ( #22806 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-13 06:03:53 -07:00
Kdump
653124bd46
[Frontend] Add chunked processing to handle long inputs in embedding models ( #22280 )
...
Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: Kdump <rootshellexp@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 04:14:24 -07:00
Giancarlo Delfin
d94e3026de
[V1] Add tree drafting tests for eagle spec decoding ( #22705 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
2025-08-13 04:11:28 -07:00
Duc-Viet Hoang
a01e0018b5
[Bugfix] Fix Nemotron VL image processing ( #22739 )
...
Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
2025-08-13 03:11:36 -07:00
shixianc
4c558cf62e
[Perf] Support topk softmax fused kernel for broader num_experts ( #22211 )
...
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-08-12 21:34:47 -07:00
Woosuk Kwon
c5830381af
[V0 Deprecation] Remove args for multi-step scheduling ( #22779 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:38:18 -07:00
Woosuk Kwon
d31f97cf57
[Misc] Remove tests/multi_step/__init__.py ( #22778 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:21:18 -07:00
Woosuk Kwon
71683ca6f6
[V0 Deprecation] Remove multi-step scheduling ( #22138 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:18:39 -07:00
RUTHLESS-BOT
53c730286c
[Misc] parametrize 'dtype' in test_flash_mla ( #22641 )
...
Signed-off-by: RUTHLESS-BOT <wujiafeng@cmbchina.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-12 16:31:48 -04:00
Nicolò Lucchesi
422f22e012
[CI][Nixl] Check kv cache layout during handshake ( #22745 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-12 12:53:52 -07:00
TeeKen Lau
c42fe0b63a
Add more test scenario for tensor schema ( #22733 )
...
Signed-off-by: teekenl <teekenlau@gmail.com>
2025-08-12 16:34:41 +00:00
Rahul Tuli
5a4b4b3729
Add: SupportsEagle3 interface for explicit EAGLE3 support ( #22642 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-08-12 09:24:52 -07:00
Nicolò Lucchesi
3d9d40efde
[Bugfix][CI] Fix test_remote_decode_lifecycle.py::test_short_prompt_lifecycle ( #22727 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-12 07:30:17 -07:00
Harry Mellor
80bb1e8afe
Officially support SmolLM3 using the Transformers backend ( #22665 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 05:38:48 -07:00
Yongye Zhu
007dd90859
[gpt-oss] Enable gpt-oss on ampere ( #22714 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-12 03:21:44 -07:00
RishiAstra
46ae7f6666
[Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 ( #21783 )
...
Signed-off-by: Rishi Astra <40644327+RishiAstra@users.noreply.github.com>
2025-08-12 02:04:37 -07:00
phantomlei
bc8372efc3
[Bugfix] Fix erroneous randomly generated cases in bad word testing ( #22170 )
...
Signed-off-by: phantomlei <phantomlei3@gmail.com>
2025-08-12 02:03:22 -07:00
dongluw
9f909b8996
[New Model] Support Command-A-Vision ( #22660 )
...
Signed-off-by: donglu <donglu@cohere.com>
2025-08-12 01:39:54 -07:00
wang.yuqi
6d729c43fb
[Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. ( #22637 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-12 00:23:17 -07:00
Michael Goin
93d0652433
[CI] Increase timeout for test_completion_with_image_embeds ( #22670 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-11 20:31:36 -07:00
Michael Goin
ea1292ad3e
[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py ( #22686 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-11 20:20:42 -07:00
Harry Mellor
839ab00349
Re-enable Xet on TPU tests now that hf_xet has been updated ( #22666 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 19:54:40 -07:00
Chen Zhang
1891a265d3
[gpt-oss] Add test for response API + harmony (but skipped) ( #22554 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 17:47:24 -07:00
TJian
65abe111a3
[CI] Skip Tree Attn Test in test_max_len.py to unblock CI ( #22664 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-08-11 10:36:05 -07:00
22quinn
807d21b80d
[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI ( #22611 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-11 10:31:36 -07:00
Isotr0py
c90fb03df5
[CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 ( #22659 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-11 10:00:58 -07:00
wang.yuqi
84cf78acee
[Model] Pooling models default to using chunked prefill & prefix caching if supported. ( #20930 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00
GuanLuo
16fb668b61
fix: NIXL connector transfers partial block to pass full multi-modal context ( #21074 )
...
Signed-off-by: GuanLuo <gluo@nvidia.com>
2025-08-11 09:40:55 -07:00
Wentao Ye
f7dcce7a4a
[Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale ( #21968 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-11 09:39:08 -07:00
Cyrus Leung
ebf7605b0d
[Misc] Move tensor schema tests ( #22612 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-11 00:15:27 -07:00
Maximilien de Bayser
39052dbca8
Support token_type_ids in V1 with less code changes ( #21985 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-10 22:54:59 -07:00
Nick Hill
5898b135ab
[BugFix] Fix KVConnectorOutput TPU breakage ( #22598 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-10 19:33:48 -07:00
22quinn
b799f4b9ea
[CI/Build] Fix tensorizer test for load_format change ( #22583 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-10 19:30:00 -07:00
Benji Beck
68b254d673
Fix TensorSchema validation test for symbolic dims ( #22366 )
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-08-10 17:16:44 +00:00
Isotr0py
b76753f0b5
[Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel ( #22593 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-10 09:00:36 -07:00
Isotr0py
049c245143
[Misc] Replace flaky image urls in pixtral test ( #22574 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-10 06:18:21 -07:00
Ning Xie
326976291b
[Misc] code clean duplicate set_current_vllm_config in _set_vllm_config ( #22566 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-10 00:08:48 -07:00
Harry Mellor
c49848396d
Refactor sliding window configuration to Transformers best practice ( #21927 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 20:50:48 -07:00
Chengji Yao
2a84fb422f
[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block ( #22394 )
...
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
2025-08-09 20:49:04 -07:00
Le Chen
3d7363e61c
[Config] add "qwen" as a native eagle3 target supported model ( #22333 )
...
Signed-off-by: lechen <lecself@163.com>
Signed-off-by: LeChen <lecself@163.com>
2025-08-09 20:21:05 -07:00
Thomas Parnell
61f67d8acd
[V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers ( #21401 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-09 20:16:11 -07:00
TJian
42172ad18f
[FEAT] [Performance] Add triton mrope to replace the torch code path ( #22375 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-08-09 11:50:03 -07:00
Nicolò Lucchesi
5a16fa614c
[Model] Gemma3n MM ( #20495 )
...
Signed-off-by: ShriKode <shrikode@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: ShriKode <shrikode@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-09 09:56:25 -07:00