8483 Commits

Author SHA1 Message Date
Chen Zhang
ad344ef552
[gpt-oss] Small bug fixes for frontend (#22512)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 22:04:38 -07:00
Chen Zhang
bbaf9e9cb1
[gpt-oss] Fix mxfp4 support (#22700)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 21:22:26 -07:00
Benji Beck
4678503476
Migrate MiniCPMVImageInputs to TensorSchema (#21939)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-08-11 20:43:37 -07:00
Michael Goin
93d0652433
[CI] Increase timeout for test_completion_with_image_embeds (#22670)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-11 20:31:36 -07:00
Michael Goin
ea1292ad3e
[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py (#22686)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-11 20:20:42 -07:00
Po-Han Huang (NVIDIA)
dc5e4a653c
Upgrade FlashInfer to v0.2.11 (#22613)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-11 19:58:41 -07:00
Harry Mellor
839ab00349
Re-enable Xet on TPU tests now that hf_xet has been updated (#22666)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 19:54:40 -07:00
Andy Chen
9b94d6ec8f
Enable 4bit bnb prequant MOE (#21548)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-11 19:02:14 -07:00
Chen Zhang
1891a265d3
[gpt-oss] Add test for response API + harmony (but skipped) (#22554)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 17:47:24 -07:00
Chen Zhang
95a935fc48
[gpt-oss] Support streaming in response API (#22431)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 17:46:59 -07:00
Harry Mellor
458e74eb90
Support more parallel styles in Transformers backend TP (#22651)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 10:42:48 -07:00
TJian
65abe111a3
[CI] Skip Tree Attn Test in test_max_len.py to unblock CI (#22664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-08-11 10:36:05 -07:00
22quinn
807d21b80d
[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-11 10:31:36 -07:00
Isotr0py
c90fb03df5
[CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 (#22659)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-11 10:00:58 -07:00
wang.yuqi
84cf78acee
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00
GuanLuo
16fb668b61
fix: NIXL connector transfers partial block to pass full multi-modal context (#21074)
Signed-off-by: GuanLuo <gluo@nvidia.com>
2025-08-11 09:40:55 -07:00
Wentao Ye
f7dcce7a4a
[Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale (#21968)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-11 09:39:08 -07:00
Isotr0py
8e13d9fe6d
[Misc] Further clean up some redundant config definitions (#22649)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-11 09:22:25 -07:00
Eric Curtin
3fa5b25845
Document aarch64 CPU support works (#22646)
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-08-11 07:22:45 -07:00
danielafrimi
14a5d903ab
[Model] NemotronH Support (#22349)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-08-11 04:09:24 -07:00
Cyrus Leung
951b038298
[Misc] Move jsontree to utils (#22622)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-11 03:49:32 -07:00
Cyrus Leung
ebf7605b0d
[Misc] Move tensor schema tests (#22612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-11 00:15:27 -07:00
Harry Mellor
bc1d02ac85
[Docs] Add comprehensive CLI reference for all large vllm subcommands (#22601)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 00:13:33 -07:00
JartX
1e55dfa7e5
[BUGFIX] KeyError 'layers.14.mlp.gate.g_idx' for Qwen3-MoE with GPTQ on ROCm (#22017) 2025-08-11 00:13:30 -07:00
Jee Jee Li
384a052971
[Misc] benchmark_moe supports expert parallel (#22251)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-11 00:13:27 -07:00
Maximilien de Bayser
39052dbca8
Support token_type_ids in V1 with less code changes (#21985)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-10 22:54:59 -07:00
vllmellm
9c97a1c349
[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (#22521)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-08-10 22:52:34 -07:00
Eugene Cheah
f919d4cb8f
[BugFix] Fix logits repetition penalty cuda check (#22592) 2025-08-10 22:52:31 -07:00
Zhewen Li
afa5b7ca0b
[Misc][gpt-oss] guard import when triton kernel when not up to date (#22584)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-08-10 21:29:35 -07:00
Lifans
1b99028069
[Misc][gpt-oss] Add rules to label gpt-oss related PRs (#22600)
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-08-10 19:49:51 -07:00
Nick Hill
5898b135ab
[BugFix] Fix KVConnectorOutput TPU breakage (#22598)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-10 19:33:48 -07:00
22quinn
b799f4b9ea
[CI/Build] Fix tensorizer test for load_format change (#22583)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-10 19:30:00 -07:00
Benji Beck
06da44f0cb
Migrate LlavaImageInputs to TensorSchema (#21770)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-08-10 19:29:19 -07:00
Benji Beck
a554991748
Migrate LlavaNextVideoPixelInputs to TensorSchema (#21843)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-08-10 19:29:16 -07:00
Doug Smith
d1af8b7be9
enable Docker-aware precompiled wheel setup (#22106)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-08-10 16:29:02 -07:00
Benji Beck
68b254d673
Fix TensorSchema validation test for symbolic dims (#22366)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-08-10 17:16:44 +00:00
ZiTian Zhao
8c50d62f5a
Remove redundant row_indices unsqueeze operation in MiniCPMO (#22528)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
2025-08-10 09:20:00 -07:00
Benji Beck
b4e2916721
Migrate LlavaNextImageInputs to TensorSchema (#21774)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-10 09:05:21 -07:00
Breno Baldas Skuk
65a7917be4
Fix(benchmarks): allow multiple mm contents in OpenAI Chat Completion Benchmarks (#22534)
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
2025-08-10 09:03:15 -07:00
Isotr0py
b76753f0b5
[Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel (#22593)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-10 09:00:36 -07:00
youkaichao
b81fe83b2c
[doc] add alibaba cloud as sponsor (#22597)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-10 23:13:47 +08:00
youkaichao
0757551c96
[doc] add beijing meetup links (#22596)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-10 22:51:36 +08:00
Harry Mellor
8290d15d2c
Move CacheConfig from config/__init__.py to config/cache.py (#22586)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-10 07:36:40 -07:00
Isotr0py
049c245143
[Misc] Replace flaky image urls in pixtral test (#22574)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-10 06:18:21 -07:00
Harry Mellor
00976db0c3
[Docs] Fix warnings in docs build (#22588)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-10 05:49:51 -07:00
Cyrus Leung
d411df0296
[Misc] Further refine type annotations in parallel state (#22499)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-10 05:49:48 -07:00
22quinn
010e0e39ea
[Doc] Fix API doc link in side navigation (#22585)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-10 01:35:22 -07:00
Ning Xie
326976291b
[Misc] code clean duplicate set_current_vllm_config in _set_vllm_config (#22566)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-10 00:08:48 -07:00
Isotr0py
7e8d685775
[Minor] Fix pre-commit error on main (#22579)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-10 00:08:23 -07:00
Harry Mellor
c49848396d
Refactor sliding window configuration to Transformers best practice (#21927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 20:50:48 -07:00