11059 Commits

Author SHA1 Message Date
Jerry Zhang
03c4c4aa9d
Support using Int4PreshuffledTensor after loading (#26066)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-11-04 06:00:57 -05:00
yugong333
2ec401bc39
Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-04 18:27:35 +08:00
Varun Sundar Rabindranath
4022a9d279
[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904)
2025-11-04 15:56:21 +08:00
Zhewen Li
53f6e81dfd
[CI/Build] Fix OpenAI API correctness on AMD CI (#28022)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-04 07:20:50 +00:00
CSWYF3634076
43a6acfb7d
[Model] fix ernie45 reasoning_parser (#27973)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-11-04 07:16:46 +00:00
Mark McLoughlin
58279c60b5
[KV Connector] Make KVCacheConfig an explicit constructor argument (#27887)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-03 23:00:49 -08:00
Zhewen Li
2f84ae1f27
[CI/Build] Update LM Eval Version in AMD CI (#27944)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-04 06:36:40 +00:00
xiangze-arm
f32cbc9a0c
[CPU]Improve dynamic 4bit moe performance (#27240)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-04 06:33:23 +00:00
Wentao Ye
7e4be74104
[Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) (#27884)
2025-11-04 14:05:55 +08:00
Mark McLoughlin
380ba6816d
[Metrics] Enable sleep state metric outside of dev mode (#27867)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-03 20:35:36 -08:00
liuzhenwei
14a125a06d
[NIXL][XPU] Pin NIXL version to 0.7.0 (#27849)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2025-11-04 03:28:35 +00:00
Chauncey
c02fccdbd2
[Refactor] Lazy import tool_parser (#27974)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-04 10:10:10 +08:00
li2haipeng
6ddae74054
[LoRA] Lora shrink swizzle (#27694)
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
Signed-off-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-04 09:30:20 +08:00
vllmellm
b13a447546
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm (#27748)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-03 17:12:19 -08:00
QiliangCui
7956b0c0bc
Remove the tpu docker image nightly build. (#27997)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-11-04 00:35:54 +00:00
Tyler Michael Smith
3758757377
[Bugfix] Fix MoE Routing Simulation (#28002)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-11-03 22:26:49 +00:00
Hank_
ccd3e55e51
[Bugfix][plugin] fla crash on plugin (#27322)
2025-11-04 05:27:03 +08:00
Matthew Bonanni
01baefe674
Add TP parameter to attention tests (#27683)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-03 13:04:40 -08:00
Ning Xie
786030721e
[Docs] add runai_streamer_sharded to LoadConfig (#27937)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-03 20:35:16 +00:00
Matthew Bonanni
145c00a4d3
[Bugfix] change FlashMLA reorder_batch_threshold (#27777)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-03 15:17:10 -05:00
Lucas Kabela
55011aef24
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-11-03 11:12:15 -08:00
Sophie du Couédic
a4398fbb5e
[Feature][Benchmarks] Support inf burstiness (#26941)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
2025-11-03 18:33:17 +00:00
Aurick Qiao
2c19d96777
[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2025-11-03 09:23:31 -08:00
Lucas Wilkinson
4bc400f47e
[CI/Testing] Add basic single node dual batch overlap test (#27235)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-03 17:00:46 +00:00
ahao-anyscale
cac4c10ef0
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2025-11-03 11:13:51 -05:00
pwschuurman
f7d2946e99
[Bugfix] Skip gs:// model paths for speculator detection (#27846)
Signed-off-by: Peter Schuurman <psch@google.com>
2025-11-03 14:31:03 +00:00
gnovack
294c805f1d
Early exit for MoE LoRA kernels (#27131)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-03 20:22:17 +08:00
zhang-prog
40b69e33e7
[Model] Add PaddleOCR-VL Model Support (#27758)
Signed-off-by: zhangyue <zhangyue66@baidu.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-03 19:04:22 +08:00
Jee Jee Li
32257297dd
[CI/Build] Remove the flaky gpt-oss lora test (#27966)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-03 16:50:06 +08:00
Misha Efimov
ba464e6ae2
Add ORCA endpoint load metrics support (#24905)
Signed-off-by: Misha Efimov <mef@google.com>
2025-11-03 08:21:31 +00:00
Kunshang Ji
7f4bdadb92
[XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue (#27964)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-03 07:36:59 +00:00
Rémi Delacourt
cec7c28833
[Bugfix] Padded Eagle Specdec with Chunked Prefill (#26263)
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-03 02:22:46 -05:00
Thomas Parnell
18961c5ea6
[Hybrid] Pass kernel block size to builders (#27753)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-11-03 05:48:03 +00:00
Sungyoon Jeong
470ad118b6
[Frontend] Align finish_reason when tool is called with OpenAI (#25054)
Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-11-03 04:21:18 +00:00
Biswa Panda
1bf43ae35d
[BugFix][LoRA] use adapter_id instead of id field of lora_request (#27728)
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
2025-11-03 10:08:08 +08:00
Vensen
0ce743f4e1
Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 (#27420)
Signed-off-by: vensenmu <vensenmu@gmail.com>
2025-11-02 16:24:01 +00:00
Cyrus Leung
6c317a656e
[Misc] Provide Siglip2 chat template (#27939)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-02 13:42:38 +00:00
Asaf Joseph Gardin
00b31a36a2
[V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-11-02 04:16:23 -08:00
Julien Denize
73444b7b56
Performance fix MistralTokenizer: cache special ids and tokens (#27925)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-11-02 08:48:33 +00:00
Cyrus Leung
853a8eb53b
[Bugfix] Fix Qwen Omni audio inference (#27920)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-02 05:06:05 +00:00
Ben Browning
758ea2e980
[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma (#27924)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-11-02 03:45:02 +00:00
Yue Zhang
685c99ee77
[KV offload] Offloading connector async scheduling support (#27648)
Signed-off-by: KevinCheung2259 <2651309292@qq.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-11-01 21:08:56 +00:00
Benjamin Bartels
1e88fb751b
Adds anthropic /v1/messages endpoint to openai api_server (#27882)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
2025-11-01 12:45:42 -07:00
Nick Hill
c2ed069b32
[BugFix] Fix mixed penalties batch with async scheduling (#27910)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-01 10:51:24 -07:00
wenxindongwork
af6e19f50f
[Core][TPU] Support TPU Data Parallalism (#27365)
Signed-off-by: wenxindongwork <wenxindong@google.com>
2025-11-01 17:14:44 +00:00
Cyrus Leung
99d69af9ec
[Bugfix] Python 3.10 compatibility for Self (#27918)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-01 15:28:54 +00:00
Haco
d811b442d3
[Bugfix] DeepSeek V3.2 MTP metadata & CUDA graph issues (#26779)
Signed-off-by: xiaohajiayou <923390377@qq.com>
2025-11-01 10:52:43 -04:00
wangxiyuan
30a14b034f
[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module (#27798)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-01 10:17:45 +00:00
Harry Mellor
799ce45cc1
[Docs] Mock all imports for docs (#27873)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-01 10:02:23 +00:00
ai-jz
2c0c7c39bd
feat(benchmarks): support HF model names in multi-turn benchmark (#27850)
2025-11-01 08:04:52 +00:00