4476 Commits

Author SHA1 Message Date
Dhia Eddine Rhaiem
eca18691d2
[MODEL] FalconH1 (#18406)
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
2025-05-21 04:59:06 -07:00
Yong Hoon Shin
907f935de9
[V1] Fix general plugins not loaded in engine for multiproc (#18326)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-05-21 01:21:49 -07:00
Kebe
5d7f545204
[Frontend] deprecate --device arg (#18399)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-05-21 01:21:17 -07:00
Nicolò Lucchesi
cd8dfc6dfc
[Misc] MultiConnector._connectors type (#18423)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-05-20 22:48:43 -07:00
wwl2755
d06dd72ba9
[Bugfix][Failing Test] Fix nixl connector test when promt size < block size (#18429)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-05-20 22:41:44 -07:00
Cyrus Leung
ad0012a0ac
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)" (#18456)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-20 22:39:22 -07:00
Gregory Shtrasberg
0c15c2e486
[Bugfix] config.head_dim is now explicitly set to None (#18432)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-20 21:04:33 -07:00
Michael Goin
3b17ea26e4
[TPU] Re-enable the Pallas MoE kernel (#18025)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-05-20 19:52:27 -07:00
Percy
980a172474
[Kernel] update comment for KV shape in unified triton attn (#18099)
Signed-off-by: haochengxia <xhc_1007@163.com>
2025-05-20 11:19:34 -07:00
Calvin Chen
e1f5a71ed7
[Model] use AutoWeightsLoader for bloom (#18300)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-20 09:40:05 -07:00
Michael Goin
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-20 09:08:37 -07:00
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
2025-05-20 06:59:48 -07:00
Random Fly
bca55b556f
[Bugfix] fix adding bias twice in ipex GPTQ quantization (#18363)
Signed-off-by: rand-fly <randfly@outlook.com>
2025-05-20 00:54:33 -07:00
Nan Qin
9609327fa4
[Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
2025-05-19 20:21:27 -07:00
Isotr0py
f07a673eb2
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-19 20:20:12 -07:00
Satyajith Chilappagari
dc1440cf9f
Neuron up mistral (#18222)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-19 09:54:47 -07:00
sunyicode0012
7937c2fd52
Add files via uploadAdd fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (#18337)
2025-05-19 09:49:57 -07:00
Wenhua Cheng
e2ee1e8e9e
[Feature]Add support for models quantized with AutoRound (#17850)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
2025-05-19 09:38:53 -07:00
Reid
20d8ce81eb
[Frontend] add --quick option for vllm chat/complete (#18297)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-19 09:36:13 -07:00
Jee Jee Li
6781af5608
[Quantization] Pool model support bitsandbytes (#18087)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-19 09:03:43 -07:00
Nick Hill
1b15df2546
[BugFix] Fix handling of num_computed_tokens with connector (#18232)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-05-19 09:03:25 -07:00
Shaoyu Yang
d637b96099
[BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS (#18319)
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com>
Co-authored-by: cascade <cascade812@outlook.com>
2025-05-19 01:31:23 -07:00
CYJiang
275c5daeb0
fix: Add type specifications for CLI arguments in tensorizer options (#18314)
2025-05-18 23:42:17 -07:00
Nan Qin
221cfc2fea
Feature/vllm/input embedding completion api (#17590)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-18 20:18:05 -07:00
wwl2755
9da1095daf
[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa (#18175)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-05-18 19:49:46 -07:00
Lifu Huang
4fb349f66a
Fix copy-paste error in phi4mm image processing (#18315)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-18 07:00:12 -07:00
22quinn
908733aca7
[Model] Use sigmoid for single-label classification (#18313)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-05-18 07:00:09 -07:00
cascade
9ab2c02ff8
Support sequence parallelism combined with pipeline parallelism (#18243)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-17 22:47:25 +00:00
Ning Xie
66e63e86ec
[MISC] fix typo (#18305)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-17 10:52:09 -07:00
rongfu.leng
9214e60631
[Model] use AutoWeightsLoader for solar (#18113)
2025-05-17 00:24:17 -07:00
Siyuan Liu
48ac2bed5b
[Hardware][TPU] Optionally import for TPU backend (#18269)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: Carol Zheng <cazheng@google.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: Hongmin Fan <fanhongmin@google.com>
2025-05-17 15:23:12 +08:00
David Ben-David
3e0d435027
[P/D][V1] Support dynamic loading of external KV connector implementations (#18142)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
2025-05-17 06:40:39 +00:00
汪志鹏
4ee4826ede
[BugFix] Correct max_model_len derivation from config.json for Mistral format (#17937)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: tracelogfb <48808670+tracelogfb@users.noreply.github.com>
Co-authored-by: Stephen Chen <tracelog@meta.com>
2025-05-17 04:20:13 +00:00
Reid
60017dc841
[Misc] reformat the collect-env output (#18285)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-16 19:46:18 -07:00
Michael Goin
fd195b194e
[V1][P/D] Local attention optimization for NIXL (#18170)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-16 21:16:33 -04:00
Woosuk Kwon
fabe89bbc4
[Spec Decode] Don't fall back to V0 when spec decoding is enabled (#18265)
2025-05-16 16:10:27 -07:00
Bowen Wang
7fdfa01530
[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-05-16 15:14:03 -07:00
Nick Hill
0ceaebf87b
[BugFix] Fix ordering of KVConnector finished send/rcv sets (#18211)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-16 09:20:54 -07:00
Nick Hill
1db4f47f81
[BugFix] Fix multi async save in MultiConnector (#18246)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-16 08:13:47 -07:00
Reid
d3d91b6f71
[Misc][MacOS] fix bfloat16 error (#18249)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-16 15:05:59 +00:00
learner0810
87d871470d
[Model] Use autoweightloader for dbrx (#18251)
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
2025-05-16 07:54:13 -07:00
fxmarty-amd
a5f8c111c2
[Fix] Fix typo in resolve_hf_chat_template (#18259)
Signed-off-by: Felix Marty <felmarty@amd.com>
2025-05-16 14:52:41 +00:00
Lain
e23564cb70
use ceil_div in cutlass block scaling shape check (#17918)
2025-05-16 03:02:58 -07:00
Seiji Eicher
541817670c
[Misc] Add Ray Prometheus logger to V1 (#17925)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-05-16 01:02:42 -07:00
Vadim Gimpelson
67da5720d4
[PERF] Speed up Qwen2.5-VL model by speed up rotary position embedding (#17973)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
2025-05-15 23:31:02 -07:00
Lucia Fang
3d2779c29a
[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-05-15 22:28:27 -07:00
Will Eaton
6b31c84aff
Throw better error for when running into k8s service discovery issue (#18209)
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-05-15 21:07:28 -07:00
Harry Mellor
b18201fe06
Allow users to pass arbitrary JSON keys from CLI (#18208)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-15 21:05:34 -07:00
Sky Lee
f4937a51c1
[Model] vLLM v1 supports Medusa (#17956)
Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com>
Signed-off-by: skylee-01 <497627264@qq.com>
Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com>
2025-05-15 21:05:31 -07:00
kliuae
ee659e3b60
[Bugfix][ROCm] Use chunked_prefill_paged_decode as fallback for V1 attention on ROCm (#18093)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
2025-05-15 19:30:17 -07:00