Ning Xie
|
176d62e4ea
|
[MISC] update project urls in pyproject.toml (#18519)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-21 20:17:34 -07:00 |
|
Dhia Eddine Rhaiem
|
20bd6f4d2e
|
[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (#18500)
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
|
2025-05-21 19:23:59 -07:00 |
|
Sebastian Schoennenbeck
|
1f079540db
|
[Bugfix] Consistent ascii handling in tool parsers (#17704)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
|
2025-05-21 20:41:23 +00:00 |
|
vllmellm
|
94d8ec8d2b
|
[FEAT][ROCm] Upgrade AITER MLA v1 backend (#18338)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-05-21 10:34:28 -07:00 |
|
Mark McLoughlin
|
bb0a311213
|
Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945)" (#18459)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-21 10:25:23 -07:00 |
|
Hosang
|
dd5fa7e04f
|
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-05-21 08:35:00 -07:00 |
|
Hyogeun Oh (오효근)
|
2b16104557
|
[Misc] Update deprecation message for --enable-reasoning (#18404)
|
2025-05-21 07:33:11 -07:00 |
|
Kebe
|
371376f996
|
[Build] fix Dockerfile shell (#18402)
|
2025-05-21 07:32:06 -07:00 |
|
bnellnm
|
c6c10ca920
|
[Bugfix] Reduce moe_sum test size to avoid OOM (#18484)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-21 06:46:39 -07:00 |
|
GiantCroc
|
c154d89306
|
[Doc] fix arg docstring in linear layers (#18410)
Signed-off-by: giantcroc <1204449533@qq.com>
|
2025-05-21 06:45:57 -07:00 |
|
Dhia Eddine Rhaiem
|
eca18691d2
|
[MODEL] FalconH1 (#18406)
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
|
2025-05-21 04:59:06 -07:00 |
|
Rabi Mishra
|
61acfc45bc
|
[Bugfix][Failing Test] Fix test_events.py (#18460)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-21 04:57:28 -07:00 |
|
Reid
|
107f5fc4cb
|
[Misc] refactor disaggregated-prefill-v1 example (#18474)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-21 11:10:14 +00:00 |
|
Yong Hoon Shin
|
907f935de9
|
[V1] Fix general plugins not loaded in engine for multiproc (#18326)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-05-21 01:21:49 -07:00 |
|
Kebe
|
5d7f545204
|
[Frontend] deprecate --device arg (#18399)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-05-21 01:21:17 -07:00 |
|
Nicolò Lucchesi
|
cd8dfc6dfc
|
[Misc] MultiConnector._connectors type (#18423)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-05-20 22:48:43 -07:00 |
|
wwl2755
|
d06dd72ba9
|
[Bugfix][Failing Test] Fix nixl connector test when prompt size < block size (#18429)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-05-20 22:41:44 -07:00 |
|
Cyrus Leung
|
ad0012a0ac
|
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)" (#18456)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-20 22:39:22 -07:00 |
|
bnellnm
|
92247c522e
|
[Bug] Fix moe_sum signature (#18440)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-20 22:37:08 -07:00 |
|
Gregory Shtrasberg
|
0c15c2e486
|
[Bugfix] config.head_dim is now explicitly set to None (#18432)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-20 21:04:33 -07:00 |
|
Michael Goin
|
3b17ea26e4
|
[TPU] Re-enable the Pallas MoE kernel (#18025)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-05-20 19:52:27 -07:00 |
|
Dilip Gowda Bhagavan
|
23baa2180b
|
fix:Build torch wheel inline rather than picking from nightly (#18351)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
|
2025-05-20 22:22:24 +00:00 |
|
Percy
|
980a172474
|
[Kernel] update comment for KV shape in unified triton attn (#18099)
Signed-off-by: haochengxia <xhc_1007@163.com>
|
2025-05-20 11:19:34 -07:00 |
|
Calvin Chen
|
e1f5a71ed7
|
[Model] use AutoWeightsLoader for bloom (#18300)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-05-20 09:40:05 -07:00 |
|
Michael Goin
|
f4a8a37465
|
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-20 09:08:37 -07:00 |
|
Reid
|
8f55962a7f
|
[Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-20 15:26:12 +00:00 |
|
燃
|
be48360c1f
|
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
|
2025-05-20 06:59:48 -07:00 |
|
wang.yuqi
|
86847700d7
|
[CI] Add mteb testing to test the accuracy of the embedding model (#17175)
|
2025-05-20 06:51:12 -07:00 |
|
汪志鹏
|
d6c86d09ae
|
Update cpu.txt (#18398)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-05-20 10:53:23 +00:00 |
|
Jee Jee Li
|
6b35cb10a0
|
[Misc] Add LoRA code owner (#18387)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-20 03:27:30 -07:00 |
|
Reid
|
1b1e8e05ff
|
[doc] update env variable export (#18391)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-20 08:53:27 +00:00 |
|
Random Fly
|
bca55b556f
|
[Bugfix] fix adding bias twice in ipex GPTQ quantization (#18363)
Signed-off-by: rand-fly <randfly@outlook.com>
|
2025-05-20 00:54:33 -07:00 |
|
Kevin H. Luu
|
d981396778
|
[release] Change dockerhub username for TPU release (#18389)
|
2025-05-19 23:49:23 -07:00 |
|
Nan Qin
|
9609327fa4
|
[Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
|
2025-05-19 20:21:27 -07:00 |
|
Isotr0py
|
f07a673eb2
|
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-19 20:20:12 -07:00 |
|
Liangfu Chen
|
d565e0976f
|
[neuron] fix authorization issue (#18364)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-05-19 23:30:32 +00:00 |
|
Lucia Fang
|
258bf621d5
|
fix CUDA_check redefinition in #17918 (#18287)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-05-19 13:42:35 -07:00 |
|
Satyajith Chilappagari
|
dc1440cf9f
|
Neuron up mistral (#18222)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
|
2025-05-19 09:54:47 -07:00 |
|
Gong Shufan
|
8171221834
|
[Misc] Fix typo (#18330)
|
2025-05-19 09:51:01 -07:00 |
|
sunyicode0012
|
7937c2fd52
|
Add fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (#18337)
|
2025-05-19 09:49:57 -07:00 |
|
Wenhua Cheng
|
e2ee1e8e9e
|
[Feature]Add support for models quantized with AutoRound (#17850)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
|
2025-05-19 09:38:53 -07:00 |
|
Reid
|
20d8ce81eb
|
[Frontend] add --quick option for vllm chat/complete (#18297)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-19 09:36:13 -07:00 |
|
Elad Segal
|
84ab4feb7e
|
[Doc] Fix typo (#18355)
|
2025-05-19 16:05:16 +00:00 |
|
Jee Jee Li
|
6781af5608
|
[Quantization] Pool model support bitsandbytes (#18087)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-19 09:03:43 -07:00 |
|
Nick Hill
|
1b15df2546
|
[BugFix] Fix handling of num_computed_tokens with connector (#18232)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-05-19 09:03:25 -07:00 |
|
Cyrus Leung
|
43b5f61dce
|
[Doc] Move input-related docs to Features (#18353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-19 15:08:39 +00:00 |
|
Li Wang
|
c5bb0ebdc6
|
[Doc] Fix prompt embedding examples (#18350)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-05-19 06:48:16 -07:00 |
|
Shaoyu Yang
|
d637b96099
|
[BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS (#18319)
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com>
Co-authored-by: cascade <cascade812@outlook.com>
|
2025-05-19 01:31:23 -07:00 |
|
CYJiang
|
275c5daeb0
|
fix: Add type specifications for CLI arguments in tensorizer options (#18314)
|
2025-05-18 23:42:17 -07:00 |
|
Simon Mo
|
47fda6d089
|
[Build] Supports CUDA 12.6 and 11.8 after Blackwell Update (#18316)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-05-18 23:19:33 -07:00 |
|