6645 Commits

Author SHA1 Message Date
wangxiyuan
721fb9b181
[Platform] Move platform check to right place (#18470)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-05-22 12:11:28 -07:00
David Xia
1f3a1200e4
[Bugfix] make test_openai_schema.py pass (#18224)
Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-22 18:34:06 +00:00
Lukas Geiger
54631f8262
[Misc] Call ndarray.tobytes() directly instead of ndarray.data.tobytes() (#18347)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-22 09:00:13 -07:00
Reid
cb506ecb5a
[Misc] improve Automatic Prefix Caching example (#18554)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-22 14:50:46 +00:00
Li, Jiang
93f71673ce
[BugFix][CPU] Fix x86 SHM distributed module initialization (#18536)
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-05-22 07:35:00 -07:00
Calvin Chen
3f505233fd
[Doc] Add stream flag for chat completion example (#18524)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-22 14:07:10 +00:00
Bowen Wang
4e04eceb58
[Bugfix] Use random hidden states in dummy sampler run (#18543)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
2025-05-22 06:48:56 -07:00
CYJiang
71075029f2
[Doc] Support --stream arg in openai_completion_client.py script (#18388)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-05-22 13:20:17 +00:00
Harry Mellor
ca86a7cf6e
[CI/Build] Update bamba test model location (#18544)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-22 06:01:07 -07:00
lkchen
a35a494745
[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513)
Signed-off-by: Linkun <github@lkchen.net>
2025-05-22 05:24:43 -07:00
f6037d1907
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18526)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-22 05:22:53 -07:00
aws-elaineyz
fa72f9a812
Order sequence ids + config update to support specifying custom quantization layers (#18279)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Tailin Pan <tailinpa@amazon.com>
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Maxwell Goldberg <mgld@amazon.com>
Co-authored-by: Aakash Shetty <sheaak@amazon.com>
2025-05-22 02:20:36 -07:00
aws-elaineyz
ebed81fbf5
Update default neuron config for speculation (#18274)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Aakash Shetty <sheaak@amazon.com>
2025-05-22 02:18:55 -07:00
Satyajith Chilappagari
e2d7d31244
[Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) (#18512)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-22 02:17:34 -07:00
Cyrus Leung
23b67b37b2
[Doc] Fix invalid JSON in example args (#18527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-22 07:11:46 +00:00
Jee Jee Li
db5a29ba19
[Bugfix] Fix LoRA test (#18518)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-21 21:48:53 -07:00
Shane A
51797775c3
[Bugfix][Model] Make Olmo2Model weight loading return loaded weights (#18504)
Signed-off-by: Shane A <shanea@allenai.org>
2025-05-21 21:17:03 -07:00
Nick Hill
cf5984b2fe
[BugFix][DP] Send DP wave completion only from dp_rank==0 (#18502)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com>
2025-05-21 20:25:25 -07:00
youngrok cha
d022115cc6
[Bugfix] Inconsistent token calculation compared to HF in llava family (#18479)
Signed-off-by: jaycha <jaycha@ncsoft.com>
2025-05-21 20:21:47 -07:00
Rabi Mishra
acb54ca8e1
Intialize io_thread_pool attribute in the beginning. (#18331)
Signed-off-by: rabi <ramishra@redhat.com>
2025-05-21 20:21:14 -07:00
Russell Bryant
6e0fd34d3c
[CI] Fix race condition with StatelessProcessGroup.barrier (#18506)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-21 20:19:13 -07:00
Ning Xie
176d62e4ea
[MISC] update project urls in pyproject.toml (#18519)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-21 20:17:34 -07:00
Dhia Eddine Rhaiem
20bd6f4d2e
[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (#18500)
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
2025-05-21 19:23:59 -07:00
Sebastian Schoennenbeck
1f079540db
[Bugfix] Consistent ascii handling in tool parsers (#17704)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
2025-05-21 20:41:23 +00:00
vllmellm
94d8ec8d2b
[FEAT][ROCm] Upgrade AITER MLA v1 backend (#18338)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-05-21 10:34:28 -07:00
Mark McLoughlin
bb0a311213
Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945) (#18459)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-21 10:25:23 -07:00
Hosang
dd5fa7e04f
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
2025-05-21 08:35:00 -07:00
Hyogeun Oh (오효근)
2b16104557
[Misc] Update deprecation message for --enable-reasoning (#18404) 2025-05-21 07:33:11 -07:00
Kebe
371376f996
[Build] fix Dockerfile shell (#18402) 2025-05-21 07:32:06 -07:00
bnellnm
c6c10ca920
[Bugfix] Reduce moe_sum test size to avoid OOM (#18484)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-21 06:46:39 -07:00
GiantCroc
c154d89306
[Doc] fix arg docstring in linear layers (#18410)
Signed-off-by: giantcroc <1204449533@qq.com>
2025-05-21 06:45:57 -07:00
Dhia Eddine Rhaiem
eca18691d2
[MODEL] FalconH1 (#18406)
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
2025-05-21 04:59:06 -07:00
Rabi Mishra
61acfc45bc
[Bugfix][Failing Test] Fix test_events.py (#18460)
Signed-off-by: rabi <ramishra@redhat.com>
2025-05-21 04:57:28 -07:00
Reid
107f5fc4cb
[Misc] refactor disaggregated-prefill-v1 example (#18474)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-21 11:10:14 +00:00
Yong Hoon Shin
907f935de9
[V1] Fix general plugins not loaded in engine for multiproc (#18326)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-05-21 01:21:49 -07:00
Kebe
5d7f545204
[Frontend] deprecate --device arg (#18399)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-05-21 01:21:17 -07:00
Nicolò Lucchesi
cd8dfc6dfc
[Misc] MultiConnector._connectors type (#18423)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-05-20 22:48:43 -07:00
wwl2755
d06dd72ba9
[Bugfix][Failing Test] Fix nixl connector test when promt size < block size (#18429)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-05-20 22:41:44 -07:00
Cyrus Leung
ad0012a0ac
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)" (#18456)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-20 22:39:22 -07:00
bnellnm
92247c522e
[Bug] Fix moe_sum signature (#18440)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-05-20 22:37:08 -07:00
Gregory Shtrasberg
0c15c2e486
[Bugfix] config.head_dim is now explicitly set to None (#18432)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-20 21:04:33 -07:00
Michael Goin
3b17ea26e4
[TPU] Re-enable the Pallas MoE kernel (#18025)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-05-20 19:52:27 -07:00
Dilip Gowda Bhagavan
23baa2180b
fix:Build torch wheel inline rather than picking from nightly (#18351)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
2025-05-20 22:22:24 +00:00
Percy
980a172474
[Kernel] update comment for KV shape in unified triton attn (#18099)
Signed-off-by: haochengxia <xhc_1007@163.com>
2025-05-20 11:19:34 -07:00
Calvin Chen
e1f5a71ed7
[Model] use AutoWeightsLoader for bloom (#18300)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-20 09:40:05 -07:00
Michael Goin
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-20 09:08:37 -07:00
Reid
8f55962a7f
[Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-20 15:26:12 +00:00
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
2025-05-20 06:59:48 -07:00
wang.yuqi
86847700d7
[CI] Add mteb testing to test the accuracy of the embedding model (#17175) 2025-05-20 06:51:12 -07:00
汪志鹏
d6c86d09ae
Update cpu.txt (#18398)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-05-20 10:53:23 +00:00