5568 Commits

Author SHA1 Message Date
Simon Mo
db9dfcfa6a
[Docs] Add Ollama meetup slides (#15905)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 13:58:59 -07:00
Gerald
9ef98d527e
[Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454)
Signed-off-by: qscqesze <475517977@qq.com>
Co-authored-by: qingjun <qingjun@minimaxi.com>
Co-authored-by: qscqesze <475517977@qq.com>
2025-04-01 16:23:55 -04:00
yihong
93491aefc7
[BugFix] make sure socket close (#15875)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-01 13:10:24 -07:00
Simon Mo
7acd539cd7
[Docs] update usage stats language (#15898)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 12:54:13 -07:00
Woosuk Kwon
e75a6301bd
[V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-01 12:33:16 -07:00
Mark McLoughlin
a79cc68b3a
[V1][Metrics] Initial speculative decoding metrics (#15151)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-01 10:45:04 -07:00
Roger Wang
7e3f7a4ee7
[CI] Disable flaky structure decoding test temporarily. (#15892)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-01 17:42:34 +00:00
cloud11665
9ec8257914
[Model] Add module name prefixes to gemma3 (#15889)
Signed-off-by: Bartholomew Sabat <bartek@recursal.ai>
Co-authored-by: Bartholomew Sabat <bartek@recursal.ai>
2025-04-01 10:13:40 -07:00
Jennifer Zhao
38327cf454
[Model] Aya Vision (#15441)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-01 16:30:43 +00:00
Jee Jee Li
dfa82e2a3d
[CI/Build] Clean up LoRA tests (#15867)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-01 16:28:50 +00:00
bnellnm
e59ca942f5
Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-01 12:07:43 -04:00
Gregory Shtrasberg
a57a3044aa
[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork (#15820)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-01 08:56:39 -07:00
Isotr0py
4e5a0f6ae2
[Misc] Allow using OpenCV as video IO fallback (#15055)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 15:55:13 +00:00
Harry Mellor
b63bd14999
Reinstate format.sh and make pre-commit installation simpler (#15890)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 15:41:30 +00:00
chaow-amd
2041c0e360
[Doc] Quark quantization documentation (#15861)
Signed-off-by: chaow <chaow@amd.com>
2025-04-01 08:32:45 -07:00
wang.yuqi
085cbc4f9f
[New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-01 08:32:26 -07:00
Harry Mellor
2b93162fb0
Remove format.sh as it's been unsupported >70 days (#15884)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 22:27:46 +08:00
Reid
2e45bd29fe
[Misc] remove unused script (#15746)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-01 13:58:05 +00:00
Michael Goin
51d7c6a2b2
[Model] Support Mistral3 in the HF Transformers format (#15505)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-01 06:10:05 -07:00
Yang Chen
f3aca1ee30
setup correct nvcc version with CUDA_HOME (#15725)
Signed-off-by: Yang Chen <yangche@fb.com>
2025-04-01 06:09:40 -07:00
Rui Qiao
8dd41d6bcc
[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-01 06:07:53 -07:00
Isotr0py
0a298ea418
[Bugfix] Fix no video/image profiling edge case for MultiModalDataParser (#15828)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-01 18:17:11 +08:00
Harry Mellor
d330558bab
[Docs] Fix small error in link text (#15868)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 10:05:14 +00:00
shangmingc
656fd72976
[Misc] Fix speculative config repr string (#15860)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-04-01 02:26:22 -07:00
Varun Sundar Rabindranath
79455cf421
[Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-01 16:53:56 +08:00
Wei Zeng
30d6a015e0
[Feature] specify model in config.yaml (#15798)
Signed-off-by: weizeng <weizeng@roblox.com>
2025-04-01 01:20:06 -07:00
yihong
8af5a5c4e5
fix: can not use uv run collect_env close #13888 (#15792)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-01 07:45:49 +00:00
Chen Zhang
3a5f0afcd2
[V1] Implement sliding window attention in kv_cache_manager (#14097)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-04-01 00:33:17 -07:00
Gregory Shtrasberg
c7e63aa4d8
[ROCm] Use device name in the warning (#15838)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-01 00:10:48 -07:00
Lionel Villard
4a9ce1784c
[sleep mode] clear pytorch cache after sleep (#15248)
Signed-off-by: <villard@us.ibm.com>
2025-03-31 22:58:58 -07:00
Alexander Matveev
7e4e709b43
[V1] TPU - Fix fused MOE (#15834)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 22:58:07 -07:00
Alexey Kiryushin
63d8eabed0
[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding (#15824)
Signed-off-by: alexwl <alexey.a.kiryushin@gmail.com>
2025-03-31 22:57:59 -07:00
Percy
e830b01383
[Bugfix] Fix extra comma (#15851)
Signed-off-by: haochengxia <xhc_1007@163.com>
2025-03-31 22:57:28 -07:00
Yan Ma
ff6473980d
[Bugfix][Model] fix mllama multi-image (#14883)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-03-31 22:53:37 -07:00
Kinfey
a164aea35d
[Frontend] Add Phi-4-mini function calling support (#14886)
Signed-off-by: Kinfey <kinfeylo@microsoft.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-31 22:50:05 -07:00
Harry Mellor
a76f547e11
Rename fallback model and refactor supported models section (#15829)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 22:49:41 -07:00
Ilya Markov
b7b7676d67
[Distributed] Add custom allreduce support for ROCM (#14125)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-03-31 22:49:12 -07:00
Harry Mellor
e6e3c55ef2
Move dockerfiles into their own directory (#14549)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 13:47:32 -07:00
Mark McLoughlin
f98a4920f9
[V1][Core] Remove unused speculative config from scheduler (#15818)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-31 19:15:21 +00:00
Harry Mellor
d4bfc23ef0
Fix Transformers backend compatibility check (#15290)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 10:27:07 -07:00
Alexander Matveev
9a2160fa55
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00
yihong
2de4118243
fix: change GB to GiB in logging close #14979 (#15807)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-31 10:00:50 -07:00
shangmingc
239b7befdd
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-31 09:19:35 -07:00
Cyrus Leung
09e974d483
[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-31 09:01:35 -07:00
Harry Mellor
e5ef4fa99a
Upgrade transformers to v4.50.3 (#13905)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 08:59:37 -07:00
Mrm
037bcd942c
[Bugfix] Fix missing return value in load_weights method of adapters.py (#15542)
Signed-off-by: noc-turne <2270929247@qq.com>
2025-03-31 06:56:42 -07:00
Alex Brooks
c2e7507ad4
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-03-31 13:23:53 +00:00
Naveassaf
3aa2b6a637
[Model] Update support for NemotronNAS models (#15008)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-03-31 20:35:14 +08:00
youkaichao
555aa21905
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-31 20:22:34 +08:00
yihong
e7ae3bf3d6
fix: better install requirement for install in setup.py (#15796)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-31 05:13:32 -07:00