Simon Mo
|
db9dfcfa6a
|
[Docs] Add Ollama meetup slides (#15905)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-04-01 13:58:59 -07:00 |
|
Gerald
|
9ef98d527e
|
[Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454)
Signed-off-by: qscqesze <475517977@qq.com>
Co-authored-by: qingjun <qingjun@minimaxi.com>
Co-authored-by: qscqesze <475517977@qq.com>
|
2025-04-01 16:23:55 -04:00 |
|
yihong
|
93491aefc7
|
[BugFix] make sure socket close (#15875)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-01 13:10:24 -07:00 |
|
Simon Mo
|
7acd539cd7
|
[Docs] update usage stats language (#15898)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-04-01 12:54:13 -07:00 |
|
Woosuk Kwon
|
e75a6301bd
|
[V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-01 12:33:16 -07:00 |
|
Mark McLoughlin
|
a79cc68b3a
|
[V1][Metrics] Initial speculative decoding metrics (#15151)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-01 10:45:04 -07:00 |
|
Roger Wang
|
7e3f7a4ee7
|
[CI] Disable flaky structure decoding test temporarily. (#15892)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-04-01 17:42:34 +00:00 |
|
cloud11665
|
9ec8257914
|
[Model] Add module name prefixes to gemma3 (#15889)
Signed-off-by: Bartholomew Sabat <bartek@recursal.ai>
Co-authored-by: Bartholomew Sabat <bartek@recursal.ai>
|
2025-04-01 10:13:40 -07:00 |
|
Jennifer Zhao
|
38327cf454
|
[Model] Aya Vision (#15441)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-04-01 16:30:43 +00:00 |
|
Jee Jee Li
|
dfa82e2a3d
|
[CI/Build] Clean up LoRA tests (#15867)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-01 16:28:50 +00:00 |
|
bnellnm
|
e59ca942f5
|
Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-04-01 12:07:43 -04:00 |
|
Gregory Shtrasberg
|
a57a3044aa
|
[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork (#15820)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-04-01 08:56:39 -07:00 |
|
Isotr0py
|
4e5a0f6ae2
|
[Misc] Allow using OpenCV as video IO fallback (#15055)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-01 15:55:13 +00:00 |
|
Harry Mellor
|
b63bd14999
|
Reinstate format.sh and make pre-commit installation simpler (#15890)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-01 15:41:30 +00:00 |
|
chaow-amd
|
2041c0e360
|
[Doc] Quark quantization documentation (#15861)
Signed-off-by: chaow <chaow@amd.com>
|
2025-04-01 08:32:45 -07:00 |
|
wang.yuqi
|
085cbc4f9f
|
[New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-01 08:32:26 -07:00 |
|
Harry Mellor
|
2b93162fb0
|
Remove format.sh as it's been unsupported >70 days (#15884)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-01 22:27:46 +08:00 |
|
Reid
|
2e45bd29fe
|
[Misc] remove unused script (#15746)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-01 13:58:05 +00:00 |
|
Michael Goin
|
51d7c6a2b2
|
[Model] Support Mistral3 in the HF Transformers format (#15505)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-01 06:10:05 -07:00 |
|
Yang Chen
|
f3aca1ee30
|
setup correct nvcc version with CUDA_HOME (#15725)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-04-01 06:09:40 -07:00 |
|
Rui Qiao
|
8dd41d6bcc
|
[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-01 06:07:53 -07:00 |
|
Isotr0py
|
0a298ea418
|
[Bugfix] Fix no video/image profiling edge case for MultiModalDataParser (#15828)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-01 18:17:11 +08:00 |
|
Harry Mellor
|
d330558bab
|
[Docs] Fix small error in link text (#15868)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-01 10:05:14 +00:00 |
|
shangmingc
|
656fd72976
|
[Misc] Fix speculative config repr string (#15860)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-01 02:26:22 -07:00 |
|
Varun Sundar Rabindranath
|
79455cf421
|
[Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-04-01 16:53:56 +08:00 |
|
Wei Zeng
|
30d6a015e0
|
[Feature] specify model in config.yaml (#15798)
Signed-off-by: weizeng <weizeng@roblox.com>
|
2025-04-01 01:20:06 -07:00 |
|
yihong
|
8af5a5c4e5
|
fix: can not use uv run collect_env close #13888 (#15792)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-01 07:45:49 +00:00 |
|
Chen Zhang
|
3a5f0afcd2
|
[V1] Implement sliding window attention in kv_cache_manager (#14097)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-04-01 00:33:17 -07:00 |
|
Gregory Shtrasberg
|
c7e63aa4d8
|
[ROCm] Use device name in the warning (#15838)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-04-01 00:10:48 -07:00 |
|
Lionel Villard
|
4a9ce1784c
|
[sleep mode] clear pytorch cache after sleep (#15248)
Signed-off-by: <villard@us.ibm.com>
|
2025-03-31 22:58:58 -07:00 |
|
Alexander Matveev
|
7e4e709b43
|
[V1] TPU - Fix fused MOE (#15834)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-31 22:58:07 -07:00 |
|
Alexey Kiryushin
|
63d8eabed0
|
[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding (#15824)
Signed-off-by: alexwl <alexey.a.kiryushin@gmail.com>
|
2025-03-31 22:57:59 -07:00 |
|
Percy
|
e830b01383
|
[Bugfix] Fix extra comma (#15851)
Signed-off-by: haochengxia <xhc_1007@163.com>
|
2025-03-31 22:57:28 -07:00 |
|
Yan Ma
|
ff6473980d
|
[Bugfix][Model] fix mllama multi-image (#14883)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-03-31 22:53:37 -07:00 |
|
Kinfey
|
a164aea35d
|
[Frontend] Add Phi-4-mini function calling support (#14886)
Signed-off-by: Kinfey <kinfeylo@microsoft.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-03-31 22:50:05 -07:00 |
|
Harry Mellor
|
a76f547e11
|
Rename fallback model and refactor supported models section (#15829)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 22:49:41 -07:00 |
|
Ilya Markov
|
b7b7676d67
|
[Distributed] Add custom allreduce support for ROCM (#14125)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-03-31 22:49:12 -07:00 |
|
Harry Mellor
|
e6e3c55ef2
|
Move dockerfiles into their own directory (#14549)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 13:47:32 -07:00 |
|
Mark McLoughlin
|
f98a4920f9
|
[V1][Core] Remove unused speculative config from scheduler (#15818)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-31 19:15:21 +00:00 |
|
Harry Mellor
|
d4bfc23ef0
|
Fix Transformers backend compatibility check (#15290)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 10:27:07 -07:00 |
|
Alexander Matveev
|
9a2160fa55
|
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-31 13:25:20 -04:00 |
|
yihong
|
2de4118243
|
fix: change GB to GiB in logging close #14979 (#15807)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-03-31 10:00:50 -07:00 |
|
shangmingc
|
239b7befdd
|
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-31 09:19:35 -07:00 |
|
Cyrus Leung
|
09e974d483
|
[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-31 09:01:35 -07:00 |
|
Harry Mellor
|
e5ef4fa99a
|
Upgrade transformers to v4.50.3 (#13905)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 08:59:37 -07:00 |
|
Mrm
|
037bcd942c
|
[Bugfix] Fix missing return value in load_weights method of adapters.py (#15542)
Signed-off-by: noc-turne <2270929247@qq.com>
|
2025-03-31 06:56:42 -07:00 |
|
Alex Brooks
|
c2e7507ad4
|
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-03-31 13:23:53 +00:00 |
|
Naveassaf
|
3aa2b6a637
|
[Model] Update support for NemotronNAS models (#15008)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
|
2025-03-31 20:35:14 +08:00 |
|
youkaichao
|
555aa21905
|
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-31 20:22:34 +08:00 |
|
yihong
|
e7ae3bf3d6
|
fix: better install requirement for install in setup.py (#15796)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-03-31 05:13:32 -07:00 |
|