Tristan Leclercq
|
5eeabc2a44
|
[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950)
|
2025-03-17 23:27:26 +00:00 |
|
Alexander Matveev
|
18551e820c
|
[V1] TPU - Fix CI/CD runner (#14974)
|
2025-03-17 21:07:07 +00:00 |
|
Robert Shaw
|
e41e160263
|
[V1] Guard Against Main Thread Usage (#14972)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-17 13:23:02 -07:00 |
|
Cyrus Leung
|
b89fb2a4a1
|
[CI/Build] Use AutoModelForImageTextToText to load VLMs in tests (#14945)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 18:35:17 +00:00 |
|
Roger Wang
|
5340b0e221
|
[Bugfix] Fix interface for Olmo2 on V1 (#14976)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-17 11:26:38 -07:00 |
|
Roger Wang
|
37e3806132
|
[Bugfix] Make Gemma3 MM V0 only for now (#14971)
Signed-off-by: Roger Wang <ywang@roblox.com>
v0.8.0rc2
|
2025-03-17 10:04:21 -07:00 |
|
Aaron Pham
|
c0efdd655b
|
[Fix][Structured Output] using vocab_size to construct matcher (#14868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-03-17 11:42:45 -04:00 |
|
Quentin
|
aaaec52ad9
|
[Bugfix][Model] Mixtral: use unused head_dim config argument (#14961)
Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai>
|
2025-03-17 07:44:18 -07:00 |
|
Tyler Michael Smith
|
e1eb45d397
|
[Bugfix] Fix precommit - line too long in pixtral.py (#14960)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 07:18:50 -07:00 |
|
Simon Mo
|
89fca671fb
|
[V1] Default MLA to V1 (#14921)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-17 06:54:40 -07:00 |
|
Patrick von Platen
|
d20b0c139c
|
Add patch merger (#14957)
|
2025-03-17 06:47:50 -07:00 |
|
Cyrus Leung
|
166a168b0f
|
[Doc] Fix misleading log during multi-modal profiling (#14955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 06:14:32 -07:00 |
|
vllmellm
|
2bb0e1a799
|
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-17 11:33:35 +00:00 |
|
Cyrus Leung
|
6eaf1e5c52
|
[Misc] Add --seed option to offline multi-modal examples (#14934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 03:00:17 -07:00 |
|
Cyrus Leung
|
868a8c5b2c
|
[Bugfix] Fix Ultravox on V1 (#14929)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 17:15:20 +08:00 |
|
iefgnoix
|
b4ad56c1bd
|
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. (#14846)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-03-17 01:48:28 -07:00 |
|
kushanam
|
69698f257e
|
fix minor miscalled method (#14327)
|
2025-03-17 01:47:58 -07:00 |
|
Lu Fang
|
cd0cd85102
|
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-17 16:40:41 +08:00 |
|
Russell Bryant
|
0a74bfce9c
|
setup.py: drop assumption about local main branch (#14692)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-17 01:37:42 -07:00 |
|
Chen Zhang
|
dd3b865854
|
[Doc] Add vLLM Beijing meetup slide (#14938)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-17 16:29:36 +08:00 |
|
Yan Ma
|
9b87a579aa
|
[Misc][XPU] Use None as device capacity for XPU (#14932)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-03-17 01:22:14 -07:00 |
|
Cyrus Leung
|
b539222d4e
|
[V1] Remove input cache client (#14864)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-16 23:42:06 -07:00 |
|
Lily Liu
|
8d6cf89526
|
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
v0.8.0rc1
|
2025-03-16 22:00:20 -07:00 |
|
Simon Mo
|
583a9778e0
|
[Benchmark] Do not save detailed info to json by default (#14879)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-16 21:48:11 -07:00 |
|
Sibi
|
a73e183e36
|
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-16 20:35:57 -07:00 |
|
Lucas Wilkinson
|
1e799b7ec1
|
[BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910)
|
2025-03-17 03:35:37 +00:00 |
|
Woosuk Kwon
|
7f6c5ee06c
|
[V1][Minor] Add __repr__ to ConstantList (#14907)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 20:20:15 -07:00 |
|
Woosuk Kwon
|
faa0275730
|
[V1] Optimize the overhead of rewinding (#14905)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 20:19:30 -07:00 |
|
Cyrus Leung
|
8a5a9b70d7
|
[CI/Build] Update defaults for test reproducibility (#14893)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 10:38:15 +08:00 |
|
Robert Shaw
|
bb3aeddfaf
|
[CI] Nightly Tests (#14898)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-03-17 02:06:43 +00:00 |
|
Robert Shaw
|
aecc780dba
|
[V1] Enable Entrypoints Tests (#14903)
|
2025-03-16 17:56:16 -07:00 |
|
Vadim Gimpelson
|
90df7f23aa
|
[Doc] Add guidance for using ccache with pip install -e . in doc (#14901)
|
2025-03-16 23:10:04 +00:00 |
|
Rui Qiao
|
b9b5bdfc7d
|
[Misc] Catching Ray Compiled Graph PP test failures for V1 (#14847)
|
2025-03-16 15:46:42 -07:00 |
|
Woosuk Kwon
|
31060b2757
|
[V1][BugFix] Detect interleaved sliding window attention (#14896)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 14:53:53 -07:00 |
|
Nick Hill
|
fc1f67715d
|
[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-16 14:53:34 -07:00 |
|
Cyrus Leung
|
f6137adbcb
|
Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785)" (#14892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-16 09:13:46 -07:00 |
|
Cyrus Leung
|
e53b1350f2
|
[Bugfix] Explicitly disable Phi-4-multimodal in V1 (#14889)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-16 09:05:40 -07:00 |
|
Kyle Sayers
|
d30aa7e9e6
|
[Bugfix] Limit profiling run sequence length by max_model_len (#14785)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-03-16 07:44:19 -07:00 |
|
Lily Liu
|
d1ad2a57af
|
[V1] [Spec Decode] Fix ngram tests (#14878)
|
2025-03-16 00:29:22 -07:00 |
|
Nick Hill
|
b82662d952
|
[BugFix] Fix torch distributed stateless PG backend init (#14870)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-15 20:26:19 -07:00 |
|
Simon Mo
|
71c1e07107
|
[Kernel] Add more tuned configs (#14877)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-15 20:25:03 -07:00 |
|
Roger Wang
|
b30c75dda4
|
[V1] Remove V0 fallback for mistral-tokenizer (#14873)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-15 20:21:11 -07:00 |
|
Isotr0py
|
def232e122
|
[VLM] Clean up Phi-4-MM ViT implementation (#14812)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-03-15 18:53:52 -07:00 |
|
Roger Wang
|
3453b964a3
|
[Misc][Doc] Minor benchmark README update (#14874)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-16 09:46:17 +08:00 |
|
Rémi Delacourt
|
61c6a5a796
|
[VLM] Merged multi-modal processor for Pixtral (#12211)
Signed-off-by: remi <remi@mistral.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-15 06:28:27 -07:00 |
|
Jun Duan
|
74bc397b0a
|
[Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
|
2025-03-15 06:28:14 -07:00 |
|
Kunshang Ji
|
f58aea002c
|
[CI][Intel GPU] refine intel GPU ci docker build (#14860)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-03-15 11:58:53 +00:00 |
|
Cyrus Leung
|
3556a41434
|
[VLM] Limit multimodal input cache by memory (#14805)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-15 02:52:05 -07:00 |
|
Bryan Lu
|
9ed6ee92d6
|
[Bugfix] EAGLE output norm bug (#14464)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-03-15 06:50:33 +00:00 |
|
Russell Bryant
|
ee3778d5fc
|
[Build/CI] Upgrade jinja2 to get 3 moderate CVE fixes (#14839)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-15 05:38:19 +00:00 |
|