Woosuk Kwon
31060b2757
[V1][BugFix] Detect interleaved sliding window attention ( #14896 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 14:53:53 -07:00
Nick Hill
fc1f67715d
[BugFix][V1] Fix overhead related to bad_words sampling when not in use ( #14894 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-16 14:53:34 -07:00
Cyrus Leung
f6137adbcb
Revert "[Bugfix] Limit profiling run sequence length by max_model_len ( #14785 )" ( #14892 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-16 09:13:46 -07:00
Cyrus Leung
e53b1350f2
[Bugfix] Explicitly disable Phi-4-multimodal in V1 ( #14889 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-16 09:05:40 -07:00
Kyle Sayers
d30aa7e9e6
[Bugfix] Limit profiling run sequence length by max_model_len ( #14785 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-16 07:44:19 -07:00
Lily Liu
d1ad2a57af
[V1] [Spec Decode] Fix ngram tests ( #14878 )
2025-03-16 00:29:22 -07:00
Nick Hill
b82662d952
[BugFix] Fix torch distributed stateless PG backend init ( #14870 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-15 20:26:19 -07:00
Simon Mo
71c1e07107
[Kernel] Add more tuned configs ( #14877 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-15 20:25:03 -07:00
Roger Wang
b30c75dda4
[V1] Remove V0 fallback for mistral-tokenizer ( #14873 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-15 20:21:11 -07:00
Isotr0py
def232e122
[VLM] Clean up Phi-4-MM ViT implementation ( #14812 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-15 18:53:52 -07:00
Roger Wang
3453b964a3
[Misc][Doc] Minor benchmark README update ( #14874 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-16 09:46:17 +08:00
Rémi Delacourt
61c6a5a796
[VLM] Merged multi-modal processor for Pixtral ( #12211 )
...
Signed-off-by: remi <remi@mistral.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-15 06:28:27 -07:00
Jun Duan
74bc397b0a
[Core] Expose API endpoint /is_sleeping ( #14312 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-15 06:28:14 -07:00
Kunshang Ji
f58aea002c
[CI][Intel GPU] refine intel GPU ci docker build ( #14860 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-03-15 11:58:53 +00:00
Cyrus Leung
3556a41434
[VLM] Limit multimodal input cache by memory ( #14805 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-15 02:52:05 -07:00
Bryan Lu
9ed6ee92d6
[Bugfix] EAGLE output norm bug ( #14464 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-03-15 06:50:33 +00:00
Russell Bryant
ee3778d5fc
[Build/CI] Upgrade jinja2 to get 3 moderate CVE fixes ( #14839 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-15 05:38:19 +00:00
Jennifer Zhao
aaacf17324
[Doc] V1 user guide ( #13991 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-14 22:17:59 -07:00
Aaron Pham
4c7629cae9
[V1][Structured Output] calculate vocab_size eagerly ( #14851 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-14 22:09:51 -07:00
Jee Jee Li
e0fdfa1608
[CI/Build] Delete LoRA bias test ( #14849 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-14 22:09:25 -07:00
Lucas Wilkinson
5952d8ab61
[Attention] Get rid of mla cache alignment ( #14842 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-15 05:08:25 +00:00
Li, Jiang
a2ae496589
[CPU] Support FP8 KV cache ( #14741 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
Simon Mo
877e352262
[Docs] Add new East Coast vLLM Meetup slides to README and meetups.md ( #14852 )
2025-03-14 22:06:38 -07:00
Robert Shaw
d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
Lu Fang
8c0d15d5c5
[Misc][Easy] Annotate unused vars in the csrc files ( #14798 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-15 12:40:09 +08:00
Isotr0py
97ac781c62
[Misc] Remove misleading message in gemma2 and gemma3 ( #14850 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-14 21:35:12 -07:00
Russell Bryant
776dcec8fe
Disable outlines cache by default ( #14837 )
2025-03-15 03:57:55 +00:00
Tyler Michael Smith
ccf02fcbae
Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… ( #14848 )
2025-03-14 20:45:42 -07:00
DefTruth
acaea3bb07
[Bugfix][V1] Fix flashinfer sampling ( #14815 )
2025-03-14 20:42:38 -07:00
Liangfu Chen
9f37422779
[Neuron][CI] update docker run command ( #14829 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-03-14 18:51:35 -07:00
yarongmu-google
dd344e0342
[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … ( #14844 )
...
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-15 00:41:15 +00:00
Yuan Tang
54a8804455
[Doc] More neutral K8s deployment guide ( #14084 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-14 16:12:36 -07:00
Russell Bryant
bbd94a19fc
[Build/CI] Upgrade aiohttp to include CVE fix ( #14840 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 23:11:28 +00:00
Russell Bryant
233ffce1eb
[Build/CI] Move ninja to common deps ( #14835 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 21:25:28 +00:00
Richard Liu
40677783aa
[CI] Add TPU v1 test ( #14834 )
...
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-14 17:13:30 -04:00
Michael Goin
14f301b541
Update to torch==2.6.0 ( #12721 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: luka <luka@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-14 16:58:30 -04:00
Russell Bryant
46f98893dd
[V1] Fix model parameterization for structured output tests ( #14833 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 20:55:18 +00:00
Chih-Chieh Yang
fe66b34728
[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies ( #14778 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-03-14 16:36:18 -04:00
Alexei-V-Ivanov-AMD
270a5da495
Re-enable the AMD Entrypoints Test ( #14711 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-03-14 12:18:13 -07:00
Kevin H. Luu
7097b4cc1c
[release] Remove log cleanup commands from TPU job ( #14838 )
2025-03-14 11:59:52 -07:00
Yajie Wang
977a16772c
[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 ( #14430 )
...
Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com>
2025-03-14 09:55:14 -07:00
daniel-salib
73deea2fdb
[Frontend] track server_load ( #13950 )
2025-03-14 09:53:17 -07:00
Mark McLoughlin
9d2b4a70f4
[V1][Metrics] Updated list of deprecated metrics in v0.8 ( #14695 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-15 00:45:25 +08:00
Russell Bryant
0b0d6421b2
[Frontend] Fix log message to use http vs https ( #14774 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 09:21:09 -07:00
Russell Bryant
1140991a7b
[V1] Fix vocab size calculation for structured output ( #14826 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 09:18:38 -07:00
Cyrus Leung
613c5bb945
[Bugfix] Fix Aria test loading ( #14823 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 09:11:23 -07:00
Guillaume Calmettes
fd8e055ffb
[BugFix]: properly catch templating error when preprocess input ( #13976 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-03-14 05:58:34 -07:00
Cyrus Leung
ab93f1360f
[VLM] Various cleanup and fixes ( #14806 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 05:58:19 -07:00
DefTruth
40253bab44
[Bugfix][W8A8] fixed cutlass block fp8 binding ( #14796 )
2025-03-14 03:32:42 -07:00
Woosuk Kwon
c77620d22d
[V1][Minor] Minor code cleanup for scheduling metrics ( #14800 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-14 08:21:28 +00:00