41 Commits

Author SHA1 Message Date
Roger Wang
ccb97338af
[Misc] Add Codex settings to gitignore (#24493)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-09 01:25:44 -07:00
Ye (Charlotte) Qi
45c9cb5835
[Misc] Add claude settings to gitignore (#24492)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-09 01:14:45 -07:00
Jee Jee Li
48f4636927
[Misc] Ignore ep_kernels_workspace (#22807)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-15 05:58:03 -07:00
Harry Mellor
767e63b860
[Docs] Improve docs navigation (#22720)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 04:25:55 -07:00
Yongye Zhu
e789cad6b8
[gpt-oss] triton kernel mxfp4 (#22421)
Signed-off-by: <zyy1102000@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-08 08:24:07 -07:00
Harry Mellor
3482fd7e4e
[Doc] Add engine args back in to the docs (#20674)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-10 08:02:40 -07:00
Ning Xie
2f1c19b245
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-11 19:57:10 -07:00
Cyrus Leung
82e2339b06
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00
Harry Mellor
a1fe24d961
Migrate docs from Sphinx to MkDocs (#18145)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 02:09:53 -07:00
Harry Mellor
d6484ef3c3
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-03 19:42:43 -07:00
Lucas Wilkinson
d8bccde686
[BugFix] Fix vllm_flash_attn install issues (#17267)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-04-27 17:27:56 -07:00
Christian Heimes
65e262b93b
Fix Python packaging edge cases (#17159)
Signed-off-by: Christian Heimes <christian@python.org>
2025-04-26 06:15:07 +08:00
DefTruth
a6481525b8
[misc] ignore marlin_moe_wna16 local gen codes (#16760)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-17 17:15:14 +08:00
Lucas Wilkinson
dccf535f8e
[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-23 15:07:04 -07:00
Aaron Pham
80e9afb5bc
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-07 07:19:11 -08:00
Harry Mellor
5950f555a1
[Doc] Group examples into categories (#11782)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 09:20:12 +08:00
Rafael Vasquez
32aa2059ad
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-12-23 22:35:38 +00:00
Russell Bryant
3be5b26a76
[CI/Build] Add shell script linting using shellcheck (#7925)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:17:29 +00:00
Russell Bryant
e0dbdb013d
[CI/Build] Add linting for github actions workflows (#7876)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-07 21:18:10 +00:00
Tyler Michael Smith
2e7fe7e79f
[Build/CI] Set FETCHCONTENT_BASE_DIR to one location for better caching (#8930) 2024-09-29 03:13:01 +00:00
Daniele
2467b642dd
[CI/Build] fix setuptools-scm usage (#8771) 2024-09-24 12:38:12 -07:00
Daniele
ee5f34b1c2
[CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Luka Govedič
71c60491f2
[Kernel] Build flash-attn from source (#8245) 2024-09-20 23:27:10 -07:00
Lucas Wilkinson
5288c06aa0
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174) 2024-08-20 07:09:33 -06:00
Michael Goin
855866caa9
[Kernel] Add tuned triton configs for ExpertsInt8 (#7601) 2024-08-16 11:37:01 -07:00
Michael Goin
111fc6e7ec
[Misc] Add generated git commit hash as vllm.__commit__ (#6386) 2024-07-12 22:52:15 +00:00
Harry Mellor
3d925165f2
Add example scripts to documentation (#4225)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-22 16:36:54 +00:00
Adrian Abeyta
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
Woosuk Kwon
1cb0cc2975
[FIX] Make flash_attn optional (#3269) 2024-03-08 10:52:20 -08:00
Woosuk Kwon
2daf23ab0c
Separate attention backends (#3005) 2024-03-07 01:45:50 -08:00
Harry Mellor
2709c0009a
Support OpenAI API server in benchmark_serving.py (#2172) 2024-01-18 20:34:08 -08:00
TJian
6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
Woosuk Kwon
e3e79e9e8a
Implement AWQ quantization support for LLaMA (#1032)
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
2023-09-16 00:03:37 -07:00
Zhuohan Li
2cf1a333b6
[Doc] Documentation for distributed inference (#261) 2023-06-26 11:34:23 -07:00
Zhuohan Li
a255885f83
Add logo and polish readme (#156) 2023-06-19 16:31:13 +08:00
Woosuk Kwon
376725ce74
[PyPI] Packaging for PyPI distribution (#140) 2023-06-05 20:03:14 -07:00
Woosuk Kwon
19d2899439
Add initial sphinx docs (#120) 2023-05-22 17:02:44 -07:00
Zhuohan Li
4858f3bb45
Add an option to launch cacheflow without ray (#51) 2023-04-30 15:42:17 +08:00
Woosuk Kwon
84eee24e20
Collect system stats in scheduler & Add scripts for experiments (#30) 2023-04-12 15:03:49 -07:00
Woosuk Kwon
3b41f16596 Add gitignore 2023-02-16 07:47:21 +00:00
Woosuk Kwon
0a11a2e5ca Add gitignore 2023-02-09 11:28:12 +00:00