Roger Wang
ccb97338af
[Misc] Add Codex settings to gitignore ( #24493 )
...
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-09 01:25:44 -07:00
Ye (Charlotte) Qi
45c9cb5835
[Misc] Add claude settings to gitignore ( #24492 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-09 01:14:45 -07:00
Jee Jee Li
48f4636927
[Misc] Ignore ep_kernels_workspace ( #22807 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-15 05:58:03 -07:00
Harry Mellor
767e63b860
[Docs] Improve docs navigation ( #22720 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 04:25:55 -07:00
Yongye Zhu
e789cad6b8
[gpt-oss] triton kernel mxfp4 ( #22421 )
...
Signed-off-by: <zyy1102000@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-08 08:24:07 -07:00
Harry Mellor
3482fd7e4e
[Doc] Add engine args back in to the docs ( #20674 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-10 08:02:40 -07:00
Ning Xie
2f1c19b245
[CI] change spell checker from codespell to typos ( #18711 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-11 19:57:10 -07:00
Cyrus Leung
82e2339b06
[Doc] Move examples and further reorganize user guide ( #18666 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00
Harry Mellor
a1fe24d961
Migrate docs from Sphinx to MkDocs ( #18145 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-23 02:09:53 -07:00
Harry Mellor
d6484ef3c3
Add full API docs and improve the UX of navigating them ( #17485 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-03 19:42:43 -07:00
Lucas Wilkinson
d8bccde686
[BugFix] Fix vllm_flash_attn install issues ( #17267 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-04-27 17:27:56 -07:00
Christian Heimes
65e262b93b
Fix Python packaging edge cases ( #17159 )
...
Signed-off-by: Christian Heimes <christian@python.org>
2025-04-26 06:15:07 +08:00
DefTruth
a6481525b8
[misc] ignore marlin_moe_wna16 local gen codes ( #16760 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-17 17:15:14 +08:00
Lucas Wilkinson
dccf535f8e
[V1] Enable V1 Fp8 cache for FA3 in the oracle ( #15191 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-23 15:07:04 -07:00
Aaron Pham
80e9afb5bc
[V1][Core] Support for Structured Outputs ( #12388 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-07 07:19:11 -08:00
Harry Mellor
5950f555a1
[Doc] Group examples into categories ( #11782 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 09:20:12 +08:00
Rafael Vasquez
32aa2059ad
[Docs] Convert rST to MyST (Markdown) ( #11145 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-12-23 22:35:38 +00:00
Russell Bryant
3be5b26a76
[CI/Build] Add shell script linting using shellcheck ( #7925 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:17:29 +00:00
Russell Bryant
e0dbdb013d
[CI/Build] Add linting for github actions workflows ( #7876 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-07 21:18:10 +00:00
Tyler Michael Smith
2e7fe7e79f
[Build/CI] Set FETCHCONTENT_BASE_DIR to one location for better caching ( #8930 )
2024-09-29 03:13:01 +00:00
Daniele
2467b642dd
[CI/Build] fix setuptools-scm usage ( #8771 )
2024-09-24 12:38:12 -07:00
Daniele
ee5f34b1c2
[CI/Build] use setuptools-scm to set __version__ ( #4738 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Luka Govedič
71c60491f2
[Kernel] Build flash-attn from source ( #8245 )
2024-09-20 23:27:10 -07:00
Lucas Wilkinson
5288c06aa0
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel ( #7174 )
2024-08-20 07:09:33 -06:00
Michael Goin
855866caa9
[Kernel] Add tuned triton configs for ExpertsInt8 ( #7601 )
2024-08-16 11:37:01 -07:00
Michael Goin
111fc6e7ec
[Misc] Add generated git commit hash as vllm.__commit__ ( #6386 )
2024-07-12 22:52:15 +00:00
Harry Mellor
3d925165f2
Add example scripts to documentation ( #4225 )
...
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-22 16:36:54 +00:00
Adrian Abeyta
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) ( #3290 )
...
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
Woosuk Kwon
1cb0cc2975
[FIX] Make flash_attn optional ( #3269 )
2024-03-08 10:52:20 -08:00
Woosuk Kwon
2daf23ab0c
Separate attention backends ( #3005 )
2024-03-07 01:45:50 -08:00
Harry Mellor
2709c0009a
Support OpenAI API server in benchmark_serving.py ( #2172 )
2024-01-18 20:34:08 -08:00
TJian
6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main ( #1836 )
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
Woosuk Kwon
e3e79e9e8a
Implement AWQ quantization support for LLaMA ( #1032 )
...
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
2023-09-16 00:03:37 -07:00
Zhuohan Li
2cf1a333b6
[Doc] Documentation for distributed inference ( #261 )
2023-06-26 11:34:23 -07:00
Zhuohan Li
a255885f83
Add logo and polish readme ( #156 )
2023-06-19 16:31:13 +08:00
Woosuk Kwon
376725ce74
[PyPI] Packaging for PyPI distribution ( #140 )
2023-06-05 20:03:14 -07:00
Woosuk Kwon
19d2899439
Add initial sphinx docs ( #120 )
2023-05-22 17:02:44 -07:00
Zhuohan Li
4858f3bb45
Add an option to launch cacheflow without ray ( #51 )
2023-04-30 15:42:17 +08:00
Woosuk Kwon
84eee24e20
Collect system stats in scheduler & Add scripts for experiments ( #30 )
2023-04-12 15:03:49 -07:00
Woosuk Kwon
3b41f16596
Add gitignore
2023-02-16 07:47:21 +00:00
Woosuk Kwon
0a11a2e5ca
Add gitignore
2023-02-09 11:28:12 +00:00