910 Commits

Author SHA1 Message Date
Jim Burtoft
2a60c9bd17
[Doc] minor fix to neuron-installation.rst (#3505) 2024-03-19 13:21:35 -07:00
ifsheldon
c614cfee58
Update dockerfile with ModelScope support (#3429) 2024-03-19 10:54:59 -07:00
Nick Hill
7341c77d69
[BugFix] Avoid initializing CUDA too early (#3487) 2024-03-18 23:05:20 -07:00
Simon Mo
ef65dcfa6f
[Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
youkaichao
6a9c583e73
[Core] print error before deadlock (#3459) 2024-03-19 04:06:23 +00:00
Antoni Baum
b37cdce2b1
[Core] Cache some utils (#3474) 2024-03-18 17:14:26 -07:00
Zhuohan Li
b30880a762
[Misc] Update README for the Third vLLM Meetup (#3479) 2024-03-18 15:58:38 -07:00
Antoni Baum
49eedea373
[Core] Zero-copy asdict for InputMetadata (#3475) 2024-03-18 22:56:40 +00:00
bnellnm
9fdf3de346
Cmake based build system (#2830) 2024-03-18 15:38:33 -07:00
Zhuohan Li
c0c17d4896
[Misc] Fix PR Template (#3478) 2024-03-18 15:00:31 -07:00
Robert Shaw
097aa0ea22
[CI/Build] Fix Bad Import In Test (#3473) 2024-03-18 20:28:00 +00:00
Cade Daniel
482b0adf1b
[Testing] Add test_config.py to CI (#3437) 2024-03-18 12:48:45 -07:00
Simon Mo
8c654c045f
CI: Add ROCm Docker Build (#2886) 2024-03-18 19:33:47 +00:00
Woosuk Kwon
9101d832e6
[Bugfix] Make moe_align_block_size AMD-compatible (#3470) 2024-03-18 11:26:24 -07:00
Simon Mo
93348d9458
[CI] Shard tests for LoRA and Kernels to speed up (#3445) 2024-03-17 14:56:30 -07:00
Woosuk Kwon
abfc4f3387
[Misc] Use dataclass for InputMetadata (#3452)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-03-17 10:02:46 +00:00
Simon Mo
6b78837b29
Fix setup.py neuron-ls issue (#2671) 2024-03-16 16:00:25 -07:00
Simon Mo
120157fd2a
Support arbitrary json_object in OpenAI and Context Free Grammar (#3211) 2024-03-16 13:35:27 -07:00
Simon Mo
8e67598aa6
[Misc] fix line length for entire codebase (#3444) 2024-03-16 00:36:29 -07:00
simon-mo
ad50bf4b25
fix lint 2024-03-15 22:23:38 -07:00
Dinghow Yang
cf6ff18246
Fix Baichuan chat template (#3340) 2024-03-15 21:02:12 -07:00
Ronen Schaffer
14e3f9a1b2
Replace lstrip() with removeprefix() to fix Ruff linter warning (#2958) 2024-03-15 21:01:30 -07:00
Tao He
3123f15138
Fixes the incorrect argument in the prefix-prefill test cases (#3246) 2024-03-15 20:58:10 -07:00
youkaichao
413366e9a2
[Misc] PR templates (#3413)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-15 18:25:51 -07:00
Robert Shaw
10585e035e
Removed Extraneous Print Message From OAI Server (#3440) 2024-03-16 00:35:36 +00:00
Antoni Baum
fb96c1e98c
Asynchronous tokenization (#2879) 2024-03-15 23:37:01 +00:00
laneeee
8fa7357f2d
fix document error for value and v_vec illustration (#3421) 2024-03-15 16:06:09 -07:00
Harry Mellor
a7af4538ca
Fix issue templates (#3436) 2024-03-15 21:26:00 +00:00
youkaichao
604f235937
[Misc] add error message in non linux platform (#3438) 2024-03-15 21:21:37 +00:00
Tao He
14b8ae02e7
Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220)
Signed-off-by: Tao He <sighingnow@gmail.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-03-15 18:25:43 +00:00
Dan Clark
03d37f2441
[Fix] Add args for mTLS support (#3430)
Co-authored-by: declark1 <daniel.clark@ibm.com>
2024-03-15 09:56:13 -07:00
Yang Fan
a7c871680e
Fix tie_word_embeddings for Qwen2. (#3344) 2024-03-15 09:36:53 -07:00
Junda Chen
429284dc37
Fix dist.broadcast stall without group argument (#3408) 2024-03-14 23:25:05 -07:00
Dinghow Yang
253a98078a
Add chat templates for ChatGLM (#3418) 2024-03-14 23:19:22 -07:00
Dinghow Yang
21539e6856
Add chat templates for Falcon (#3420) 2024-03-14 23:19:02 -07:00
youkaichao
b522c4476f
[Misc] add HOST_IP env var (#3419)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-14 21:32:52 -07:00
akhoroshev
78b6c4845a
Dynamically configure shared memory size for moe_align_block_size_kernel (#3376) 2024-03-14 18:18:07 -07:00
Enrique Shockwave
b983ba35bd
fix marlin config repr (#3414) 2024-03-14 16:26:19 -07:00
陈序
54be8a0be2
Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373)
Co-authored-by: Cade Daniel <edacih@gmail.com>
2024-03-14 13:56:57 -07:00
youkaichao
dfc77408bd
[issue templates] add some issue templates (#3412) 2024-03-14 13:16:00 -07:00
Dan Clark
c17ca8ef18
Add args for mTLS support (#3410)
Co-authored-by: Daniel Clark <daniel.clark@ibm.com>
2024-03-14 13:11:45 -07:00
Thomas Parnell
06ec486794
Install flash_attn in Docker image (#3396) 2024-03-14 10:55:54 -07:00
youkaichao
8fe8386591
[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389) 2024-03-14 08:11:48 +00:00
Allen.Dou
a37415c31b
allow user to chose which vllm's merics to display in grafana (#3393) 2024-03-14 06:35:13 +00:00
Simon Mo
81653d9688
[Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion (#3383) 2024-03-13 17:02:21 -07:00
Zhuohan Li
eeab52a4ff
[FIX] Simpler fix for async engine running on ray (#3371) 2024-03-13 14:18:40 -07:00
Antoni Baum
c33afd89f5
Fix lint (#3388) 2024-03-13 13:56:49 -07:00
Terry
7e9bd08f60
Add batched RoPE kernel (#3095) 2024-03-13 13:45:26 -07:00
Or Sharir
ae0ccb4017
Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. (#3350) 2024-03-13 12:18:25 -07:00
陈序
739c350c19
[Minor Fix] Use cupy-cuda11x in CUDA 11.8 build (#3256) 2024-03-13 09:43:24 -07:00