xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-04 10:37:13 +08:00

Author	SHA1	Message	Date
Robert Shaw	10585e035e	Removed Extraneous Print Message From OAI Server (#3440)	2024-03-16 00:35:36 +00:00
Antoni Baum	fb96c1e98c	Asynchronous tokenization (#2879)	2024-03-15 23:37:01 +00:00
Tao He	14b8ae02e7	Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220) Signed-off-by: Tao He <sighingnow@gmail.com> Co-authored-by: simon-mo <simon.mo@hey.com>	2024-03-15 18:25:43 +00:00
Dan Clark	03d37f2441	[Fix] Add args for mTLS support (#3430) Co-authored-by: declark1 <daniel.clark@ibm.com>	2024-03-15 09:56:13 -07:00
Yang Fan	a7c871680e	Fix tie_word_embeddings for Qwen2. (#3344)	2024-03-15 09:36:53 -07:00
Junda Chen	429284dc37	Fix `dist.broadcast` stall without group argument (#3408)	2024-03-14 23:25:05 -07:00
youkaichao	b522c4476f	[Misc] add HOST_IP env var (#3419) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-03-14 21:32:52 -07:00
Enrique Shockwave	b983ba35bd	fix marlin config repr (#3414)	2024-03-14 16:26:19 -07:00
陈序	54be8a0be2	Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373) Co-authored-by: Cade Daniel <edacih@gmail.com>	2024-03-14 13:56:57 -07:00
Dan Clark	c17ca8ef18	Add args for mTLS support (#3410) Co-authored-by: Daniel Clark <daniel.clark@ibm.com>	2024-03-14 13:11:45 -07:00
youkaichao	8fe8386591	[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389)	2024-03-14 08:11:48 +00:00
Zhuohan Li	eeab52a4ff	[FIX] Simpler fix for async engine running on ray (#3371)	2024-03-13 14:18:40 -07:00
Antoni Baum	c33afd89f5	Fix lint (#3388)	2024-03-13 13:56:49 -07:00
Terry	7e9bd08f60	Add batched RoPE kernel (#3095)	2024-03-13 13:45:26 -07:00
Hui Liu	ba8dc958a3	[Minor] Fix bias in if to remove ambiguity (#3259)	2024-03-13 09:16:55 -07:00
Bo-Wen Wang	b167109ba1	[Fix] Fix quantization="gptq" when using Marlin (#3319) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-03-12 22:51:42 -07:00
Woosuk Kwon	602358f8a8	Add kernel for GeGLU with approximate GELU (#3337)	2024-03-12 22:06:17 -07:00
Breno Faria	49a3c8662b	Fixes #1556 double free (#3347)	2024-03-13 00:30:08 +00:00
DAIZHENWEI	654865e21d	Support Mistral Model Inference with transformers-neuronx (#3153)	2024-03-11 13:19:51 -07:00
Zhuohan Li	4c922709b6	Add distributed model executor abstraction (#3191)	2024-03-11 11:03:45 -07:00
Zhuohan Li	2f8844ba08	Re-enable the 80 char line width limit (#3305)	2024-03-10 19:49:14 -07:00
Nick Hill	4b59f00e91	[Fix] Fix best_of behavior when n=1 (#3298)	2024-03-10 19:17:46 -07:00
Roy	9e8744a545	[BugFix] Fix get tokenizer when using ray (#3301)	2024-03-10 19:17:16 -07:00
Cade Daniel	8437bae6ef	[Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103)	2024-03-08 23:32:46 -08:00
Zhuohan Li	f48c6791b7	[FIX] Fix prefix test error on main (#3286)	2024-03-08 17:16:14 -08:00
Michael Goin	c2c5e0909a	Move model filelocks from `/tmp/` to `~/.cache/vllm/locks/` dir (#3241)	2024-03-08 13:33:10 -08:00
Woosuk Kwon	1cb0cc2975	[FIX] Make `flash_attn` optional (#3269)	2024-03-08 10:52:20 -08:00
whyiug	c59e120c55	Feature add lora support for Qwen2 (#3177)	2024-03-07 21:58:24 -08:00
Nick Hill	d2339d6840	Connect engine healthcheck to openai server (#3260)	2024-03-07 16:38:12 -08:00
ElizaWszola	b35cc93420	Fix auto prefix bug (#3239)	2024-03-07 16:37:28 -08:00
jacobthebanana	8cbba4622c	Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804) (#3263)	2024-03-07 23:03:22 +00:00
Michael Goin	385da2dae2	Measure model memory usage (#3120)	2024-03-07 11:42:42 -08:00
Woosuk Kwon	2daf23ab0c	Separate attention backends (#3005)	2024-03-07 01:45:50 -08:00
TechxGenus	d3c04b6a39	Add GPTQ support for Gemma (#3200)	2024-03-07 08:19:14 +08:00
Chujie Zheng	4cb3b924cd	Add tqdm `dynamic_ncols=True` (#3242)	2024-03-06 22:41:42 +00:00
Cade Daniel	a33ce60c66	[Testing] Fix core tests (#3224)	2024-03-06 01:04:23 -08:00
Nick Hill	2efce05dc3	[Fix] Avoid pickling entire LLMEngine for Ray workers (#3207) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-03-06 00:17:20 +00:00
Nick Hill	8999ec3c16	Store `eos_token_id` in `Sequence` for easy access (#3166)	2024-03-05 15:35:43 -08:00
Hongxia Yang	05af6da8d9	[ROCm] enable cupy in order to enable cudagraph mode for AMD GPUs (#3123) Co-authored-by: lcskrishna <lollachaitanya@gmail.com>	2024-03-04 18:14:53 -08:00
Antoni Baum	ff578cae54	Add health check, make async Engine more robust (#3015) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-04 22:01:40 +00:00
Antoni Baum	22de45235c	Push logprob generation to LLMEngine (#3065) Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-03-04 19:54:06 +00:00
ttbachyinsda	76e8a70476	[Minor fix] The domain dns.google may cause a socket.gaierror exception (#3176) Co-authored-by: guofangze <guofangze@kuaishou.com>	2024-03-04 19:17:12 +00:00
Philipp Moritz	17c3103c56	Make it easy to profile workers with nsight (#3162) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-03-03 16:19:13 -08:00
Zhuohan Li	996d095c54	[FIX] Fix styles in automatic prefix caching & add a automatic prefix caching benchmark (#3158)	2024-03-03 14:37:18 -08:00
Jason Cox	d65fac2738	Add vLLM version info to logs and openai API server (#3161)	2024-03-02 21:00:29 -08:00
Sage Moore	ce4f5a29fb	Add Automatic Prefix Caching (#2762) Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-03-02 00:50:01 -08:00
cloudhan	baee28c46c	Reorder kv dtype check to avoid nvcc not found error on AMD platform (#3104)	2024-03-02 14:34:48 +08:00
Allen.Dou	29e70e3e88	allow user chose log level by --log-level instead of fixed 'info'. (#3109) Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-03-01 23:28:41 +00:00
Woosuk Kwon	82091b864a	Bump up to v0.3.3 (#3129)	2024-03-01 12:58:06 -08:00
Robert Shaw	c0c2335ce0	Integrate Marlin Kernels for Int4 GPTQ inference (#2497) Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com> Co-authored-by: alexm <alexm@neuralmagic.com>	2024-03-01 12:47:51 -08:00

506 Commits