xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-24 13:25:01 +08:00

Author	SHA1	Message	Date
Robert Shaw	889da130e7	[ Misc ] `fp8-marlin` channelwise via `compressed-tensors` (#6524 ) Co-authored-by: mgoin <michael@neuralmagic.com>	2024-07-25 09:46:04 -07:00
Alphi	b75e314fff	[Bugfix] Add image placeholder for OpenAI Compatible Server of MiniCPM-V (#6787 ) Co-authored-by: hezhihui <hzh7269@modelbest.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-25 09:42:49 -07:00
Alexander Matveev	0310029a2f	[Bugfix] Fix awq_marlin and gptq_marlin flags (#6745 )	2024-07-24 22:34:11 -07:00
Cody Yu	309aaef825	[Bugfix] Fix decode tokens w. CUDA graph (#6757 )	2024-07-24 22:33:56 -07:00
Alphi	9e169a4c61	[Model] Adding support for MiniCPM-V (#4087 )	2024-07-24 20:59:30 -07:00
Evan Z. Liu	5689e256ba	[Frontend] Represent tokens with identifiable strings (#6626 )	2024-07-25 09:51:00 +08:00
youkaichao	740374d456	[core][distributed] fix zmq hang (#6759 )	2024-07-24 17:37:12 -07:00
Antoni Baum	5448f67635	[Core] Tweaks to model runner/input builder developer APIs (#6712 )	2024-07-24 12:17:12 -07:00
Antoni Baum	0e63494cf3	Add fp8 support to `reshape_and_cache_flash` (#6667 )	2024-07-24 18:36:52 +00:00
Daniele	ee812580f7	[Frontend] split run_server into build_server and run_server (#6740 )	2024-07-24 10:36:04 -07:00
Allen.Dou	40468b13fa	[Bugfix] Miscalculated latency lead to time_to_first_token_seconds inaccurate. (#6686 )	2024-07-24 08:58:42 -07:00
LF Marques	545146349c	Adding f-string to validation error which is missing (#6748 )	2024-07-24 08:55:53 -07:00
liuyhwangyh	f4f8a9d892	[Bugfix]fix modelscope compatible issue (#6730 )	2024-07-24 05:04:46 -07:00
Roger Wang	0a740a11ba	[Bugfix] Fix token padding for chameleon (#6724 )	2024-07-24 01:05:09 -07:00
William Lin	5e8ca973eb	[Bugfix] fix flashinfer cudagraph capture for PP (#6708 )	2024-07-24 01:49:44 +00:00
dongmao zhang	87525fab92	[bitsandbytes]: support read bnb pre-quantized model (#5753 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-07-23 23:45:09 +00:00
Thomas Parnell	2f808e69ab	[Bugfix] StatLoggers: cache spec decode metrics when they get collected. (#6645 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-23 23:05:05 +00:00
Roger Wang	1bedf210e3	Bump `transformers` version for Llama 3.1 hotfix and patch Chameleon (#6690 )	2024-07-23 13:47:48 -07:00
Travis Johnson	507ef787d8	[Model] Pipeline Parallel Support for DeepSeek v2 (#6519 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-07-23 12:22:09 -07:00
Yehoshua Cohen	58f53034ad	[Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652 )	2024-07-23 11:41:55 -07:00
Michael Goin	0eb0757bef	[Misc] Add ignored layers for `fp8` quantization (#6657 )	2024-07-23 14:04:04 -04:00
Simon Mo	38c4b7e863	Bump version to 0.5.3.post1 (#6696 )	2024-07-23 10:08:59 -07:00
Woosuk Kwon	a112a84aad	[BugFix] Fix RoPE error in Llama 3.1 (#6693 )	2024-07-23 09:46:05 -07:00
Woosuk Kwon	461089a21a	[Bugfix] Fix a log error in chunked prefill (#6694 )	2024-07-23 09:27:58 -07:00
Simon Mo	bb2fc08072	Bump version to v0.5.3 (#6674 )	2024-07-23 00:00:08 -07:00
Simon Mo	3eda4ec780	support ignore patterns in model loader (#6673 )	2024-07-22 23:59:42 -07:00
Roger Wang	22fa2e35cb	[VLM][Model] Support image input for Chameleon (#6633 )	2024-07-22 23:50:48 -07:00
youkaichao	c5201240a4	[misc] only tqdm for first rank (#6672 )	2024-07-22 21:57:27 -07:00
Cyrus Leung	97234be0ec	[Misc] Manage HTTP connections in one place (#6600 )	2024-07-22 21:32:02 -07:00
Michael Goin	9e0b558a09	[Misc] Support FP8 kv cache scales from compressed-tensors (#6528 )	2024-07-23 04:11:50 +00:00
zhaotyer	e519ae097a	add tqdm when loading checkpoint shards (#6569 ) Co-authored-by: tianyi.zhao <tianyi.zhao@transwarp.io> Co-authored-by: youkaichao <youkaichao@126.com>	2024-07-22 20:48:01 -07:00
youkaichao	7c2749a4fd	[misc] add start loading models for users information (#6670 )	2024-07-22 20:08:02 -07:00
Woosuk Kwon	729171ae58	[Misc] Enable chunked prefill by default for long context models (#6666 )	2024-07-22 20:03:13 -07:00
Cheng Li	c5e8330997	[Bugfix] Fix null `modules_to_not_convert` in FBGEMM Fp8 quantization (#6665 )	2024-07-22 19:25:05 -07:00
Cody Yu	e0c15758b8	[Core] Modulize prepare input and attention metadata builder (#6596 )	2024-07-23 00:45:24 +00:00
Woosuk Kwon	bdf5fd1386	[Misc] Remove deprecation warning for beam search (#6659 )	2024-07-23 00:21:58 +00:00
Jiaxin Shan	42c7f66a38	[Core] Support dynamically loading Lora adapter from HuggingFace (#6234 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-07-22 15:42:40 -07:00
Cyrus Leung	739b61a348	[Frontend] Refactor prompt processing (#4028 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-22 10:13:53 -07:00
Jae-Won Chung	89c1c6a196	[Bugfix] Fix `vocab_size` field access in `llava_next.py` (#6624 )	2024-07-22 05:02:51 +00:00
Woosuk Kwon	42de2cefcb	[Misc] Add a wrapper for torch.inference_mode (#6618 )	2024-07-21 18:43:11 -07:00
Roger Wang	c9eef37f32	[Model] Initial Support for Chameleon (#5770 )	2024-07-21 17:37:51 -07:00
Alexander Matveev	396d92d5e0	[Kernel][Core] Add AWQ support to the Marlin kernel (#6612 )	2024-07-21 19:41:42 -04:00
Isotr0py	25e778aa16	[Model] Refactor and decouple phi3v image embedding (#6621 )	2024-07-21 16:07:58 -07:00
Woosuk Kwon	b6df37f943	[Misc] Remove abused noqa (#6619 )	2024-07-21 23:47:04 +08:00
sroy745	14f91fe67c	[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485 )	2024-07-20 23:58:58 -07:00
Cyrus Leung	d7f4178dd9	[Frontend] Move chat utils (#6602 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-21 08:38:17 +08:00
Robert Shaw	082ecd80d5	[ Bugfix ] Fix AutoFP8 fp8 marlin (#6609 )	2024-07-20 17:25:56 -06:00
Michael Goin	f952bbc8ff	[Misc] Fix input_scale typing in w8a8_utils.py (#6579 )	2024-07-20 23:11:13 +00:00
Robert Shaw	9364f74eee	[ Kernel ] Enable `fp8-marlin` for `fbgemm-fp8` models (#6606 )	2024-07-20 18:50:10 +00:00
Matt Wong	06d6c5fe9f	[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543 )	2024-07-20 09:39:07 -07:00

1294 Commits