xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, last synced 2025-12-17 16:46:38 +08:00

Author	SHA1	Message	Date
Zhuohan Li	e90fc21f2e	[Hardware][Neuron] Refactor neuron support (#3471)	2024-03-22 01:22:17 +00:00
Roy	f1c0fc3919	Migrate `logits` computation and gather to `model_runner` (#3233)	2024-03-20 23:25:01 +00:00
SangBin Cho	6e435de766	[1/n][Chunked Prefill] Refactor input query shapes (#3236)	2024-03-20 14:46:05 -07:00
Antoni Baum	426ec4ec67	[1/n] Triton sampling kernel (#3186) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-03-20 14:45:08 -07:00
Zhuohan Li	2f8844ba08	Re-enable the 80 char line width limit (#3305)	2024-03-10 19:49:14 -07:00
Cade Daniel	8437bae6ef	[Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103)	2024-03-08 23:32:46 -08:00
Nick Hill	8999ec3c16	Store `eos_token_id` in `Sequence` for easy access (#3166)	2024-03-05 15:35:43 -08:00
Antoni Baum	22de45235c	Push logprob generation to LLMEngine (#3065) Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-03-04 19:54:06 +00:00
Liangfu Chen	3b7178cfa4	[Neuron] Support inference with transformers-neuronx (#2569)	2024-02-28 09:34:34 -08:00
Nick Hill	7d2dcce175	Support per-request seed (#2514)	2024-02-21 11:47:00 -08:00
Antoni Baum	9b945daaf1	[Experimental] Add multi-LoRA support (#1804) Co-authored-by: Chen Shen <scv119@gmail.com> Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-01-23 15:26:37 -08:00
陈序	218dc2ccda	Aligning `top_p` and `top_k` Sampling (#1885) * Align top_p and top_k with huggingface * remove _get_prompt_and_output_tokens * rename _apply_top_p_top_k * compare top_p top_k with hf * fix test errors	2024-01-12 22:51:03 +01:00
Zhuohan Li	fd4ea8ef5c	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
Roy	9140561059	[Minor] Fix typo and remove unused code (#2305)	2024-01-02 19:23:15 -08:00
Antoni Baum	bd29cf3d3a	Remove Sampler copy stream (#2209)	2023-12-20 00:04:33 -08:00
Antoni Baum	a7347d9a6d	Make sampler less blocking (#1889)	2023-12-17 23:03:49 +08:00
Woosuk Kwon	27feead2f8	Refactor Worker & InputMetadata (#1843)	2023-11-29 22:16:37 -08:00
Zhuohan Li	708e6c18b0	[FIX] Fix class naming (#1803)	2023-11-28 14:08:01 -08:00
ljss	de23687d16	Fix repetition penalty aligned with huggingface (#1577)	2023-11-22 14:41:44 -08:00
Roy	e87557b069	Support Min P Sampler (#1642)	2023-11-17 16:20:49 -08:00
Noam Gat	555bdcc5a3	Added logits processor API to sampling params (#1469)	2023-11-03 14:12:15 -07:00
Antoni Baum	15f5632365	Delay GPU->CPU sync in sampling (#1337)	2023-10-30 09:01:34 -07:00
ljss	69be658bba	Support repetition_penalty (#1424)	2023-10-29 10:02:41 -07:00
Woosuk Kwon	c1376e0f82	Change scheduler & input tensor shape (#1381)	2023-10-16 17:48:42 -07:00
Zhuohan Li	9d9072a069	Implement prompt logprobs & Batched topk for computing logprobs (#1328) Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>	2023-10-16 10:56:50 -07:00
yhlskt23	91fce82c6f	change the timing of sorting logits (#1309)	2023-10-10 19:37:42 -07:00
Zhuohan Li	ba0bfd40e2	TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181)	2023-10-02 15:36:09 -07:00
Woosuk Kwon	84e4e37d14	[Minor] Fix type annotations (#1238)	2023-10-02 15:28:31 -07:00
Zhuohan Li	f187877945	[FIX] Simplify sampler logic (#1156)	2023-09-23 17:21:56 -07:00
Zhuohan Li	947b794146	[Sampler] Vectorized sampling (simplified) (#1048) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2023-09-22 17:48:04 -07:00
Zhuohan Li	f04908cae7	[FIX] Minor bug fixes (#1035) * [FIX] Minor bug fixes * Address review comments	2023-09-13 16:38:12 -07:00
Zhuohan Li	002800f081	Align vLLM's beam search implementation with HF generate (#857)	2023-09-04 17:29:42 -07:00
Dong-Yong Lee	e11222333f	fix: bug fix when penalties are negative (#913) Co-authored-by: dongyong-lee <dongyong.lee@navercorp.com>	2023-09-01 00:37:17 +09:00
Aman Gupta Karmani	28873a2799	Improve _prune_hidden_states micro-benchmark (#707)	2023-08-31 13:28:43 +09:00
Woosuk Kwon	94d2f59895	Set replacement=True in torch.multinomial (#858)	2023-08-25 12:22:01 +09:00
Abraham-Xu	d1744376ae	Align with huggingface Top K sampling (#753)	2023-08-15 16:44:33 -07:00
Andre Slavescu	c894836108	[Model] Add support for GPT-J (#226) Co-authored-by: woWoosuk Kwon <woosuk.kwon@berkeley.edu>	2023-07-08 17:55:16 -07:00
Zhuohan Li	d6fa1be3a8	[Quality] Add code formatter and linter (#326)	2023-07-03 11:31:55 -07:00
Lily Liu	425040d4c1	remove floats == 0 comparison (#285)	2023-06-28 14:11:51 -07:00
Woosuk Kwon	0b98ba15c7	Change the name to vLLM (#150)	2023-06-17 03:07:40 -07:00

40 Commits