xinyun/vllm - vllm - Silk Road New Cloud (丝路新云) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-14 06:07:02 +08:00

Author	SHA1	Message	Date
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
Andy Dai	5df1834895	[Bugfix] Fix order of arguments matters in config.yaml (#8960)	2024-10-05 17:35:11 +00:00
Cyrus Leung	26a68d5d7e	[CI/Build] Add test decorator for minimum GPU memory (#8925)	2024-09-29 02:50:51 +00:00
Russell Bryant	d1537039ce	[Core] Improve choice of Python multiprocessing method (#8823) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-29 09:17:07 +08:00
Tyler Michael Smith	e2f6f26e86	[Bugfix] Fix print_warning_once's line info (#8867)	2024-09-26 16:18:26 -07:00
Russell Bryant	b05f5c9238	[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-23 12:15:41 -07:00
Alex Brooks	9b8c8ba119	[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-09-23 07:44:48 +00:00
Huazhong Ji	ca2b628b3c	[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703)	2024-09-22 10:44:09 -07:00
Kunshang Ji	d4bf085ad0	[MISC] add support custom_op check (#8557) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-20 19:03:55 -07:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
sroy745	1009e93c5d	[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631)	2024-09-17 07:35:01 -07:00
Simon Mo	546034b466	[refactor] remove triton based sampler (#8524)	2024-09-16 20:04:48 -07:00
Nick Hill	acd5511b6d	[BugFix] Fix clean shutdown issues (#8492)	2024-09-16 09:33:46 -07:00
Kevin Lin	295c4730a8	[Misc] Raise error when using encoder/decoder model with cpu backend (#8355)	2024-09-12 05:45:24 +00:00
Jiaxin Shan	db3bf7c991	[Core] Support load and unload LoRA in api server (#6566) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-09-05 18:10:33 -07:00
Kaunil Dhruv	058344f89a	[Frontend]-config-cli-args (#7737) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>	2024-08-30 08:21:02 -07:00
bnellnm	c166e7e43e	[Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support. (#7886)	2024-08-27 23:13:45 -04:00
youkaichao	b74a125800	[ci] try to log process using the port to debug the port usage (#7711)	2024-08-20 17:41:12 -07:00
Cyrus Leung	3f674a49b5	[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)	2024-08-14 17:55:42 +00:00
Woosuk Kwon	d6e634f3d7	[TPU] Suppress import custom_ops warning (#7458)	2024-08-13 00:30:30 -07:00
youkaichao	4d2dc5072b	[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102)	2024-08-13 00:16:42 -07:00
Cyrus Leung	9ba85bc152	[mypy] Misc. typing improvements (#7417)	2024-08-13 09:20:20 +08:00
sasha0552	91294d56e1	[Bugfix] Handle PackageNotFoundError when checking for xpu version (#7398)	2024-08-12 16:07:20 -07:00
Cyrus Leung	4ddc4743d7	[Core] Consolidate `GB` constant and enable float GB arguments (#7416)	2024-08-12 14:14:14 -07:00
Alexander Matveev	e02ac55617	[Performance] Optimize e2e overheads: Reduce python allocations (#7162)	2024-08-08 21:34:28 -07:00
Cyrus Leung	7eb4a51c5f	[Core] Support serving encoder/decoder models (#7258)	2024-08-09 10:39:41 +08:00
Nick Hill	fc1493a01e	[FrontEnd] Make `merge_async_iterators` `is_cancelled` arg optional (#7282)	2024-08-07 13:35:14 -07:00
Robert Shaw	564985729a	[ BugFix ] Move `zmq` frontend to IPC instead of TCP (#7222)	2024-08-07 16:24:56 +00:00
youkaichao	639159b2a6	[distributed][misc] add specialized method for cuda platform (#7249)	2024-08-07 08:54:52 -07:00
Nick Hill	9a3f49ae07	[BugFix] Overhaul async request cancellation (#7111)	2024-08-07 13:21:41 +08:00
afeldman-nm	fd95e026e0	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-06 16:51:47 -04:00
Robert Shaw	ed812a73fa	[ Frontend ] Multiprocessing for OpenAI Server with `zeromq` (#6883) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-08-02 18:27:28 -07:00
youkaichao	660dea1235	[cuda][misc] remove error_on_invalid_device_count_status (#7069)	2024-08-02 00:14:21 -07:00
youkaichao	252357793d	[ci][distributed] try to fix pp test (#7054)	2024-08-01 22:03:12 -07:00
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836)	2024-07-31 10:38:45 +08:00
Cyrus Leung	da1f7cc12a	[mypy] Enable following imports for some directories (#6681)	2024-07-31 10:38:03 +08:00
Joe	14dbd5a767	[Model] H2O Danube3-4b (#6451)	2024-07-26 20:47:50 -07:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)	2024-07-26 13:50:10 -07:00
Antoni Baum	0e63494cf3	Add fp8 support to `reshape_and_cache_flash` (#6667)	2024-07-24 18:36:52 +00:00
Cody Yu	e0c15758b8	[Core] Modulize prepare input and attention metadata builder (#6596)	2024-07-23 00:45:24 +00:00
Cyrus Leung	9042d68362	[Misc] Consolidate and optimize logic for building padded tensors (#6541)	2024-07-20 04:17:24 +00:00
Antoni Baum	9ed82e7074	[Misc] Small perf improvements (#6520)	2024-07-19 12:10:56 -07:00
Nick Hill	b5672a112c	[Core] Multiprocessing Pipeline Parallel support (#6130) Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-18 19:15:52 -07:00
Hongxia Yang	b6c16cf8ff	[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352)	2024-07-11 21:30:46 -07:00
Yuan	81d7a50f24	[Hardware][Intel CPU] Adding intel openmp tunings in Docker file (#6008) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>	2024-07-04 15:22:12 -07:00
youkaichao	f666207161	[misc][distributed] error on invalid state (#6092)	2024-07-02 23:37:29 -07:00
youkaichao	482045ee77	[hardware][misc] introduce platform abstraction (#6080)	2024-07-02 20:12:22 -07:00
youkaichao	614aa51203	[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007)	2024-06-30 20:07:34 -07:00
Ilya Lavrenov	57f09a419c	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422)	2024-06-25 15:56:15 -07:00

124 Commits