xinyun/vllm - vllm - Silk Road New Cloud code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-27 02:17:13 +08:00

Author	SHA1	Message	Date
Cyrus Leung	d97011512e	[CI/Build] vLLM cache directory for images (#6444)	2024-07-15 23:12:25 -07:00
Joe	d92b3c5cde	[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419)	2024-07-15 18:54:15 -07:00
Mor Zusman	9ad32dacd9	[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (#6425) Co-authored-by: Mor Zusman <morz@ai21.com>	2024-07-16 01:32:55 +00:00
Woosuk Kwon	ec9933f4a5	[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod (#6289)	2024-07-15 19:02:14 +00:00
youkaichao	4cf256ae7f	[misc][distributed] fix pp missing layer condition (#6446)	2024-07-15 10:32:35 -07:00
Simon Mo	64fdc08c72	bump version to v0.5.2 (#6433)	2024-07-15 17:27:40 +00:00
Thomas Parnell	4ef95b0f06	[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-15 13:14:49 -04:00
Thomas Parnell	eaec4b9153	[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>	2024-07-15 10:12:47 -07:00
Tyler Michael Smith	c8fd97f26d	[Kernel] Use CUTLASS kernels for the FP8 layers with Bias (#6270)	2024-07-15 13:05:52 -04:00
Roger Wang	6ae1597ddf	[VLM] Minor space optimization for `ClipVisionModel` (#6436)	2024-07-15 17:29:51 +08:00
Cyrus Leung	de19916314	[Bugfix] Convert image to RGB by default (#6430)	2024-07-15 05:39:15 +00:00
youkaichao	69672f116c	[core][distributed] simplify code to support pipeline parallel (#6406)	2024-07-14 21:20:51 -07:00
DefTruth	44874a0bf9	[Doc] add env docs for flashinfer backend (#6437)	2024-07-14 21:16:51 -07:00
zifeitong	b47008b4d2	[BugFix] BatchResponseData body should be optional (#6345) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-15 04:06:09 +00:00
Simon Mo	32c9d7f765	Report usage for beam search (#6404)	2024-07-14 19:37:35 -07:00
Ethan Xu	dbfe254eda	[Feature] vLLM CLI (#5090) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-07-14 15:36:43 -07:00
Robert Shaw	73030b7dae	[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423)	2024-07-14 21:38:42 +00:00
Isotr0py	540c0368b1	[Model] Initialize Fuyu-8B support (#3924) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-14 05:27:14 +00:00
Robert Shaw	fb6af8bc08	[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)	2024-07-13 20:03:58 -07:00
Woosuk Kwon	eeceadaecc	[Misc] Add deprecation warning for beam search (#6402)	2024-07-13 11:52:22 -07:00
Robert Shaw	babf52dade	[ Misc ] More Cleanup of Marlin (#6359) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-07-13 10:21:37 +00:00
youkaichao	41708e5034	[ci] try to add multi-node tests (#6280) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-12 21:51:48 -07:00
Thomas Parnell	e1684a766a	[Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-12 18:30:54 -07:00
Woosuk Kwon	f8f9ff57ee	[Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397)	2024-07-12 15:59:47 -07:00
Michael Goin	111fc6e7ec	[Misc] Add generated git commit hash as `vllm.__commit__` (#6386)	2024-07-12 22:52:15 +00:00
Cody Yu	75f64d8b94	[Bugfix] Fix illegal memory access in FP8 MoE kernel (#6382)	2024-07-12 21:33:33 +00:00
Cyrus Leung	024ad87cdc	[Bugfix] Fix dtype mismatch in PaliGemma (#6367)	2024-07-12 08:22:18 -07:00
Robert Shaw	aea19f0989	[ Misc ] Support Models With Bias in `compressed-tensors` integration (#6356)	2024-07-12 11:11:29 -04:00
Robert Shaw	6047187cd8	[ Misc ] Remove separate bias add (#6353)	2024-07-12 05:06:09 +00:00
Hongxia Yang	b6c16cf8ff	[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352)	2024-07-11 21:30:46 -07:00
Michael Goin	d59eb98489	[Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343)	2024-07-12 10:47:17 +08:00
Helena Kloosterman	adf32e0a0f	[Bugfix] Fix usage stats logging exception warning with OpenVINO (#6349)	2024-07-12 10:47:00 +08:00
youkaichao	2b0fb53481	[distributed][misc] be consistent with pytorch for libcudart.so (#6346) [distributed][misc] keep consistent with how pytorch finds libcudart.so (#6346)	2024-07-11 19:35:17 -07:00
Lily Liu	d6ab528997	[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)	2024-07-12 01:32:06 +00:00
Robert Shaw	7ed6a4f0e1	[ BugFix ] Prompt Logprobs Detokenization (#6223) Co-authored-by: Zifei Tong <zifeitong@gmail.com>	2024-07-11 22:02:29 +00:00
xwjiang2010	1df43de9bb	[bug fix] Fix llava next feature size calculation. (#6339) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-07-11 17:21:10 +00:00
Robert Shaw	b675069d74	[ Misc ] Refactor Marlin Python Utilities (#6082) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-07-11 15:40:11 +00:00
Mor Zusman	55f692b46e	[BugFix] get_and_reset only when scheduler outputs are not empty (#6266)	2024-07-11 07:40:20 -07:00
Thomas Parnell	8a1415cf77	[Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-07-11 07:05:59 -07:00
pushan	546b101fa0	[BugFix]: fix engine timeout due to request abort (#6255) Signed-off-by: yatta zhang <ytzhang01@foxmail.com> Signed-off-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com> Co-authored-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com>	2024-07-11 06:46:31 -07:00
aniaan	3963a5335b	[Misc] refactor(config): clean up unused code (#6320)	2024-07-11 09:39:07 +00:00
daquexian	99ded1e1c4	[Doc] Remove comments incorrectly copied from another project (#6286)	2024-07-10 17:05:26 -07:00
Woosuk Kwon	997df46a32	[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor (#6313)	2024-07-10 16:39:02 -07:00
sroy745	ae151d73be	[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)	2024-07-10 16:02:47 -07:00
sangjune.park	44cc76610d	[Bugfix] Fix OpenVINOExecutor abstractmethod error (#6296) Signed-off-by: sangjune.park <sangjune.park@navercorp.com>	2024-07-10 10:03:32 -07:00
Benjamin Muskalla	b422d4961a	[CI/Build] Enable mypy typing for remaining folders (#6268)	2024-07-10 22:15:55 +08:00
Thomas Parnell	c38eba3046	[Bugfix] MLPSpeculator: Use ParallelLMHead in tie_weights=False case. (#6303) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-10 09:04:07 -04:00
Woosuk Kwon	e72ae80b06	[Bugfix] Support 2D input shape in MoE layer (#6287)	2024-07-10 09:03:16 -04:00
Cyrus Leung	8a924d2248	[Doc] Guide for adding multi-modal plugins (#6205)	2024-07-10 14:55:34 +08:00
Woosuk Kwon	5ed3505d82	[Bugfix][TPU] Add prompt adapter methods to TPUExecutor (#6279)	2024-07-09 19:30:56 -07:00

1196 Commits