xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-19 07:07:00 +08:00

Author	SHA1	Message	Date
Roger Wang	520ca380ae	[Hotfix][VLM] Fixing max position embeddings for Pixtral (#8399)	2024-09-12 09:28:37 -07:00
youkaichao	7de49aa86c	[torch.compile] hide slicing under custom op for inductor (#8384)	2024-09-12 00:11:55 -07:00
Woosuk Kwon	42ffba11ad	[Misc] Use RoPE cache for MRoPE (#8396)	2024-09-11 23:13:14 -07:00
Kevin Lin	295c4730a8	[Misc] Raise error when using encoder/decoder model with cpu backend (#8355)	2024-09-12 05:45:24 +00:00
Blueyo0	1bf2dd9df0	[Gemma2] add bitsandbytes support for Gemma2 (#8338)	2024-09-11 21:53:12 -07:00
tomeras91	5a60699c45	[Bugfix]: Fix the logic for deciding if tool parsing is used (#8366)	2024-09-12 03:55:30 +00:00
Michael Goin	b6c75e1cf2	Fix the AMD weight loading tests (#8390)	2024-09-11 20:35:33 -07:00
Woosuk Kwon	b71c956deb	[TPU] Use Ray for default distributed backend (#8389)	2024-09-11 20:31:51 -07:00
youkaichao	f842a7aff1	[misc] remove engine_use_ray (#8126)	2024-09-11 18:23:36 -07:00
Cody Yu	a65cb16067	[MISC] Dump model runner inputs when crashing (#8305)	2024-09-12 01:12:25 +00:00
Simon Mo	3fd2b0d21c	Bump version to v0.6.1 (#8379) v0.6.1	2024-09-11 14:42:11 -07:00
Patrick von Platen	d394787e52	Pixtral (#8377) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-11 14:41:55 -07:00
Lily Liu	775f00f81e	[Speculative Decoding] Test refactor (#8317) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-11 14:07:34 -07:00
Aarni Koskela	8baa454937	[Misc] Move device options to a single place (#8322)	2024-09-11 13:25:58 -07:00
bnellnm	73202dbe77	[Kernel][Misc] register ops to prevent graph breaks (#6917) Co-authored-by: Sage Moore <sage@neuralmagic.com>	2024-09-11 12:52:19 -07:00
Cyrus Leung	7015417fd4	[Bugfix] Add missing attributes in mistral tokenizer (#8364)	2024-09-11 11:36:54 -07:00
Alexey Kondratiev(AMD)	aea02f30de	[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation (#8373)	2024-09-11 18:31:41 +00:00
Li, Jiang	0b952af458	[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)	2024-09-11 09:46:46 -07:00
Yang Fan	3b7fea770f	[Model][VLM] Add Qwen2-VL model support (#7905) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-11 09:31:19 -07:00
Pooya Davoodi	cea95dfb94	[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347)	2024-09-11 05:30:11 +00:00
Yangshen⚡Deng	6a512a00df	[model] Support for Llava-Next-Video model (#7559) Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-10 22:21:36 -07:00
Pavani Majety	efcf946a15	[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112)	2024-09-11 00:38:40 -04:00
Isotr0py	1230263e16	[Bugfix] Fix InternVL2 vision embeddings process with pipeline parallel (#8299)	2024-09-11 10:11:01 +08:00
Jee Jee Li	e497b8aeff	[Misc] Skip loading extra bias for Qwen2-MOE GPTQ models (#8329)	2024-09-10 20:59:19 -04:00
Tyler Michael Smith	94144e726c	[CI/Build][Kernel] Update CUTLASS to 3.5.1 tag (#8043)	2024-09-10 23:51:58 +00:00
William Lin	1d5e397aa4	[Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers (#8172)	2024-09-10 23:46:08 +00:00
Alexander Matveev	22f3a4bc6c	[Bugfix] lookahead block table with cuda graph max capture (#8340) [Bugfix] Ensure multistep lookahead allocation is compatible with cuda graph max capture (#8340)	2024-09-10 16:00:35 -07:00
Cody Yu	b1f3e18958	[MISC] Keep chunked prefill enabled by default with long context when prefix caching is enabled (#8342)	2024-09-10 22:28:28 +00:00
Prashant Gupta	04e7c4e771	[Misc] remove peft as dependency for prompt models (#8162)	2024-09-10 17:21:56 -04:00
Kevin Lin	5faedf1b62	[Spec Decode] Move ops.advance_step to flash attn advance_step (#8224)	2024-09-10 13:18:14 -07:00
sumitd2	02751a7a42	Fix ppc64le buildkite job (#8309)	2024-09-10 12:58:34 -07:00
Alexey Kondratiev(AMD)	f421f3cefb	[CI/Build] Enabling kernels tests for AMD, ignoring some of then that fail (#8130)	2024-09-10 11:51:15 -07:00
Cyrus Leung	8c054b7a62	[Frontend] Clean up type annotations for mistral tokenizer (#8314)	2024-09-10 16:49:11 +00:00
Daniele	6234385f4a	[CI/Build] enable ccache/scccache for HIP builds (#8327)	2024-09-10 08:55:08 -07:00
Cyrus Leung	da1a844e61	[Bugfix] Fix missing `post_layernorm` in CLIP (#8155)	2024-09-10 08:22:50 +00:00
Simon Mo	a1d874224d	Add NVIDIA Meetup slides, announce AMD meetup, and add contact info (#8319)	2024-09-09 23:21:00 -07:00
Dipika Sikka	6cd5e5b07e	[Misc] Fused MoE Marlin support for GPTQ (#8217)	2024-09-09 23:02:52 -04:00
Kyle Sayers	c7cb5c3335	[Misc] GPTQ Activation Ordering (#8135)	2024-09-09 16:27:26 -04:00
Vladislav Kruglikov	f9b4a2d415	[Bugfix] Correct adapter usage for cohere and jamba (#8292)	2024-09-09 11:20:46 -07:00
Adam Lugowski	58fcc8545a	[Frontend] Add progress reporting to run_batch.py (#8060) Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io>	2024-09-09 11:16:37 -07:00
Kyle Mistele	08287ef675	[Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272)	2024-09-09 10:45:11 -04:00
Alexander Matveev	4ef41b8476	[Bugfix] Fix async postprocessor in case of preemption (#8267)	2024-09-07 21:01:51 -07:00
Joe Runde	cfe712bf1a	[CI/Build] Use python 3.12 in cuda image (#8133) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-07 13:03:16 -07:00
sumitd2	b962ee1470	ppc64le: Dockerfile fixed, and a script for buildkite (#8026)	2024-09-07 11:18:40 -07:00
Isotr0py	36bf8150cc	[Model][VLM] Decouple weight loading logic for `Paligemma` (#8269)	2024-09-07 17:45:44 +00:00
Isotr0py	e807125936	[Model][VLM] Support multi-images inputs for InternVL2 models (#8201)	2024-09-07 16:38:23 +08:00
Cyrus Leung	9f68e00d27	[Bugfix] Fix broken OpenAI tensorizer test (#8258)	2024-09-07 08:02:39 +00:00
youkaichao	ce2702a923	[tpu][misc] fix typo (#8260)	2024-09-06 22:40:46 -07:00
Wei-Sheng Chin	795b662cff	Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241)	2024-09-06 20:18:16 -07:00
Cyrus Leung	2f707fcb35	[Model] Multi-input support for LLaVA (#8238)	2024-09-07 02:57:24 +00:00


2925 Commits