xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-16 13:06:14 +08:00

Author	SHA1	Message	Date
youkaichao	7801f56ed7	[ci][gh200] dockerfile clean up (#11351 ) Signed-off-by: drikster80 <ed.sealing@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: cenzhiyao <2523403608@qq.com>	2024-12-19 18:13:06 -08:00
Kunshang Ji	f954fe0e65	[FIX] update openai version (#11287 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2024-12-18 10:17:05 -08:00
Wallas Henrique	8b79f9e107	[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046 )	2024-12-17 22:34:08 -08:00
Russell Bryant	48259264a4	[Core] Update outlines and increase its threadpool size (#11140 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-14 07:46:18 +00:00
dhuangnm	24a3d12b82	update compressed-tensors to latest version (#11183 ) Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>	2024-12-14 03:22:44 +00:00
Alexander Matveev	4e11683368	[V1] VLM preprocessor hashing (#11020 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Alexander Matveev <alexm@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-12 00:55:30 +00:00
youkaichao	91642db952	[torch.compile] use depyf to dump torch.compile internals (#10972 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-11 10:43:05 -08:00
Kevin H. Luu	9974fca047	[ci/build] Fix entrypoints test and pin outlines version (#11088 )	2024-12-11 01:01:53 -08:00
Russell Bryant	e739194926	[Core] Update to outlines >= 0.1.8 (#10576 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-10 12:08:16 -08:00
Russell Bryant	e691b26f6f	[Core] Require xgrammar >= 0.1.6 (#11021 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-09 16:44:27 +00:00
Michael Goin	7090c27bb2	[Bugfix] Only require XGrammar on x86 (#10865 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-03 10:32:21 -08:00
Aaron Pham	9323a3153b	[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-12-03 15:17:00 +08:00
youkaichao	cb4e1c3f3a	[misc] upgrade filelock version (#10731 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-27 19:54:58 -08:00
Simon Mo	a6221a144a	[Misc] bump mistral common version (#10367 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2024-11-15 09:48:07 -08:00
Guillaume Calmettes	691a3ec047	[Bugfix] Ensure special tokens are properly filtered out for guided structured output with MistralTokenizer (#10363 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-15 14:50:40 +00:00
Dipika Sikka	56a955e774	Bump to compressed-tensors v0.8.0 (#10279 ) Signed-off-by: Dipika <dipikasikka1@gmail.com>	2024-11-12 21:54:10 -08:00
Guillaume Calmettes	abbfb6134d	[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837 )	2024-10-30 18:15:56 -07:00
Dipika Sikka	48138a8415	[BugFix] Stop silent failures on compressed-tensors parsing (#9381 )	2024-10-17 18:54:00 -07:00
Cyrus Leung	7e7eae338d	[Misc] Standardize RoPE handling for Qwen2-VL (#9250 )	2024-10-16 13:56:17 +08:00
Michael Goin	22f8a69549	[Misc] Directly use compressed-tensors for checkpoint definitions (#8909 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-15 15:40:25 -07:00
Roger Wang	b6d7392579	[Misc][CI/Build] Include `cv2` via `mistral_common[opencv]` (#8951 )	2024-09-30 04:28:26 +00:00
Tyler Titsworth	260024a374	[Bugfix][Intel] Fix XPU Dockerfile Build (#7824 ) Signed-off-by: tylertitsworth <tyler.titsworth@intel.com> Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-27 23:45:50 -07:00
Cyrus Leung	1b49148e47	[Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764 )	2024-09-26 16:54:09 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811 ) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00
youkaichao	92ba7e7477	[misc] upgrade mistral-common (#8715 )	2024-09-22 15:41:59 -07:00
youkaichao	d4a2ac8302	[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713 )	2024-09-22 12:47:54 -07:00
Joe Runde	cca61642e0	[Bugfix] Fix 3.12 builds on main (#8510 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-17 00:01:45 +00:00
Isotr0py	fc990f9795	[Bugfix][Kernel] Add `IQ1_M` quantization implementation to GGUF kernel (#8357 )	2024-09-15 16:51:44 -06:00
Cyrus Leung	ecd7a1d5b6	[Installation] Gate FastAPI version for Python 3.8 (#8456 )	2024-09-13 09:02:26 -07:00
Cyrus Leung	3f79bc3d1a	[Bugfix] Bump fastapi and pydantic version (#8435 )	2024-09-13 03:21:42 +00:00
Patrick von Platen	d394787e52	Pixtral (#8377 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-11 14:41:55 -07:00
Yang Fan	3b7fea770f	[Model][VLM] Add Qwen2-VL model support (#7905 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-11 09:31:19 -07:00
Joe Runde	cfe712bf1a	[CI/Build] Use python 3.12 in cuda image (#8133 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-07 13:03:16 -07:00
William Lin	1afc931987	[bugfix] >1.43 constraint for openai (#8169 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-04 17:35:36 -07:00
Kyle Mistele	e02ce498be	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649 ) Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com> Co-authored-by: Kyle Mistele <kyle@constellate.ai>	2024-09-04 13:18:13 -07:00
Roger Wang	5b86b19954	[Misc] Optional installation of audio related packages (#8063 )	2024-09-01 14:46:57 -07:00
Kaunil Dhruv	058344f89a	[Frontend]-config-cli-args (#7737 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>	2024-08-30 08:21:02 -07:00
Patrick von Platen	6fc4e6e07a	[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739 )	2024-08-27 12:40:02 +00:00
Cyrus Leung	baaedfdb2d	[mypy] Enable following imports for entrypoints (#7248 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Fei <dfdfcai4@gmail.com>	2024-08-20 23:28:21 -07:00
SangBin Cho	ff7ec82c4d	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109 )	2024-08-18 17:57:20 -07:00
Xander Johnson	7c0b7ea214	[Bugfix] add >= 1.0 constraint for openai dependency (#7612 )	2024-08-16 20:56:01 -07:00
PHILO-HE	f4da5f7b6d	[Misc] Update dockerfile for CPU to cover protobuf installation (#7182 )	2024-08-15 10:03:01 -07:00
Kyle Sayers	f55a9aea45	[Misc] Revert `compressed-tensors` code reuse (#7521 )	2024-08-14 15:07:37 -07:00
youkaichao	16422ea76f	[misc][plugin] add plugin system implementation (#7426 )	2024-08-13 16:24:17 -07:00
Kyle Sayers	373538f973	[Misc] `compressed-tensors` code reuse (#7277 )	2024-08-13 19:05:15 -04:00
Peter Salas	00c3d68e45	[Frontend][Core] Add plumbing to support audio language models (#7446 )	2024-08-13 17:39:33 +00:00
Daniele	774cd1d3bf	[CI/Build] bump minimum cmake version (#6999 )	2024-08-12 16:29:20 -07:00
Noam Gat	4fb7b52a2c	Updating LM Format Enforcer version to v0.10.6 (#7189 )	2024-08-11 08:11:50 -04:00
Cyrus Leung	7eb4a51c5f	[Core] Support serving encoder/decoder models (#7258 )	2024-08-09 10:39:41 +08:00
Isotr0py	360bd67cf0	[Core] Support loading GGUF model (#5191 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-05 17:54:23 -06:00

74 Commits