xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 03:27:11 +08:00

Author	SHA1	Message	Date
Keyun Tong	3ee696a63d	[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518 ) Signed-off-by: Keyun Tong <tongkeyun@gmail.com>	2025-02-12 12:25:58 +08:00
Russell Bryant	72c2b68dc9	[Misc] Move pre-commit suggestion back to the end (#13114 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-11 22:34:16 +00:00
Yuan Tang	14ecab5be2	[Bugfix] Guided decoding falls back to outlines when fails to import xgrammar (#12976 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-11 18:17:44 +00:00
Harry Mellor	deb6c1c6b4	[Doc] Improve OpenVINO installation doc (#13102 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-11 18:02:46 +00:00
Li, Jiang	565c1efa65	[CI/Build][Bugfix] Fix CPU backend default threads num (#13077 )	2025-02-11 16:55:56 +00:00
Szymon Ożóg	2b25b7d2e1	Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023 )	2025-02-11 08:38:48 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	6c4dbe23eb	[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-02-12 00:21:50 +08:00
MoonRide303	21f5d50fa5	[Bugfix] Do not use resource module on Windows (#12858 ) (#13029 )	2025-02-11 08:21:18 -08:00
Jewon Lee	bf3e05215c	[Misc] Fix typo at comments at metrics.py (#13024 )	2025-02-11 08:20:37 -08:00
Harry Mellor	ad9776353e	Set `torch_dtype` in `TransformersModel` (#13088 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-11 23:51:19 +08:00
Mark McLoughlin	75e6e14516	[V1][Metrics] Add several request timing histograms (#12644 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-02-11 10:14:00 -05:00
மனோஜ்குமார் பழனிச்சாமி	110f59a33e	[Bugfix] fix flaky test (#13089 ) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-02-11 14:41:20 +00:00
wangxiyuan	2e3b969ec0	[Platform] add pre_register_and_update function (#12432 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-11 22:06:46 +08:00
Yuhong Guo	da317197dd	[Build] Fix cuda link target of cumem_allocator in CPU env (#12863 ) Signed-off-by: YuhongGuo <yuhong.gyh@antgroup.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-11 21:55:57 +08:00
Gregory Shtrasberg	7539bbc6a6	[ROCm] Using a more precise memory profiling (#12624 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-02-11 21:47:10 +08:00
Mengqing Cao	9cf4759493	[executor] init `local_rank` as device index (#13027 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-02-11 21:20:53 +08:00
Cody Yu	41c5dd45b9	[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592 )	2025-02-11 08:27:25 +00:00
Ce Gao	fc6485d277	[Bugfix]: Reasoning output bug according to the chat template change (#13025 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-02-11 15:49:03 +08:00
Varun Sundar Rabindranath	78a141d768	[Misc] LoRA - Refactor Punica ops tests (#12970 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-11 07:26:03 +00:00
Russell Bryant	c320ca8edd	[Core] Don't do platform detection at import time (#12933 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-11 07:25:25 +00:00
Woosuk Kwon	58047c6f04	[Benchmark] Add BurstGPT to benchmark_serving (#13063 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-02-10 21:25:30 -08:00
Florian Greinacher	cb080f32e3	[Bugfix] Support missing tool parameters in mistral tokenizer (#12884 ) Signed-off-by: Florian Greinacher <florian.greinacher@siemens.com>	2025-02-11 03:33:33 +00:00
Simon Mo	2c0f58203c	[Docs] Annouce Meta Meetup (#13065 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-02-10 18:24:29 -08:00
Woosuk Kwon	2ff4857678	[V1][Minor] Move scheduler outputs to a separate file (#13062 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-11 02:10:06 +00:00
Kevin H. Luu	91e876750e	[misc] Fix setup.py condition to avoid AMD from being mistaken with CPU (#13022 ) Signed-off-by: kevin <kevin@anyscale.com>	2025-02-10 18:06:16 -08:00
Farzad Abdolhosseini	08b2d845d6	[Model] Ultravox Model: Support v0.5 Release (#12912 ) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>	2025-02-10 22:02:48 +00:00
மனோஜ்குமார் பழனிச்சாமி	2ae889052c	Fix seed parameter behavior in vLLM (#13007 ) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-02-10 23:26:50 +08:00
Cyrus Leung	51f0b5f7f6	[Bugfix] Clean up and fix multi-modal processors (#13012 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-10 10:45:21 +00:00
Kevin H. Luu	fde71262e0	[misc] Add retries with exponential backoff for HF file existence check (#13008 )	2025-02-10 01:15:02 -08:00
Yuan Tang	243137143c	[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-10 06:09:33 +00:00
youkaichao	b2496bb07f	[core] fix sleep mode and pytorch checkpoint compatibility (#13001 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-10 13:03:43 +08:00
Yuan Tang	44607e07d3	Check if selected backend is None in get_attn_backend_cls() (#12975 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-10 11:45:07 +08:00
Nick Hill	67c4637ccf	[V1] Use msgpack for core request serialization (#12918 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-10 11:35:56 +08:00
youkaichao	aa0ca5ebb7	[core][rlhf] add colocate example for RLHF (#12984 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-10 10:28:59 +08:00
youkaichao	59fff4a01a	[core] improve error handling when wake up from sleep mode (#12981 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-10 09:38:57 +08:00
Lu Fang	29f1d47e73	[MISC] Always import version library first in the vllm package (#12979 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-09 18:56:40 +08:00
youkaichao	cf797aa856	[core] port pynvml into vllm codebase (#12963 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-09 15:00:00 +08:00
Woosuk Kwon	24700c346b	[V1] Cache `uses_mrope` in GPUModelRunner (#12969 )	2025-02-08 15:32:32 -08:00
Patrick von Platen	d366ccc4e3	[RFC] [Mistral] FP8 format (#10130 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-02-08 14:12:53 -07:00
Woosuk Kwon	870c37481e	[V1][Minor] Remove outdated comment (#12968 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-08 12:48:30 -08:00
Jee Jee Li	86222a3dab	[VLM] Merged multi-modal processor for GLM4V (#12449 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-02-08 20:32:16 +00:00
youkaichao	fe743b798d	[bugfix] fix early import of flash attention (#12959 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-09 00:06:56 +08:00
shangmingc	913df14da3	[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-08 14:46:19 +00:00
Cyrus Leung	8a69e0e20e	[CI/Build] Auto-fix Markdown files (#12941 )	2025-02-08 04:25:15 -08:00
Isotr0py	4c8dd12ef3	[Misc] Add qwen2.5-vl BNB support (#12944 )	2025-02-08 04:24:47 -08:00
Jun Duan	256a2d29dc	[Doc] Correct HF repository for TeleChat2 models (#12949 )	2025-02-08 01:42:15 -08:00
Liangfu Chen	c45d398e6f	[CI] Resolve transformers-neuronx version conflict (#12925 )	2025-02-08 01:41:35 -08:00
Jun Duan	011e612d92	[Misc] Log time consumption on weight downloading (#12926 )	2025-02-08 09:16:42 +00:00
Varun Sundar Rabindranath	7e1837676a	[misc] Add LoRA to benchmark_serving (#12898 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-08 17:15:44 +08:00
Sanju C Sudhakaran	2880e21e3d	[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812 ) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>	2025-02-08 17:15:30 +08:00

4538 Commits