Keyun Tong
3ee696a63d
[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518)
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
2025-02-12 12:25:58 +08:00

Russell Bryant
72c2b68dc9
[Misc] Move pre-commit suggestion back to the end (#13114)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-11 22:34:16 +00:00

Yuan Tang
14ecab5be2
[Bugfix] Guided decoding falls back to outlines when fails to import xgrammar (#12976)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-11 18:17:44 +00:00

Harry Mellor
deb6c1c6b4
[Doc] Improve OpenVINO installation doc (#13102)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-11 18:02:46 +00:00

Li, Jiang
565c1efa65
[CI/Build][Bugfix] Fix CPU backend default threads num (#13077)
2025-02-11 16:55:56 +00:00

Szymon Ożóg
2b25b7d2e1
Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023)
2025-02-11 08:38:48 -08:00

ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
6c4dbe23eb
[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-02-12 00:21:50 +08:00

MoonRide303
21f5d50fa5
[Bugfix] Do not use resource module on Windows (#12858) (#13029)
2025-02-11 08:21:18 -08:00

Jewon Lee
bf3e05215c
[Misc] Fix typo at comments at metrics.py (#13024)
2025-02-11 08:20:37 -08:00

Harry Mellor
ad9776353e
Set torch_dtype in TransformersModel (#13088)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-11 23:51:19 +08:00

Mark McLoughlin
75e6e14516
[V1][Metrics] Add several request timing histograms (#12644)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-11 10:14:00 -05:00

மனோஜ்குமார் பழனிச்சாமி
110f59a33e
[Bugfix] fix flaky test (#13089)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-02-11 14:41:20 +00:00

wangxiyuan
2e3b969ec0
[Platform] add pre_register_and_update function (#12432)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 22:06:46 +08:00

Yuhong Guo
da317197dd
[Build] Fix cuda link target of cumem_allocator in CPU env (#12863)
Signed-off-by: YuhongGuo <yuhong.gyh@antgroup.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-11 21:55:57 +08:00

Gregory Shtrasberg
7539bbc6a6
[ROCm] Using a more precise memory profiling (#12624)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-02-11 21:47:10 +08:00

Mengqing Cao
9cf4759493
[executor] init local_rank as device index (#13027)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-02-11 21:20:53 +08:00

Cody Yu
41c5dd45b9
[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592)
2025-02-11 08:27:25 +00:00

Ce Gao
fc6485d277
[Bugfix]: Reasoning output bug according to the chat template change (#13025)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-02-11 15:49:03 +08:00

Varun Sundar Rabindranath
78a141d768
[Misc] LoRA - Refactor Punica ops tests (#12970)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-11 07:26:03 +00:00

Russell Bryant
c320ca8edd
[Core] Don't do platform detection at import time (#12933)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-11 07:25:25 +00:00

Woosuk Kwon
58047c6f04
[Benchmark] Add BurstGPT to benchmark_serving (#13063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-02-10 21:25:30 -08:00

Florian Greinacher
cb080f32e3
[Bugfix] Support missing tool parameters in mistral tokenizer (#12884)
Signed-off-by: Florian Greinacher <florian.greinacher@siemens.com>
2025-02-11 03:33:33 +00:00

Simon Mo
2c0f58203c
[Docs] Annouce Meta Meetup (#13065)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-02-10 18:24:29 -08:00

Woosuk Kwon
2ff4857678
[V1][Minor] Move scheduler outputs to a separate file (#13062)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-11 02:10:06 +00:00

Kevin H. Luu
91e876750e
[misc] Fix setup.py condition to avoid AMD from being mistaken with CPU (#13022)
Signed-off-by: kevin <kevin@anyscale.com>
2025-02-10 18:06:16 -08:00

Farzad Abdolhosseini
08b2d845d6
[Model] Ultravox Model: Support v0.5 Release (#12912)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-02-10 22:02:48 +00:00

மனோஜ்குமார் பழனிச்சாமி
2ae889052c
Fix seed parameter behavior in vLLM (#13007)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-02-10 23:26:50 +08:00

Cyrus Leung
51f0b5f7f6
[Bugfix] Clean up and fix multi-modal processors (#13012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-10 10:45:21 +00:00

Kevin H. Luu
fde71262e0
[misc] Add retries with exponential backoff for HF file existence check (#13008)
2025-02-10 01:15:02 -08:00

Yuan Tang
243137143c
[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 06:09:33 +00:00

youkaichao
b2496bb07f
[core] fix sleep mode and pytorch checkpoint compatibility (#13001)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 13:03:43 +08:00

Yuan Tang
44607e07d3
Check if selected backend is None in get_attn_backend_cls() (#12975)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 11:45:07 +08:00

Nick Hill
67c4637ccf
[V1] Use msgpack for core request serialization (#12918)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-10 11:35:56 +08:00

youkaichao
aa0ca5ebb7
[core][rlhf] add colocate example for RLHF (#12984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 10:28:59 +08:00

youkaichao
59fff4a01a
[core] improve error handling when wake up from sleep mode (#12981)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 09:38:57 +08:00

Lu Fang
29f1d47e73
[MISC] Always import version library first in the vllm package (#12979)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-09 18:56:40 +08:00

youkaichao
cf797aa856
[core] port pynvml into vllm codebase (#12963)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-09 15:00:00 +08:00

Woosuk Kwon
24700c346b
[V1] Cache uses_mrope in GPUModelRunner (#12969)
2025-02-08 15:32:32 -08:00

Patrick von Platen
d366ccc4e3
[RFC] [Mistral] FP8 format (#10130)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-02-08 14:12:53 -07:00

Woosuk Kwon
870c37481e
[V1][Minor] Remove outdated comment (#12968)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-08 12:48:30 -08:00

Jee Jee Li
86222a3dab
[VLM] Merged multi-modal processor for GLM4V (#12449)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-02-08 20:32:16 +00:00

youkaichao
fe743b798d
[bugfix] fix early import of flash attention (#12959)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-09 00:06:56 +08:00

shangmingc
913df14da3
[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-08 14:46:19 +00:00

Cyrus Leung
8a69e0e20e
[CI/Build] Auto-fix Markdown files (#12941)
2025-02-08 04:25:15 -08:00

Isotr0py
4c8dd12ef3
[Misc] Add qwen2.5-vl BNB support (#12944)
2025-02-08 04:24:47 -08:00

Jun Duan
256a2d29dc
[Doc] Correct HF repository for TeleChat2 models (#12949)
2025-02-08 01:42:15 -08:00

Liangfu Chen
c45d398e6f
[CI] Resolve transformers-neuronx version conflict (#12925)
2025-02-08 01:41:35 -08:00

Jun Duan
011e612d92
[Misc] Log time consumption on weight downloading (#12926)
2025-02-08 09:16:42 +00:00

Varun Sundar Rabindranath
7e1837676a
[misc] Add LoRA to benchmark_serving (#12898)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-08 17:15:44 +08:00

Sanju C Sudhakaran
2880e21e3d
[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812)
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
2025-02-08 17:15:30 +08:00