wangxiyuan | 2e3b969ec0 | 2025-02-11 22:06:46 +08:00
[Platform] add pre_register_and_update function (#12432)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Yuhong Guo | da317197dd | 2025-02-11 21:55:57 +08:00
[Build] Fix cuda link target of cumem_allocator in CPU env (#12863)
Signed-off-by: YuhongGuo <yuhong.gyh@antgroup.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

Gregory Shtrasberg | 7539bbc6a6 | 2025-02-11 21:47:10 +08:00
[ROCm] Using a more precise memory profiling (#12624)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Mengqing Cao | 9cf4759493 | 2025-02-11 21:20:53 +08:00
[executor] init local_rank as device index (#13027)
Signed-off-by: Mengqing Cao <cmq0113@163.com>

Cody Yu | 41c5dd45b9 | 2025-02-11 08:27:25 +00:00
[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592)

Ce Gao | fc6485d277 | 2025-02-11 15:49:03 +08:00
[Bugfix]: Reasoning output bug according to the chat template change (#13025)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>

Varun Sundar Rabindranath | 78a141d768 | 2025-02-11 07:26:03 +00:00
[Misc] LoRA - Refactor Punica ops tests (#12970)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Russell Bryant | c320ca8edd | 2025-02-11 07:25:25 +00:00
[Core] Don't do platform detection at import time (#12933)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Woosuk Kwon | 58047c6f04 | 2025-02-10 21:25:30 -08:00
[Benchmark] Add BurstGPT to benchmark_serving (#13063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

Florian Greinacher | cb080f32e3 | 2025-02-11 03:33:33 +00:00
[Bugfix] Support missing tool parameters in mistral tokenizer (#12884)
Signed-off-by: Florian Greinacher <florian.greinacher@siemens.com>

Simon Mo | 2c0f58203c | 2025-02-10 18:24:29 -08:00
[Docs] Annouce Meta Meetup (#13065)
Signed-off-by: simon-mo <simon.mo@hey.com>

Woosuk Kwon | 2ff4857678 | 2025-02-11 02:10:06 +00:00
[V1][Minor] Move scheduler outputs to a separate file (#13062)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Kevin H. Luu | 91e876750e | 2025-02-10 18:06:16 -08:00
[misc] Fix setup.py condition to avoid AMD from being mistaken with CPU (#13022)
Signed-off-by: kevin <kevin@anyscale.com>

Farzad Abdolhosseini | 08b2d845d6 | 2025-02-10 22:02:48 +00:00
[Model] Ultravox Model: Support v0.5 Release (#12912)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>

மனோஜ்குமார் பழனிச்சாமி | 2ae889052c | 2025-02-10 23:26:50 +08:00
Fix seed parameter behavior in vLLM (#13007)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>

Cyrus Leung | 51f0b5f7f6 | 2025-02-10 10:45:21 +00:00
[Bugfix] Clean up and fix multi-modal processors (#13012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Kevin H. Luu | fde71262e0 | 2025-02-10 01:15:02 -08:00
[misc] Add retries with exponential backoff for HF file existence check (#13008)

Yuan Tang | 243137143c | 2025-02-10 06:09:33 +00:00
[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

youkaichao | b2496bb07f | 2025-02-10 13:03:43 +08:00
[core] fix sleep mode and pytorch checkpoint compatibility (#13001)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Yuan Tang | 44607e07d3 | 2025-02-10 11:45:07 +08:00
Check if selected backend is None in get_attn_backend_cls() (#12975)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

Nick Hill | 67c4637ccf | 2025-02-10 11:35:56 +08:00
[V1] Use msgpack for core request serialization (#12918)
Signed-off-by: Nick Hill <nhill@redhat.com>

youkaichao | aa0ca5ebb7 | 2025-02-10 10:28:59 +08:00
[core][rlhf] add colocate example for RLHF (#12984)
Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao | 59fff4a01a | 2025-02-10 09:38:57 +08:00
[core] improve error handling when wake up from sleep mode (#12981)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Lu Fang | 29f1d47e73 | 2025-02-09 18:56:40 +08:00
[MISC] Always import version library first in the vllm package (#12979)
Signed-off-by: Lu Fang <lufang@fb.com>

youkaichao | cf797aa856 | 2025-02-09 15:00:00 +08:00
[core] port pynvml into vllm codebase (#12963)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Woosuk Kwon | 24700c346b | 2025-02-08 15:32:32 -08:00
[V1] Cache uses_mrope in GPUModelRunner (#12969)

Patrick von Platen | d366ccc4e3 | 2025-02-08 14:12:53 -07:00
[RFC] [Mistral] FP8 format (#10130)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

Woosuk Kwon | 870c37481e | 2025-02-08 12:48:30 -08:00
[V1][Minor] Remove outdated comment (#12968)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Jee Jee Li | 86222a3dab | 2025-02-08 20:32:16 +00:00
[VLM] Merged multi-modal processor for GLM4V (#12449)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

youkaichao | fe743b798d | 2025-02-09 00:06:56 +08:00
[bugfix] fix early import of flash attention (#12959)
Signed-off-by: youkaichao <youkaichao@gmail.com>

shangmingc | 913df14da3 | 2025-02-08 14:46:19 +00:00
[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

Cyrus Leung | 8a69e0e20e | 2025-02-08 04:25:15 -08:00
[CI/Build] Auto-fix Markdown files (#12941)

Isotr0py | 4c8dd12ef3 | 2025-02-08 04:24:47 -08:00
[Misc] Add qwen2.5-vl BNB support (#12944)

Jun Duan | 256a2d29dc | 2025-02-08 01:42:15 -08:00
[Doc] Correct HF repository for TeleChat2 models (#12949)

Liangfu Chen | c45d398e6f | 2025-02-08 01:41:35 -08:00
[CI] Resolve transformers-neuronx version conflict (#12925)

Jun Duan | 011e612d92 | 2025-02-08 09:16:42 +00:00
[Misc] Log time consumption on weight downloading (#12926)

Varun Sundar Rabindranath | 7e1837676a | 2025-02-08 17:15:44 +08:00
[misc] Add LoRA to benchmark_serving (#12898)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Sanju C Sudhakaran | 2880e21e3d | 2025-02-08 17:15:30 +08:00
[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812)
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>

wangxiyuan | 407b5537db | 2025-02-08 01:15:15 -08:00
[Build] Make pypi install work on CPU platform (#12874)

Woosuk Kwon | 4ea48fb35c | 2025-02-08 00:39:09 -08:00
[V1][Minor] Move cascade attn logic outside _prepare_inputs (#12943)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Shaoting | e31498bdcb | 2025-02-08 08:38:20 +00:00
[Misc] Add offline test for disaggregated prefill (#12418)

youkaichao | 91dd8f7aa6 | 2025-02-08 16:17:08 +08:00
[bugfix] respect distributed_executor_backend in world_size=1 (#12934)
Signed-off-by: youkaichao <youkaichao@gmail.com>

zifeitong | d01f66b039 | 2025-02-08 07:04:34 +00:00
[Bugfix] Fix multi-round chat error when mistral tokenizer is used (#12859)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Ke Zhao | cc01223f3b | 2025-02-08 06:56:43 +00:00
[Misc] Fix typo in the example file (#12896)
Signed-off-by: Zhao Ke <yingxiongraomingzk@gmail.com>

Jee Jee Li | 306923da82 | 2025-02-07 21:02:53 -08:00
[Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mapping (#12905)

Woosuk Kwon | 3243158336 | 2025-02-07 19:14:10 -08:00
[V1] Move KV block hashes from Request to KVCacheManager (#12922)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | b21f0f9d17 | 2025-02-07 19:07:37 -08:00
[V1][Minor] Remove outdated comment (#12928)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Lu Fang | 45cbc4991d | 2025-02-07 16:39:50 -08:00
[Bugfix] Fix disagg hang caused by the prefill and decode communication issues (#12723)
Signed-off-by: Lu Fang <lufang@fb.com>

Robert Shaw | 932c6b7461 | 2025-02-07 15:07:03 -08:00
[V1] LM Eval With Streaming Integration Tests (#11590)

TJian | eaa92d4437 | 2025-02-07 08:13:43 -08:00
[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501)