Kuntai Du
|
faef77c0d6
|
[Misc] KV cache transfer connector registry (#11481)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2024-12-29 16:08:09 +00:00 |
|
youkaichao
|
dba4d9dec6
|
[v1][bugfix] fix cudagraph with inplace buffer assignment (#11596)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-29 09:03:49 +00:00 |
|
Cyrus Leung
|
32b4c63f02
|
[Doc] Convert list tables to MyST (#11594)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-29 15:56:22 +08:00 |
|
Robert Shaw
|
4fb8e329fd
|
[V1] [5/N] API Server: unify Detokenizer and EngineCore input (#11545)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2024-12-28 20:51:57 +00:00 |
|
youkaichao
|
328841d002
|
[bugfix] interleaving sliding window for cohere2 model (#11583)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-28 16:55:42 +00:00 |
|
Cyrus Leung
|
d427e5cfda
|
[Doc] Minor documentation fixes (#11580)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-28 21:53:59 +08:00 |
|
Woosuk Kwon
|
42bb201fd6
|
[V1][Minor] Set pin_memory=False for token_ids_cpu tensor (#11581)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-28 13:33:12 +00:00 |
|
hj-wei
|
59d6bb4c86
|
[Hardware][AMD]: Replace HIPCC version with more precise ROCm version (#11515)
Signed-off-by: hjwei <hjwei_xd@163.com>
|
2024-12-28 11:17:35 +00:00 |
|
Roger Wang
|
b7dcc003dc
|
[Model] Remove hardcoded image tokens ids from Pixtral (#11582)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-28 10:54:23 +00:00 |
|
Isotr0py
|
d34be24bb1
|
[Model] Support InternLM2 Reward models (#11571)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-28 06:14:10 +00:00 |
|
Rajveer Bachkaniwala
|
b5cbe8eeb3
|
[Bugfix] Last token measurement fix (#11376)
Signed-off-by: rajveerb <46040700+rajveerb@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-12-28 11:34:46 +08:00 |
|
Robert Shaw
|
df04dffade
|
[V1] [4/N] API Server: ZMQ/MP Utilities (#11541)
|
2024-12-28 01:45:08 +00:00 |
|
Chen Zhang
|
a60731247f
|
[Doc] Update mllama example based on official doc (#11567)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2024-12-28 00:31:10 +00:00 |
|
Selali
|
ac79799403
|
[Bugfix] Fix for ROCM compressed tensor support (#11561)
|
2024-12-27 20:12:11 +00:00 |
|
Isotr0py
|
dde1fa18c9
|
[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-27 19:45:13 +00:00 |
|
Jee Jee Li
|
0240402c46
|
[Misc] Add BNB quantization for MolmoForCausalLM (#11551)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-27 18:48:24 +00:00 |
|
ErezSC42
|
55509c2114
|
[MODEL] LoRA support for Jamba model (#11209)
Signed-off-by: Erez Schwartz <erezs@ai21.com>
|
2024-12-27 17:58:21 +00:00 |
|
Cyrus Leung
|
101418096f
|
[VLM] Support caching in merged multi-modal processor (#11396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-27 17:22:48 +00:00 |
|
Chen1022
|
5ce4627a7e
|
[Doc] Add xgrammar in doc (#11549)
Signed-off-by: ccjincong <chenjincong11@gmail.com>
|
2024-12-27 13:05:10 +00:00 |
|
Cyrus Leung
|
7af553ea30
|
[Misc] Abstract the logic for reading and writing media content (#11527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-27 19:21:23 +08:00 |
|
Jee Jee Li
|
2c9b8ea2b0
|
[Bugfix] Fix TeleChat2ForCausalLM weights mapper (#11546)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-27 10:39:15 +00:00 |
|
AlexHe99
|
d003f3ea39
|
Update deploying_with_k8s.md with AMD ROCm GPU example (#11465)
Signed-off-by: Alex He <alehe@amd.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-27 10:00:04 +00:00 |
|
Mengqing Cao
|
6c6f7fe8a8
|
[Platform] Move model arch check to platform (#11503)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2024-12-27 08:45:25 +00:00 |
|
Robert Shaw
|
2339d59f92
|
[BugFix] Fix quantization for all other methods (#11547)
v0.6.6.post1
|
2024-12-26 22:23:29 -08:00 |
|
Robert Shaw
|
1b875a0ef3
|
[V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly (#11534)
|
2024-12-26 21:19:21 -08:00 |
|
youkaichao
|
eb881ed006
|
[misc] fix typing (#11540)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-27 11:05:08 +08:00 |
|
Robert Shaw
|
46d4359450
|
[CI] Fix broken CI (#11543)
|
2024-12-26 18:49:16 -08:00 |
|
Woosuk Kwon
|
81b979f2a8
|
[V1] Fix yapf (#11538)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-27 09:47:10 +09:00 |
|
Woosuk Kwon
|
371d04d39b
|
[V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-27 09:32:38 +09:00 |
|
Robert Shaw
|
0c0c2015c5
|
Update openai_compatible_server.md (#11536)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-12-26 16:26:18 -08:00 |
|
Simon Mo
|
82d24f7aac
|
[Docs] Document Deepseek V3 support (#11535)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2024-12-26 16:21:56 -08:00 |
|
Simon Mo
|
f49777ba62
|
Deepseek v3 (#11502)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
v0.6.6
|
2024-12-26 16:09:44 -08:00 |
|
Robert Shaw
|
55fb97f7bd
|
[2/N] API Server: Avoid ulimit footgun (#11530)
|
2024-12-26 23:43:05 +00:00 |
|
Michael Goin
|
2072924d14
|
[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: HandH1998 <1335248067@qq.com>
|
2024-12-26 15:33:30 -08:00 |
|
Robert Shaw
|
720b10fdc6
|
[1/N] API Server (Remove Proxy) (#11529)
|
2024-12-26 23:03:43 +00:00 |
|
Isotr0py
|
b85a977822
|
[Doc] Add video example to openai client for multimodal (#11521)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-26 17:31:29 +00:00 |
|
Cyrus Leung
|
eec906d811
|
[Misc] Add placeholder module (#11501)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-26 13:12:51 +00:00 |
|
Jee Jee Li
|
f57ee5650d
|
[Model] Modify MolmoForCausalLM MLP (#11510)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-26 13:12:05 +00:00 |
|
sroy745
|
dcb1a944d4
|
[V1] Adding min tokens/repetition/presence/frequency penalties to V1 sampler (#10681)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-26 19:02:58 +09:00 |
|
Roger Wang
|
7492a36207
|
[Doc] Add QVQ and QwQ to the list of supported models (#11509)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-12-26 09:44:32 +00:00 |
|
Jee Jee Li
|
aa25985bd1
|
[Misc][LoRA] Fix LoRA weight mapper (#11495)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-26 15:52:48 +08:00 |
|
Lucas Tucker
|
dbeac95dbb
|
Mypy checking for vllm/compilation (#11496)
Signed-off-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
|
2024-12-26 05:04:07 +00:00 |
|
Cyrus Leung
|
51a624bf02
|
[Misc] Move some multimodal utils to modality-specific modules (#11494)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-26 04:23:20 +00:00 |
|
Cyrus Leung
|
6ad909fdda
|
[Doc] Improve GitHub links (#11491)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-25 14:49:26 -08:00 |
|
Cyrus Leung
|
b689ada91e
|
[Frontend] Enable decord to load video from base64 (#11492)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-25 16:33:55 +00:00 |
|
Jiaxin Shan
|
fc601665eb
|
[Misc] Update disaggregation benchmark scripts and test logs (#11456)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
|
2024-12-25 06:58:48 +00:00 |
|
Rui Qiao
|
9832e5572a
|
[V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (#11472)
|
2024-12-24 19:49:46 -08:00 |
|
Cyrus Leung
|
3f3e92e1f2
|
[Model] Automatic conversion of classification and reward models (#11469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-24 18:22:22 +00:00 |
|
Yuan Tang
|
409475a827
|
[Bugfix] Fix issues in CPU build Dockerfile. Fixes #9182 (#11435)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2024-12-24 16:53:28 +00:00 |
|
Jee Jee Li
|
196c34b0ac
|
[Misc] Move weights mapper (#11443)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-24 13:05:25 +00:00 |
|