Author | Commit | Message | Date
cennn | 9e764e7b10 | [distributed] remove pynccl's redundant change_state (#11749) | 2025-01-06 09:05:48 +08:00
cennn | 635b897246 | [distributed] remove pynccl's redundant stream (#11744) | 2025-01-05 23:09:11 +08:00
Jee Jee Li | 47831430cc | [Bugfix][V1] Fix test_kv_cache_utils.py (#11738) | 2025-01-04 16:07:59 +00:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Cyrus Leung | ba214dffbe | [Bugfix] Fix precision error in LLaVA-NeXT (#11735) | 2025-01-04 23:45:57 +08:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung | eed11ebee9 | [VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (#11717) | 2025-01-04 11:40:53 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Yan Burman | 300acb8347 | [Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233) | 2025-01-04 14:50:16 +08:00
    Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>
    Signed-off-by: Ido Asraff <idoa@atero.ai>
xcnick | d91457d529 | [V1] Add kv cache utils tests. (#11513) | 2025-01-04 14:49:46 +08:00
    Signed-off-by: xcnick <xcnick0412@gmail.com>
Robert Shaw | 80c751e7f6 | [V1] Simplify Shutdown (#11659) | 2025-01-03 17:25:38 +00:00
Aurick Qiao | e1a5c2f0a1 | [Model] Whisper model implementation (#11280) | 2025-01-03 16:39:19 +08:00
    Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Cyrus Leung | 8c38ee7007 | [VLM] Merged multi-modal processor for LLaVA-NeXT (#11682) | 2025-01-02 16:39:27 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung | a115ac46b5 | [VLM] Move supported limits and max tokens to merged multi-modal processor (#11669) | 2025-01-01 15:44:42 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Signed-off-by: Isotr0py <2037008807@qq.com>
    Co-authored-by: Isotr0py <2037008807@qq.com>
Woosuk Kwon | 73001445fb | [V1] Implement Cascade Attention (#11635) | 2025-01-01 21:56:46 +09:00
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Jee Jee Li | 11d8a091c6 | [Misc] Optimize Qwen2-VL LoRA test (#11663) | 2025-01-01 14:42:23 +08:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Cyrus Leung | 365801fedd | [VLM] Add max-count checking in data parser for single image models (#11661) | 2024-12-31 22:15:21 -08:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Signed-off-by: Roger Wang <ywang@roblox.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
Joe Runde | 4db72e57f6 | [Bugfix][Refactor] Unify model management in frontend (#11660) | 2025-01-01 02:21:51 +00:00
    Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Roger Wang | e7c7c5e822 | [V1][VLM] V1 support for selected single-image models. (#11632) | 2024-12-31 21:17:22 +00:00
    Signed-off-by: Roger Wang <ywang@roblox.com>
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Signed-off-by: Isotr0py <2037008807@qq.com>
    Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Co-authored-by: Isotr0py <2037008807@qq.com>
Chen Zhang | 8c3230d8c1 | [V1] Simpify vision block hash for prefix caching by removing offset from hash (#11646) | 2024-12-31 08:56:01 +00:00
sakunkun | 2c5718809b | [Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565) | 2024-12-31 06:29:04 +00:00
John Giorgi | 82c49d3260 | [Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (#6909) | 2024-12-30 22:15:58 -08:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
    Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Michael Goin | 74fa1d123c | [Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637) | 2024-12-31 03:43:54 +00:00
    Signed-off-by: mgoin <michael@neuralmagic.com>
youkaichao | b12e87f942 | [platforms] enable platform plugins (#11602) | 2024-12-30 20:24:45 +08:00
    Signed-off-by: youkaichao <youkaichao@gmail.com>
youkaichao | 3682e33f9f | [v1] fix compilation cache (#11598) | 2024-12-30 04:24:12 +00:00
    Signed-off-by: youkaichao <youkaichao@gmail.com>
Robert Shaw | 4fb8e329fd | [V1] [5/N] API Server: unify Detokenizer and EngineCore input (#11545) | 2024-12-28 20:51:57 +00:00
    Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
youkaichao | 328841d002 | [bugfix] interleaving sliding window for cohere2 model (#11583) | 2024-12-28 16:55:42 +00:00
    Signed-off-by: youkaichao <youkaichao@gmail.com>
Isotr0py | d34be24bb1 | [Model] Support InternLM2 Reward models (#11571) | 2024-12-28 06:14:10 +00:00
    Signed-off-by: Isotr0py <2037008807@qq.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Robert Shaw | df04dffade | [V1] [4/N] API Server: ZMQ/MP Utilities (#11541) | 2024-12-28 01:45:08 +00:00
ErezSC42 | 55509c2114 | [MODEL] LoRA support for Jamba model (#11209) | 2024-12-27 17:58:21 +00:00
    Signed-off-by: Erez Schwartz <erezs@ai21.com>
Cyrus Leung | 101418096f | [VLM] Support caching in merged multi-modal processor (#11396) | 2024-12-27 17:22:48 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung | 7af553ea30 | [Misc] Abstract the logic for reading and writing media content (#11527) | 2024-12-27 19:21:23 +08:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Robert Shaw | 46d4359450 | [CI] Fix broken CI (#11543) | 2024-12-26 18:49:16 -08:00
Woosuk Kwon | 371d04d39b | [V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394) | 2024-12-27 09:32:38 +09:00
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Michael Goin | 2072924d14 | [Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523) | 2024-12-26 15:33:30 -08:00
    Signed-off-by: mgoin <michael@neuralmagic.com>
    Signed-off-by: simon-mo <simon.mo@hey.com>
    Signed-off-by: simon-mo <xmo@berkeley.edu>
    Co-authored-by: simon-mo <simon.mo@hey.com>
    Co-authored-by: simon-mo <xmo@berkeley.edu>
    Co-authored-by: HandH1998 <1335248067@qq.com>
Cyrus Leung | eec906d811 | [Misc] Add placeholder module (#11501) | 2024-12-26 13:12:51 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
sroy745 | dcb1a944d4 | [V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681) | 2024-12-26 19:02:58 +09:00
    Signed-off-by: Sourashis Roy <sroy@roblox.com>
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Jee Jee Li | aa25985bd1 | [Misc][LoRA] Fix LoRA weight mapper (#11495) | 2024-12-26 15:52:48 +08:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Cyrus Leung | 51a624bf02 | [Misc] Move some multimodal utils to modality-specific modules (#11494) | 2024-12-26 04:23:20 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Jiaxin Shan | fc601665eb | [Misc] Update disaggregation benchmark scripts and test logs (#11456) | 2024-12-25 06:58:48 +00:00
    Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Rui Qiao | 9832e5572a | [V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (#11472) | 2024-12-24 19:49:46 -08:00
Cyrus Leung | 3f3e92e1f2 | [Model] Automatic conversion of classification and reward models (#11469) | 2024-12-24 18:22:22 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Jee Jee Li | 196c34b0ac | [Misc] Move weights mapper (#11443) | 2024-12-24 13:05:25 +00:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Jee Jee Li | b1b1038fbd | [Bugfix] Fix Qwen2-VL LoRA weight loading (#11430) | 2024-12-24 09:56:10 +00:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Cyrus Leung | 9edca6bf8f | [Frontend] Online Pooling API (#11457) | 2024-12-24 17:54:30 +08:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Rui Qiao | a491d6f535 | [V1] TP Ray executor (#11107) | 2024-12-23 23:00:12 +00:00
    Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Michael Goin | 63afbe9215 | [CI] Expand OpenAI test_chat.py guided decoding tests (#11048) | 2024-12-23 18:35:38 +00:00
    Signed-off-by: mgoin <michael@neuralmagic.com>
Dipika Sikka | 8cef6e02dc | [Misc] add w8a8 asym models (#11075) | 2024-12-23 13:33:20 -05:00
Michael Goin | 5bfb30a529 | [Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389) | 2024-12-23 23:06:20 +08:00
    Signed-off-by: mgoin <michael@neuralmagic.com>
Jason T. Greene | f1d1bf6288 | [Bugfix] Fix fully sharded LoRAs with Mixtral (#11390) | 2024-12-22 23:25:10 +08:00
    Signed-off-by: Jason Greene <jason.greene@redhat.com>
Roger Wang | 29c748930e | [CI] Fix flaky entrypoint tests (#11403) | 2024-12-21 21:08:44 -08:00
    Signed-off-by: Roger Wang <ywang@roblox.com>
omer-dayan | 995f56236b | [Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192) | 2024-12-20 16:46:24 +00:00
    Signed-off-by: OmerD <omer@run.ai>
Wallas Henrique | 86c2d8fd1c | [Bugfix] Fix spec decoding when seed is none in a batch (#10863) | 2024-12-20 05:15:31 +00:00
    Signed-off-by: Wallas Santos <wallashss@ibm.com>