Michael Goin
|
6d94420246
|
[Doc] Update supported_hardware.rst (#7276)
|
2024-08-07 14:21:50 -07:00 |
|
Nick Hill
|
fc1493a01e
|
[FrontEnd] Make merge_async_iterators is_cancelled arg optional (#7282)
|
2024-08-07 13:35:14 -07:00 |
|
Lucas Wilkinson
|
311f743831
|
[Bugfix] Fix gptq failure on T4s (#7264)
|
2024-08-07 20:05:37 +00:00 |
|
Kevin H. Luu
|
469b3bc538
|
[ci] Make building wheels per commit optional (#7278)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-08-07 11:34:25 -07:00 |
|
Michael Goin
|
5223199e03
|
[Bugfix][FP8] Fix dynamic FP8 Marlin quantization (#7219)
|
2024-08-07 11:23:12 -07:00 |
|
Maximilien de Bayser
|
fde47d3bc2
|
[BugFix] Fix frontend multiprocessing hang (#7217)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
|
2024-08-07 18:09:36 +00:00 |
|
Stas Bekman
|
0e12cd67a8
|
[Doc] add online speculative decoding example (#7243)
|
2024-08-07 09:58:02 -07:00 |
|
Ilya Lavrenov
|
80cbe10c59
|
[OpenVINO] migrate to latest dependencies versions (#7251)
|
2024-08-07 09:49:10 -07:00 |
|
Isotr0py
|
b764547616
|
[Bugfix] Fix input processor for InternVL2 model (#7164)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-07 09:32:07 -07:00 |
|
Rafael Vasquez
|
ab0f5e2823
|
Fixes typo in function name (#7275)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-08-07 09:29:27 -07:00 |
|
Robert Shaw
|
564985729a
|
[ BugFix ] Move zmq frontend to IPC instead of TCP (#7222)
|
2024-08-07 16:24:56 +00:00 |
|
Dipika Sikka
|
0f7052bc7e
|
[Misc] Refactor linear layer weight loading; introduce BasevLLMParameter and weight_loader_v2 (#5874)
|
2024-08-07 09:17:58 -07:00 |
|
youkaichao
|
639159b2a6
|
[distributed][misc] add specialized method for cuda platform (#7249)
|
2024-08-07 08:54:52 -07:00 |
|
Cyrus Leung
|
66d617e343
|
[Frontend] Gracefully handle missing chat template and fix CI failure (#7238)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-08-07 09:12:05 +00:00 |
|
Atilla Akkuş
|
7b261092de
|
[BUGFIX]: top_k is expected to be an integer. (#7227)
|
2024-08-07 00:32:16 -07:00 |
|
Roger Wang
|
2385c8f374
|
[Doc] Mock new dependencies for documentation (#7245)
|
2024-08-07 06:43:03 +00:00 |
|
Nick Hill
|
9a3f49ae07
|
[BugFix] Overhaul async request cancellation (#7111)
|
2024-08-07 13:21:41 +08:00 |
|
Michael Goin
|
f9a5600649
|
[Bugfix] Fix GPTQ and GPTQ Marlin CPU Offloading (#7225)
|
2024-08-06 18:34:26 -07:00 |
|
afeldman-nm
|
fd95e026e0
|
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-06 16:51:47 -04:00 |
|
xiaobochen123
|
660470e5a3
|
[Core] Optimize evictor-v2 performance (#7193)
|
2024-08-06 12:34:25 -07:00 |
|
Luka Govedič
|
8d59dbb000
|
[Kernel] Add per-tensor and per-token AZP epilogues (#5941)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-08-06 18:17:08 +00:00 |
|
Lily Liu
|
5c60c8c423
|
[SpecDecode] [Minor] Fix spec decode sampler tests (#7183)
|
2024-08-06 10:40:32 -07:00 |
|
Katarzyna Papis
|
00afc78590
|
[Bugfix] add gguf dependency (#7198)
Co-authored-by: katarzyna.papis <kpapis@kpapis-u20.sclab.intel.com>
|
2024-08-06 10:08:35 -07:00 |
|
Robert Shaw
|
541c1852d3
|
[ BugFix ] Fix ZMQ when VLLM_PORT is set (#7205)
|
2024-08-06 09:26:26 -07:00 |
|
Dipika Sikka
|
a3bbbfa1d8
|
[BugFix] Fix DeepSeek remote code (#7178)
|
2024-08-06 08:16:53 -07:00 |
|
Cyrus Leung
|
1f26efbb3a
|
[Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-08-06 16:55:31 +08:00 |
|
Jee Jee Li
|
9118217f58
|
[LoRA] Relax LoRA condition (#7146)
|
2024-08-06 01:57:25 +00:00 |
|
Simon Mo
|
e3c664bfcb
|
[Build] Add initial conditional testing spec (#6841)
|
2024-08-05 17:39:22 -07:00 |
|
Isotr0py
|
360bd67cf0
|
[Core] Support loading GGUF model (#5191)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-05 17:54:23 -06:00 |
|
Cody Yu
|
ef527be06c
|
[MISC] Use non-blocking transfer in prepare_input (#7172)
|
2024-08-05 23:41:27 +00:00 |
|
Jacob Schein
|
89b8db6bb2
|
[Bugfix] Specify device when loading LoRA and embedding tensors (#7129)
Co-authored-by: Jacob Schein <jacobschein@Jacobs-MacBook-Pro-2.local>
|
2024-08-05 16:35:47 -07:00 |
|
Thomas Parnell
|
789937af2e
|
[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-08-05 23:29:43 +00:00 |
|
youkaichao
|
dfb1a15dcb
|
[ci][frontend] deduplicate tests (#7101)
|
2024-08-05 15:59:22 -07:00 |
|
Simon Mo
|
4db5176d97
|
bump version to v0.5.4 (#7139)
v0.5.4
|
2024-08-05 14:39:48 -07:00 |
|
Tyler Michael Smith
|
4cf1dc39be
|
[Bugfix][CI/Build] Fix CUTLASS FetchContent (#7171)
|
2024-08-05 14:22:57 -07:00 |
|
Tyler Michael Smith
|
6e4852ce28
|
[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001)
|
2024-08-05 16:00:01 -04:00 |
|
Tyler Michael Smith
|
8571ac4672
|
[Kernel] Update CUTLASS to 3.5.1 (#7085)
|
2024-08-05 15:13:43 -04:00 |
|
Rui Qiao
|
997cf78308
|
[Misc] Fix typo in GroupCoordinator.recv() (#7167)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-05 11:10:16 -07:00 |
|
Aditya Paliwal
|
57f560aa23
|
[BugFix] Use args.trust_remote_code (#7121)
|
2024-08-05 09:26:14 -07:00 |
|
Nick Hill
|
003f8ee128
|
[BugFix] Use IP4 localhost form for zmq bind (#7163)
|
2024-08-05 08:41:03 -07:00 |
|
Bongwon Jang
|
e9630458c7
|
[SpecDecode] Support FlashInfer in DraftModelRunner (#6926)
|
2024-08-05 08:05:05 -07:00 |
|
Cade Daniel
|
82a1b1a82b
|
[Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (#6963)
|
2024-08-05 08:46:44 +00:00 |
|
Jungho Christopher Cho
|
c0d8f1636c
|
[Model] SiglipVisionModel ported from transformers (#6942)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-08-05 06:22:12 +00:00 |
|
Cyrus Leung
|
cc08fc7225
|
[Frontend] Reapply "Factor out code for running uvicorn" (#7095)
|
2024-08-04 20:40:51 -07:00 |
|
Alphi
|
7b86e7c9cd
|
[Model] Add multi-image support for minicpmv (#7122)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-05 09:23:17 +08:00 |
|
Jee Jee Li
|
f80ab3521c
|
Clean up remaining Punica C information (#7027)
|
2024-08-04 15:37:08 -07:00 |
|
youkaichao
|
16a1cc9bb2
|
[misc][distributed] improve libcudart.so finding (#7127)
|
2024-08-04 11:31:51 -07:00 |
|
Thomas Parnell
|
b1c9aa3daa
|
[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (#7105)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-08-04 07:13:18 -07:00 |
|
Jee Jee Li
|
179a6a36f2
|
[Model]Refactor MiniCPMV (#7020)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-04 08:12:41 +00:00 |
|
youkaichao
|
83c644fe7e
|
[core][misc] simply output processing with shortcut code path (#7117)
|
2024-08-04 00:22:19 -07:00 |
|