Cyrus Leung | d97011512e | [CI/Build] vLLM cache directory for images (#6444) | 2024-07-15 23:12:25 -07:00
Joe | d92b3c5cde | [Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419) | 2024-07-15 18:54:15 -07:00
Mor Zusman | 9ad32dacd9 | [BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (#6425) | 2024-07-16 01:32:55 +00:00
  Co-authored-by: Mor Zusman <morz@ai21.com>
Woosuk Kwon | ec9933f4a5 | [Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod (#6289) | 2024-07-15 19:02:14 +00:00
youkaichao | 4cf256ae7f | [misc][distributed] fix pp missing layer condition (#6446) | 2024-07-15 10:32:35 -07:00
Simon Mo | 64fdc08c72 | bump version to v0.5.2 (#6433) | 2024-07-15 17:27:40 +00:00
Thomas Parnell | 4ef95b0f06 | [Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409) | 2024-07-15 13:14:49 -04:00
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Thomas Parnell | eaec4b9153 | [Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140) | 2024-07-15 10:12:47 -07:00
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
  Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
Tyler Michael Smith | c8fd97f26d | [Kernel] Use CUTLASS kernels for the FP8 layers with Bias (#6270) | 2024-07-15 13:05:52 -04:00
Roger Wang | 6ae1597ddf | [VLM] Minor space optimization for ClipVisionModel (#6436) | 2024-07-15 17:29:51 +08:00
Cyrus Leung | de19916314 | [Bugfix] Convert image to RGB by default (#6430) | 2024-07-15 05:39:15 +00:00
youkaichao | 69672f116c | [core][distributed] simplify code to support pipeline parallel (#6406) | 2024-07-14 21:20:51 -07:00
DefTruth | 44874a0bf9 | [Doc] add env docs for flashinfer backend (#6437) | 2024-07-14 21:16:51 -07:00
zifeitong | b47008b4d2 | [BugFix] BatchResponseData body should be optional (#6345) | 2024-07-15 04:06:09 +00:00
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Simon Mo | 32c9d7f765 | Report usage for beam search (#6404) | 2024-07-14 19:37:35 -07:00
Ethan Xu | dbfe254eda | [Feature] vLLM CLI (#5090) | 2024-07-14 15:36:43 -07:00
  Co-authored-by: simon-mo <simon.mo@hey.com>
Robert Shaw | 73030b7dae | [ Misc ] Enable Quantizing All Layers of DeepSeekv2 (#6423) | 2024-07-14 21:38:42 +00:00
Isotr0py | 540c0368b1 | [Model] Initialize Fuyu-8B support (#3924) | 2024-07-14 05:27:14 +00:00
  Co-authored-by: Roger Wang <ywang@roblox.com>
Robert Shaw | fb6af8bc08 | [ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417) | 2024-07-13 20:03:58 -07:00
Woosuk Kwon | eeceadaecc | [Misc] Add deprecation warning for beam search (#6402) | 2024-07-13 11:52:22 -07:00
Robert Shaw | babf52dade | [ Misc ] More Cleanup of Marlin (#6359) | 2024-07-13 10:21:37 +00:00
  Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
youkaichao | 41708e5034 | [ci] try to add multi-node tests (#6280) | 2024-07-12 21:51:48 -07:00
  Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Thomas Parnell | e1684a766a | [Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373) | 2024-07-12 18:30:54 -07:00
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Woosuk Kwon | f8f9ff57ee | [Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397) | 2024-07-12 15:59:47 -07:00
Michael Goin | 111fc6e7ec | [Misc] Add generated git commit hash as vllm.__commit__ (#6386) | 2024-07-12 22:52:15 +00:00
Cody Yu | 75f64d8b94 | [Bugfix] Fix illegal memory access in FP8 MoE kernel (#6382) | 2024-07-12 21:33:33 +00:00
Cyrus Leung | 024ad87cdc | [Bugfix] Fix dtype mismatch in PaliGemma (#6367) | 2024-07-12 08:22:18 -07:00
Robert Shaw | aea19f0989 | [ Misc ] Support Models With Bias in compressed-tensors integration (#6356) | 2024-07-12 11:11:29 -04:00
Robert Shaw | 6047187cd8 | [ Misc ] Remove separate bias add (#6353) | 2024-07-12 05:06:09 +00:00
Hongxia Yang | b6c16cf8ff | [ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352) | 2024-07-11 21:30:46 -07:00
Michael Goin | d59eb98489 | [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343) | 2024-07-12 10:47:17 +08:00
Helena Kloosterman | adf32e0a0f | [Bugfix] Fix usage stats logging exception warning with OpenVINO (#6349) | 2024-07-12 10:47:00 +08:00
youkaichao | 2b0fb53481 | [distributed][misc] be consistent with pytorch for libcudart.so (#6346) | 2024-07-11 19:35:17 -07:00
  [distributed][misc] keep consistent with how pytorch finds libcudart.so (#6346)
Lily Liu | d6ab528997 | [Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351) | 2024-07-12 01:32:06 +00:00
Robert Shaw | 7ed6a4f0e1 | [ BugFix ] Prompt Logprobs Detokenization (#6223) | 2024-07-11 22:02:29 +00:00
  Co-authored-by: Zifei Tong <zifeitong@gmail.com>
xwjiang2010 | 1df43de9bb | [bug fix] Fix llava next feature size calculation. (#6339) | 2024-07-11 17:21:10 +00:00
  Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Robert Shaw | b675069d74 | [ Misc ] Refactor Marlin Python Utilities (#6082) | 2024-07-11 15:40:11 +00:00
  Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Mor Zusman | 55f692b46e | [BugFix] get_and_reset only when scheduler outputs are not empty (#6266) | 2024-07-11 07:40:20 -07:00
Thomas Parnell | 8a1415cf77 | [Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) | 2024-07-11 07:05:59 -07:00
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
  Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
pushan | 546b101fa0 | [BugFix]: fix engine timeout due to request abort (#6255) | 2024-07-11 06:46:31 -07:00
  Signed-off-by: yatta zhang <ytzhang01@foxmail.com>
  Signed-off-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com>
  Co-authored-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com>
aniaan | 3963a5335b | [Misc] refactor(config): clean up unused code (#6320) | 2024-07-11 09:39:07 +00:00
daquexian | 99ded1e1c4 | [Doc] Remove comments incorrectly copied from another project (#6286) | 2024-07-10 17:05:26 -07:00
Woosuk Kwon | 997df46a32 | [Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor (#6313) | 2024-07-10 16:39:02 -07:00
sroy745 | ae151d73be | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00
sangjune.park | 44cc76610d | [Bugfix] Fix OpenVINOExecutor abstractmethod error (#6296) | 2024-07-10 10:03:32 -07:00
  Signed-off-by: sangjune.park <sangjune.park@navercorp.com>
Benjamin Muskalla | b422d4961a | [CI/Build] Enable mypy typing for remaining folders (#6268) | 2024-07-10 22:15:55 +08:00
Thomas Parnell | c38eba3046 | [Bugfix] MLPSpeculator: Use ParallelLMHead in tie_weights=False case. (#6303) | 2024-07-10 09:04:07 -04:00
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Woosuk Kwon | e72ae80b06 | [Bugfix] Support 2D input shape in MoE layer (#6287) | 2024-07-10 09:03:16 -04:00
Cyrus Leung | 8a924d2248 | [Doc] Guide for adding multi-modal plugins (#6205) | 2024-07-10 14:55:34 +08:00
Woosuk Kwon | 5ed3505d82 | [Bugfix][TPU] Add prompt adapter methods to TPUExecutor (#6279) | 2024-07-09 19:30:56 -07:00