yulei | b17046e298 | [BugFix] Fix metrics error for --num-scheduler-steps > 1 (#8234) | 2024-10-22 15:43:03 -07:00
Lucas Wilkinson | d1e8240875 | [Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing (#9487) | 2024-10-22 15:41:13 -07:00
Jeremy Arnold | cb6fdaa0a0 | [Misc] Make benchmarks use EngineArgs (#9529) | 2024-10-22 15:40:38 -07:00
Aurick Qiao | 23b899a8e6 | [Bugfix] fix detokenizer shallow copy (#5919) | 2024-10-22 15:38:12 -07:00
youkaichao | 17c79f3c36 | [torch.compile] auto infer dynamic_arg_dims from type annotation (#9589) | 2024-10-22 13:43:37 -07:00
Ronen Schaffer | cd5601ac37 | [BugFix] Prevent exporting duplicate OpenTelemetry spans (#9017) | 2024-10-22 11:11:53 -07:00
Yuhong Guo | 434984e665 | [Frontend] Support custom request_id from request (#9550); Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com> | 2024-10-22 18:07:30 +00:00
Yuan | 32a1ee74a0 | [Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212); Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>; Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>; Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com>; Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com> | 2024-10-22 10:38:04 -07:00
gopalsarda | 08075c3448 | [Bugfix] Eagle: change config name for fc bias (#9580) | 2024-10-22 16:14:22 +00:00
Isotr0py | bb392ea2d2 | [Model][VLM] Initialize support for Mono-InternVL model (#9528) | 2024-10-22 16:01:46 +00:00
xendo | 9dbcce84a7 | [Neuron] [Bugfix] Fix neuron startup (#9374); Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com> | 2024-10-22 12:51:41 +00:00
Jee Jee Li | a48e3ec052 | [CI/Build][LoRA] Temporarily fix long context failure issue (#9579) | 2024-10-22 11:32:51 +00:00
Woosuk Kwon | 6c5af09b39 | [V1] Implement vLLM V1 [1/N] (#9289) | 2024-10-22 01:24:07 -07:00
wangshuai09 | 3ddbe25502 | [Hardware][CPU] using current_platform.is_cpu (#9536) | 2024-10-22 00:50:43 -07:00
chenqianfzh | 0d02747f2e | support TP in qwen2 bnb (#9574) | 2024-10-22 07:13:23 +00:00
Rafael Vasquez | f7db5f0fa9 | [Doc] Use shell code-blocks and fix section headers (#9508); Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> | 2024-10-22 06:43:24 +00:00
Kuntai Du | ca30c3c84b | [Core] Remove evictor_v1 (#9572) | 2024-10-22 04:55:49 +00:00
Wallas Henrique | c0292211ce | [CI/Build] Replaced some models on tests for smaller ones (#9570); Signed-off-by: Wallas Santos <wallashss@ibm.com> | 2024-10-22 04:52:14 +00:00
Falko1 | 74692421f7 | [Bugfix]: phi.py get rope_theta from config file (#9503); Co-authored-by: Isotr0py <2037008807@qq.com> | 2024-10-22 02:53:36 +00:00
ngrozae | 29acd2c34c | [Bugfix][OpenVINO] fix_dockerfile_openvino (#9552) | 2024-10-21 19:47:52 -07:00
Cyrus Leung | f085995a7b | [CI/Build] Remove unnecessary fork_new_process (#9484) | 2024-10-21 19:47:29 -07:00
Travis Johnson | b729901139 | [Bugfix]: serialize config by value for --trust-remote-code (#6751); Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>; Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> | 2024-10-21 19:46:24 -07:00
youkaichao | 76a5e13270 | [core] move parallel sampling out from vllm core (#9302) | 2024-10-22 00:31:44 +00:00
Joe Runde | ef7faad1b8 | 🐛 Fixup more test failures from memory profiling (#9563); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-10-21 17:10:56 -07:00
Kuntai Du | 575dcebe9a | [CI] Make format checker error message more user-friendly by using emoji (#9564); This PR makes format checker error message more user-friendly by adding emojis. | 2024-10-21 23:45:15 +00:00
Wallas Henrique | 711f3a7806 | [Frontend] Don't log duplicate error stacktrace for every request in the batch (#9023); Signed-off-by: Wallas Santos <wallashss@ibm.com> | 2024-10-21 14:49:41 -07:00
Nick Hill | 15713e3b75 | [BugFix] Update draft model TP size check to allow matching target TP size (#9394); Co-authored-by: Baoyuan Qi <qibaoyuan@126.com> | 2024-10-21 14:14:29 -07:00
youkaichao | d621c43df7 | [doc] fix format (#9562) | 2024-10-21 13:54:57 -07:00
Nick Hill | 9d9186be97 | [Frontend] Reduce frequency of client cancellation checking (#7959) | 2024-10-21 13:28:10 -07:00
Michael Goin | 5241aa1494 | [Model][Bugfix] Fix batching with multi-image in PixtralHF (#9518) | 2024-10-21 14:20:07 -04:00
Varad Ahirwadkar | ec6bd6c4c6 | [BugFix] Use correct python3 binary in Docker.ppc64le entrypoint (#9492); Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com> | 2024-10-21 17:43:02 +00:00
yudian0504 | 8ca8954841 | [Bugfix][Misc]: fix graph capture for decoder (#9549) | 2024-10-21 17:33:30 +00:00
Dhia Eddine Rhaiem | f6b97293aa | [Model] FalconMamba Support (#9325) | 2024-10-21 12:50:16 -04:00
Thomas Parnell | 496e991da8 | [Doc] Consistent naming of attention backends (#9498); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> | 2024-10-21 22:29:57 +08:00
Cyrus Leung | 696b01af8f | [CI/Build] Split up decoder-only LM tests (#9488); Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-10-20 21:27:50 -07:00
Andy Dai | 855e0e6f97 | [Frontend][Misc] Goodput metric support (#9338) | 2024-10-20 18:39:32 +00:00
Chen Zhang | 4fa3e33349 | [Kernel] Support sliding window in flash attention backend (#9403) | 2024-10-20 10:57:52 -07:00
Michael Goin | 962d2c6349 | [Model][Pixtral] Use memory_efficient_attention for PixtralHFVision (#9520) | 2024-10-20 05:29:14 +00:00
Chen Zhang | 5b59fe0f08 | [Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger (#9530) | 2024-10-20 00:05:02 +00:00
Michael Goin | 8e3e7f2713 | [Model][Pixtral] Optimizations for input_processor_for_pixtral_hf (#9514) | 2024-10-19 10:44:29 -04:00
Cyrus Leung | 263d8ee150 | [Bugfix] Fix missing task for speculative decoding (#9524) | 2024-10-19 06:49:40 +00:00
Yue Zhang | c5eea3c8ba | [Frontend] Support simpler image input format (#9478) | 2024-10-18 23:17:07 -07:00
Russell Bryant | 85dc92fc98 | [CI/Build] Configure matcher for actionlint workflow (#9511); Signed-off-by: Russell Bryant <russell.bryant@gmail.com> | 2024-10-19 06:04:18 +00:00
Russell Bryant | dfd951ed9b | [CI/Build] Add error matching for ruff output (#9513); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2024-10-19 05:42:20 +00:00
Joe Runde | 82c25151ec | [Doc] update gpu-memory-utilization flag docs (#9507); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-10-19 11:26:36 +08:00
Nick Hill | 1325872ec8 | [Frontend] Avoid creating guided decoding LogitsProcessor unnecessarily (#9521) | 2024-10-18 20:21:01 -07:00
Joe Runde | 380e18639f | 🐛 fix torch memory profiling (#9516); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-10-18 21:25:19 -04:00
sasha0552 | 337ed76671 | [Bugfix] Fix offline mode when using mistral_common (#9457) | 2024-10-18 18:12:32 -07:00
Thomas Parnell | 0c9a5258f9 | [Kernel] Add env variable to force flashinfer backend to enable tensor cores (#9497); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>; Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>; Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-10-18 17:55:48 -07:00
Cody Yu | d11bf435a0 | [MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510) | 2024-10-18 14:30:55 -07:00