722d46edb9 | 2024-10-24 10:42:24 -07:00 | Alex Brooks | [Model] Compute Llava Next Max Tokens / Dummy Data From Gridpoints (#9650)
    Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
c866e0079d | 2024-10-25 01:40:40 +08:00 | Cyrus Leung | [CI/Build] Fix VLM test failures when using transformers v4.46 (#9666)
d27cfbf791 | 2024-10-24 09:31:42 -07:00 | Yongzao | [torch.compile] Adding torch compile annotations to some models (#9641)
    Signed-off-by: youkaichao <youkaichao@gmail.com>
    Co-authored-by: youkaichao <youkaichao@gmail.com>
de662d32b5 | 2024-10-24 12:17:45 -04:00 | Harry Mellor | Increase operation per run limit for "Close inactive issues and PRs" workflow (#9661)
    Signed-off-by: Harry Mellor <hej.mellor@gmail.com>
f58454968f | 2024-10-24 07:52:07 -07:00 | litianjian | [Bugfix]Disable the post_norm layer of the vision encoder for LLaVA models (#9653)
b979143d5b | 2024-10-24 09:43:59 +00:00 | Cyrus Leung | [Doc] Move additional tips/notes to the top (#9647)
ad6f78053e | 2024-10-24 01:32:15 -07:00 | Yongzao | [torch.compile] expanding support and fix allgather compilation (#9637)
    Signed-off-by: youkaichao <youkaichao@gmail.com>
    Co-authored-by: youkaichao <youkaichao@gmail.com>
295a061fb3 | 2024-10-24 16:18:27 +08:00 | Jee Jee Li | [Kernel] add kernel for FATReLU (#9610)
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
8a02cd045a | 2024-10-24 00:54:57 -07:00 | Yongzao | [torch.compile] Adding torch compile annotations to some models (#9639)
    Signed-off-by: youkaichao <youkaichao@gmail.com>
    Co-authored-by: youkaichao <youkaichao@gmail.com>
4fdc581f9e | 2024-10-24 00:16:44 -07:00 | youkaichao | [core] simplify seq group code (#9569)
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
3770071eb4 | 2024-10-23 23:33:22 -07:00 | Woosuk Kwon | [V1][Bugfix] Clean up requests when aborted (#9629)
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
836e8ef6ee | 2024-10-24 06:12:05 +00:00 | Cyrus Leung | [Bugfix] Fix PP for ChatGLM and Molmo (#9422)
056a68c7db | 2024-10-24 05:14:00 +00:00 | Yan Ma | [XPU] avoid triton import for xpu (#9440)
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
33bab41060 | 2024-10-24 05:05:49 +00:00 | Vinay R Damodaran | [Bugfix]: Make chat content text allow type content (#9358)
    Signed-off-by: Vinay Damodaran <vrdn@hey.com>
b7df53cd42 | 2024-10-24 10:07:44 +08:00 | Michael Goin | [Bugfix] Use "vision_model" prefix for MllamaVisionModel (#9628)
    Signed-off-by: mgoin <michael@neuralmagic.com>
bb01f2915e | 2024-10-24 10:03:44 +08:00 | Michael Goin | [Bugfix][Model] Fix Mllama SDPA illegal memory access for batched multi-image (#9626)
    Signed-off-by: mgoin <michael@neuralmagic.com>
b548d7a5f4 | 2024-10-23 15:45:26 -07:00 | Russell Bryant | [CI/Build] Add bot to close stale issues and PRs (#9436)
fc6c274626 | 2024-10-23 17:54:22 +00:00 | Yunfei Chu | [Model] Add Qwen2-Audio model support (#9248)
    Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
150b779081 | 2024-10-23 17:28:57 +00:00 | Alex Brooks | [Frontend] Enable Online Multi-image Support for MLlama (#9393)
    Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
9013e24f7b | 2024-10-23 10:07:48 -07:00 | Yongzao | [torch.compile] Adding torch compile annotations to some models (#9614)
fd0e2cfdb2 | 2024-10-23 16:47:20 +00:00 | Michael Goin | [Misc] Separate total and output tokens in benchmark_throughput.py (#8914)
e5ac6a4199 | 2024-10-23 16:40:43 +00:00 | Tyler Michael Smith | [Bugfix] Fix divide by zero when serving Mamba models (#9617)
    Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
dbdd3b5e5a | 2024-10-23 09:14:44 -07:00 | youkaichao | [misc] comment to avoid future confusion about baichuan (#9620)
    Signed-off-by: youkaichao <youkaichao@gmail.com>
e7116c017c | 2024-10-23 14:09:04 +00:00 | Cyrus Leung | [Bugfix] Fix _init_vision_model in NVLM_D model (#9611)
    Co-authored-by: Isotr0py <2037008807@qq.com>
31a08f5bd2 | 2024-10-23 14:05:18 +00:00 | Alex Brooks | [Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs (#9612)
    Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
c18e1a3418 | 2024-10-23 11:27:37 +00:00 | Cyrus Leung | [VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args (#9217)
    Co-authored-by: Isotr0py <2037008807@qq.com>
3ff57ebfca | 2024-10-23 10:42:47 +00:00 | Isotr0py | [Model] Initialize Florence-2 language backbone support (#9555)
2394962d70 | 2024-10-23 08:28:21 +00:00 | Mengqing Cao | [Hardware][XPU] using current_platform.is_xpu (#9605)
51c24c9736 | 2024-10-23 12:43:07 +08:00 | Luka Govedič | [Build] Fix FetchContent multiple build issue (#9596)
    Signed-off-by: luka <luka@neuralmagic.com>
831540cf04 | 2024-10-23 11:35:29 +08:00 | Cyrus Leung | [Model] Support E5-V (#9576)
29061ed9df | 2024-10-23 11:17:28 +08:00 | Flex Wang | [Misc] Add an env var VLLM_LOGGING_PREFIX, if set, it will be prepend to all logging messages (#9590)
65050a40e6 | 2024-10-22 17:45:35 -07:00 | Chen Zhang | [Bugfix] Generate exactly input_len tokens in benchmark_throughput (#9592)
208cb34c81 | 2024-10-22 15:43:25 -07:00 | Seth Kimmel | [Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889)
    Co-authored-by: Kaunil Dhruv <dhruv.kaunil@gmail.com>
b17046e298 | 2024-10-22 15:43:03 -07:00 | yulei | [BugFix] Fix metrics error for --num-scheduler-steps > 1 (#8234)
d1e8240875 | 2024-10-22 15:41:13 -07:00 | Lucas Wilkinson | [Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing (#9487)
cb6fdaa0a0 | 2024-10-22 15:40:38 -07:00 | Jeremy Arnold | [Misc] Make benchmarks use EngineArgs (#9529)
23b899a8e6 | 2024-10-22 15:38:12 -07:00 | Aurick Qiao | [Bugfix] fix detokenizer shallow copy (#5919)
17c79f3c36 | 2024-10-22 13:43:37 -07:00 | youkaichao | [torch.compile] auto infer dynamic_arg_dims from type annotation (#9589)
cd5601ac37 | 2024-10-22 11:11:53 -07:00 | Ronen Schaffer | [BugFix] Prevent exporting duplicate OpenTelemetry spans (#9017)
434984e665 | 2024-10-22 18:07:30 +00:00 | Yuhong Guo | [Frontend] Support custom request_id from request (#9550)
    Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
32a1ee74a0 | 2024-10-22 10:38:04 -07:00 | Yuan | [Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212)
    Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
    Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
    Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com>
    Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>
08075c3448 | 2024-10-22 16:14:22 +00:00 | gopalsarda | [Bugfix] Eagle: change config name for fc bias (#9580)
bb392ea2d2 | 2024-10-22 16:01:46 +00:00 | Isotr0py | [Model][VLM] Initialize support for Mono-InternVL model (#9528)
9dbcce84a7 | 2024-10-22 12:51:41 +00:00 | xendo | [Neuron] [Bugfix] Fix neuron startup (#9374)
    Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
a48e3ec052 | 2024-10-22 11:32:51 +00:00 | Jee Jee Li | [CI/Build][LoRA] Temporarily fix long context failure issue (#9579)
6c5af09b39 | 2024-10-22 01:24:07 -07:00 | Woosuk Kwon | [V1] Implement vLLM V1 [1/N] (#9289)
3ddbe25502 | 2024-10-22 00:50:43 -07:00 | wangshuai09 | [Hardware][CPU] using current_platform.is_cpu (#9536)
0d02747f2e | 2024-10-22 07:13:23 +00:00 | chenqianfzh | support TP in qwen2 bnb (#9574)
f7db5f0fa9 | 2024-10-22 06:43:24 +00:00 | Rafael Vasquez | [Doc] Use shell code-blocks and fix section headers (#9508)
    Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
ca30c3c84b | 2024-10-22 04:55:49 +00:00 | Kuntai Du | [Core] Remove evictor_v1 (#9572)