Isotr0py
f57092c00b
[Doc] Add oneDNN installation to CPU backend documentation ( #8467 )
2024-09-13 18:06:30 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests ( #7820 )
2024-09-13 10:20:06 -07:00
youkaichao
0a4806f0a9
[plugin][torch.compile] allow to add custom compile backend ( #8445 )
2024-09-13 09:32:42 -07:00
Cyrus Leung
ecd7a1d5b6
[Installation] Gate FastAPI version for Python 3.8 ( #8456 )
2024-09-13 09:02:26 -07:00
youkaichao
a2469127db
[misc][ci] fix quant test ( #8449 )
2024-09-13 17:20:14 +08:00
Jee Jee Li
06311e2956
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 ( #8442 )
2024-09-13 07:58:28 +00:00
youkaichao
cab69a15e4
[doc] recommend pip instead of conda ( #8446 )
2024-09-12 23:52:41 -07:00
Isotr0py
9b4a3b235e
[CI/Build] Enable InternVL2 PP test only on single node ( #8437 )
2024-09-13 06:35:20 +00:00
Simon Mo
acda0b35d0
bump version to v0.6.1.post1 ( #8440 )
v0.6.1.post1
2024-09-12 21:39:49 -07:00
William Lin
ba77527955
[bugfix] torch profiler bug for single gpu with GPUExecutor ( #8354 )
2024-09-12 21:30:00 -07:00
Alexander Matveev
6821020109
[Bugfix] Fix async log stats ( #8417 )
2024-09-12 20:48:59 -07:00
Cyrus Leung
8427550488
[CI/Build] Update pixtral tests to use JSON ( #8436 )
2024-09-13 03:47:52 +00:00
Cyrus Leung
3f79bc3d1a
[Bugfix] Bump fastapi and pydantic version ( #8435 )
2024-09-13 03:21:42 +00:00
shangmingc
40c396533d
[Bugfix] Mapping physical device indices for e2e test utils ( #8290 )
2024-09-13 11:06:28 +08:00
Cyrus Leung
5ec9c0fb3c
[Core] Factor out input preprocessing to a separate class ( #7329 )
2024-09-13 02:56:13 +00:00
Dipika Sikka
8f44a92d85
[BugFix] fix group_topk ( #8430 )
2024-09-13 09:23:42 +08:00
Roger Wang
360ddbd37e
[Misc] Update Pixtral example ( #8431 )
2024-09-12 17:31:18 -07:00
Wenxiang
a480939e8e
[Bugfix] Fix weight loading issue by renaming variable ( #8293 )
2024-09-12 19:25:00 -04:00
Patrick von Platen
d31174a4e1
[Hotfix][Pixtral] Fix multiple images bugs ( #8415 )
2024-09-12 15:21:51 -07:00
Roger Wang
b61bd98f90
[CI/Build] Disable multi-node test for InternVL2 ( #8428 )
2024-09-12 15:05:35 -07:00
Roger Wang
c16369455f
[Hotfix][Core][VLM] Disable chunked prefill by default and prefix caching for multimodal models ( #8425 )
2024-09-12 14:06:51 -07:00
Alexander Matveev
019877253b
[Bugfix] multi-step + flashinfer: ensure cuda graph compatible ( #8427 )
2024-09-12 21:01:50 +00:00
Nick Hill
551ce01078
[Core] Add engine option to return only deltas or final output ( #7381 )
2024-09-12 12:02:00 -07:00
William Lin
a6c0f3658d
[multi-step] add flashinfer backend ( #7928 )
2024-09-12 11:16:22 -07:00
Joe Runde
f2e263b801
[Bugfix] Offline mode fix ( #8376 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-12 11:11:57 -07:00
Luis Vega
1f0c75afa9
[BugFix] Fix Duplicate Assignment in Hermes2ProToolParser ( #8423 )
2024-09-12 11:10:11 -07:00
WANGWEI
8a23e93302
[BugFix] lazy init _copy_stream to avoid torch init wrong gpu instance ( #8403 )
2024-09-12 10:47:42 -07:00
Alex Brooks
c6202daeed
[Model] Support multiple images for qwen-vl ( #8247 )
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Isotr0py
e56bf27741
[Bugfix] Fix InternVL2 inference with various num_patches ( #8375 )
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:35 -07:00
Roger Wang
520ca380ae
[Hotfix][VLM] Fixing max position embeddings for Pixtral ( #8399 )
2024-09-12 09:28:37 -07:00
youkaichao
7de49aa86c
[torch.compile] hide slicing under custom op for inductor ( #8384 )
2024-09-12 00:11:55 -07:00
Woosuk Kwon
42ffba11ad
[Misc] Use RoPE cache for MRoPE ( #8396 )
2024-09-11 23:13:14 -07:00
Kevin Lin
295c4730a8
[Misc] Raise error when using encoder/decoder model with cpu backend ( #8355 )
2024-09-12 05:45:24 +00:00
Blueyo0
1bf2dd9df0
[Gemma2] add bitsandbytes support for Gemma2 ( #8338 )
2024-09-11 21:53:12 -07:00
tomeras91
5a60699c45
[Bugfix]: Fix the logic for deciding if tool parsing is used ( #8366 )
2024-09-12 03:55:30 +00:00
Michael Goin
b6c75e1cf2
Fix the AMD weight loading tests ( #8390 )
2024-09-11 20:35:33 -07:00
Woosuk Kwon
b71c956deb
[TPU] Use Ray for default distributed backend ( #8389 )
2024-09-11 20:31:51 -07:00
youkaichao
f842a7aff1
[misc] remove engine_use_ray ( #8126 )
2024-09-11 18:23:36 -07:00
Cody Yu
a65cb16067
[MISC] Dump model runner inputs when crashing ( #8305 )
2024-09-12 01:12:25 +00:00
Simon Mo
3fd2b0d21c
Bump version to v0.6.1 ( #8379 )
v0.6.1
2024-09-11 14:42:11 -07:00
Patrick von Platen
d394787e52
Pixtral ( #8377 )
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
Lily Liu
775f00f81e
[Speculative Decoding] Test refactor ( #8317 )
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-11 14:07:34 -07:00
Aarni Koskela
8baa454937
[Misc] Move device options to a single place ( #8322 )
2024-09-11 13:25:58 -07:00
bnellnm
73202dbe77
[Kernel][Misc] register ops to prevent graph breaks ( #6917 )
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
Cyrus Leung
7015417fd4
[Bugfix] Add missing attributes in mistral tokenizer ( #8364 )
2024-09-11 11:36:54 -07:00
Alexey Kondratiev(AMD)
aea02f30de
[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation ( #8373 )
2024-09-11 18:31:41 +00:00
Li, Jiang
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend ( #7257 )
2024-09-11 09:46:46 -07:00
Yang Fan
3b7fea770f
[Model][VLM] Add Qwen2-VL model support ( #7905 )
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Pooya Davoodi
cea95dfb94
[Frontend] Create ErrorResponse instead of raising exceptions in run_batch ( #8347 )
2024-09-11 05:30:11 +00:00
Yangshen⚡Deng
6a512a00df
[model] Support for Llava-Next-Video model ( #7559 )
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00