Rafael Vasquez | de24046fcd | [Doc] Improve contributing and installation documentation (#9132); Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> | 2024-10-08 20:22:08 +00:00
Sayak Paul | 1874c6a1b0 | [Doc] Update vlm.rst to include an example on videos (#9155); Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> | 2024-10-08 18:12:29 +00:00
Daniele | 9a94ca4a5d | [Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537) | 2024-10-08 09:38:40 -07:00
Peter Pan | cfba685bd4 | [CI/Build] Add examples folder into Docker image so that we can leverage the templates*.jinja when serving models (#8758); Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> | 2024-10-08 09:37:34 -07:00
Alex Brooks | 069d3bd8d0 | [Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151); Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> | 2024-10-08 14:31:26 +00:00
Alex Brooks | a3691b6b5e | [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131); Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> | 2024-10-08 14:12:56 +00:00
Brendan Wong | 8c746226c9 | [Frontend] API support for beam search for MQLLMEngine (#9117) | 2024-10-08 05:51:43 +00:00
youkaichao | e1faa2a598 | [misc] improve ux on readme (#9147) | 2024-10-07 22:26:25 -07:00
Kunshang Ji | 80b57f00d5 | [Intel GPU] Fix xpu decode input (#9145) | 2024-10-08 03:51:14 +00:00
youkaichao | 04c12f8157 | [misc] update utils to support comparing multiple settings (#9140) | 2024-10-08 02:51:49 +00:00
Simon Mo | 8eeb857084 | Add Slack to README (#9137) | 2024-10-07 17:06:21 -07:00
youkaichao | fa45513a51 | [misc] fix comment and variable name (#9139) | 2024-10-07 16:07:05 -07:00
Kuntai Du | c0d9a98d0c | [Doc] Include performance benchmark in README (#9135) | 2024-10-07 15:04:06 -07:00
Russell Bryant | e0dbdb013d | [CI/Build] Add linting for github actions workflows (#7876); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2024-10-07 21:18:10 +00:00
TimWang | 93cf74a8a7 | [Doc]: Add deploying_with_k8s guide (#8451) | 2024-10-07 13:31:45 -07:00
Cyrus Leung | 151ef4efd2 | [Model] Support NVLM-D and fix QK Norm in InternViT (#9045); Co-authored-by: Roger Wang <ywang@roblox.com>; Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> | 2024-10-07 11:55:12 +00:00
Isotr0py | f19da64871 | [Core] Refactor GGUF parameters packing and forwarding (#8859) | 2024-10-07 10:01:46 +00:00
Isotr0py | 4f95ffee6f | [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) | 2024-10-07 06:50:35 +00:00
Cyrus Leung | 8c6de96ea1 | [Model] Explicit interface for vLLM models and support OOT embedding models (#9108) | 2024-10-07 06:10:35 +00:00
youkaichao | 18b296fdb2 | [core] remove beam search from the core (#9105) | 2024-10-07 05:47:04 +00:00
sroy745 | c8f26bb636 | [BugFix][Core] Fix BlockManagerV2 when Encoder Input is None (#9103) | 2024-10-07 03:52:42 +00:00
Isotr0py | 487678d046 | [Bugfix][Hardware][CPU] Fix CPU model input for decode (#9044) | 2024-10-06 19:14:27 -07:00
Varun Sundar Rabindranath | cb3b2b9ba4 | [Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038); Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> | 2024-10-06 12:48:11 -07:00
Yanyi Liu | fdf59d30ea | [Bugfix] fix tool_parser error handling when serve a model not support it (#8709) | 2024-10-06 12:51:08 +00:00
Cyrus Leung | b22b798471 | [Model] PP support for embedding models and update docs (#9090); Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> | 2024-10-06 16:35:27 +08:00
Cyrus Leung | f22619fe96 | [Misc] Remove user-facing error for removed VLM args (#9104) | 2024-10-06 01:33:52 -07:00
Brendan Wong | 168cab6bbf | [Frontend] API support for beam search (#9087); Co-authored-by: youkaichao <youkaichao@126.com> | 2024-10-05 23:39:03 -07:00
TJian | 23fea8714a | [Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model (#9101) | 2024-10-06 13:00:04 +08:00
youkaichao | f4dd830e09 | [core] use forward context for flash infer (#9097) | 2024-10-05 19:37:31 -07:00
Andy Dai | 5df1834895 | [Bugfix] Fix order of arguments matters in config.yaml (#8960) | 2024-10-05 17:35:11 +00:00
Chen Zhang | cfadb9c687 | [Bugfix] Deprecate registration of custom configs to huggingface (#9083) | 2024-10-05 21:56:40 +08:00
Xin Yang | 15986f598c | [Model] Support Gemma2 embedding model (#9004) | 2024-10-05 06:57:05 +00:00
hhzhang16 | 53b3a33027 | [Bugfix] Fixes Phi3v & Ultravox Multimodal EmbeddingInputs (#8979) | 2024-10-04 22:05:37 -07:00
Chen Zhang | dac914b0d6 | [Bugfix] use blockmanagerv1 for encoder-decoder (#9084); Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-10-05 04:45:38 +00:00
Zhuohan Li | a95354a36e | [Doc] Update README.md with Ray summit slides (#9088) | 2024-10-05 02:54:45 +00:00
youkaichao | 663874e048 | [torch.compile] improve allreduce registration (#9061) | 2024-10-04 16:43:50 -07:00
Chongming Ni | cc90419e89 | [Hardware][Neuron] Add on-device sampling support for Neuron (#8746); Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com> | 2024-10-04 16:42:20 -07:00
Cody Yu | 27302dd584 | [Misc] Fix CI lint (#9085) | 2024-10-04 16:07:54 -07:00
Andy Dai | 0cc566ca8f | [Misc] Add random seed for prefix cache benchmark (#9081) | 2024-10-04 21:58:57 +00:00
Andy Dai | 05c531be47 | [Misc] Improved prefix cache example (#9077) | 2024-10-04 21:38:42 +00:00
Kuntai Du | fbb74420e7 | [CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) | 2024-10-04 14:01:44 -07:00
ElizaWszola | 05d686432f | [Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973); Co-authored-by: Dipika <dipikasikka1@gmail.com>; Co-authored-by: Dipika Sikka <ds3822@columbia.edu> | 2024-10-04 12:34:44 -06:00
Flávia Béo | 0dcc8cbe5a | Adds truncate_prompt_tokens param for embeddings creation (#8999); Signed-off-by: Flavia Beo <flavia.beo@ibm.com> | 2024-10-04 18:31:40 +00:00
Roger Wang | 26aa325f4f | [Core][VLM] Test registration for OOT multimodal models (#8717); Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-10-04 10:38:25 -07:00
Varad Ahirwadkar | e5dc713c23 | [Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039); Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com> | 2024-10-04 17:24:42 +00:00
Simon Mo | 36eecfbddb | Remove AMD Ray Summit Banner (#9075) | 2024-10-04 10:17:16 -07:00
Prashant Gupta | 9ade8bbc8d | [Model] add a bunch of supported lora modules for mixtral (#9008); Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> | 2024-10-04 16:24:40 +00:00
Lucas Wilkinson | 22482e495e | [Bugfix] Flash attention arches not getting set properly (#9062) | 2024-10-04 09:43:15 -06:00
whyiug | 3d826d2c52 | [Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL (#9071) | 2024-10-04 14:34:58 +00:00
Cyrus Leung | 0e36fd4909 | [Misc] Move registry to its own file (#9064) | 2024-10-04 10:01:37 +00:00