1788 Commits

Author SHA1 Message Date
Dipika Sikka
60508ffda9
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00
Wallas Henrique
8b79f9e107
[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046) 2024-12-17 22:34:08 -08:00
Cody Yu
bf8717ebae
[V1] Prefix caching for vision language models (#11187)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-12-17 16:37:59 -08:00
Michael Goin
c77eb8a33c
[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264) 2024-12-17 16:34:06 -08:00
Joe Runde
2d1b9baa8f
[Bugfix] Fix request cancellation without polling (#11190) 2024-12-17 12:26:32 -08:00
kYLe
66d4b16724
[Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-16 22:09:58 -08:00
Michael Goin
0064f697d3
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-17 11:39:58 +08:00
youkaichao
551603feff
[core] overhaul memory profiling and fix backward compatibility (#10511)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-16 13:32:25 -08:00
Isotr0py
d927dbcd88
[Model] Refactor Ultravox to use merged input processor (#11198)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-16 10:09:53 +00:00
Jani Monoses
bddbbcb132
[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203) 2024-12-16 09:56:19 +00:00
Cyrus Leung
b10609e6a1
[Misc] Clean up multi-modal processor (#11207)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-15 06:30:28 +00:00
Cyrus Leung
93abf23a64
[VLM] Fully dynamic prompt replacement in merged input processor (#11199)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 17:52:18 +00:00
Brad Hilton
9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
2024-12-14 16:46:42 +00:00
Cyrus Leung
0920ab9131
[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00
Sungjae Lee
c31d4a57a6
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching (#8240) 2024-12-13 07:51:25 -08:00
Cyrus Leung
eeec9e3390
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-13 10:40:07 +00:00
youkaichao
be39e3cd18
[core] clean up cudagraph batchsize padding logic (#10996)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-13 06:57:50 +00:00
Pooya Davoodi
1efce68605
[Bugfix] Use runner_type instead of task in GritLM (#11144)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2024-12-13 04:09:53 +00:00
Luka Govedič
30870b4f66
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-13 03:19:23 +00:00
Cody Yu
78ed8f57d8
[Misc][V1] Fix type in v1 prefix caching (#11151) 2024-12-13 00:57:40 +00:00
Jiaxin Shan
85362f028c
[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-12 09:25:16 +00:00
youkaichao
62de37a38e
[core][distributed] initialization from StatelessProcessGroup (#10986)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-12 09:04:19 +00:00
Pooya Davoodi
1da8f0e1dd
[Model] Add support for embedding model GritLM (#10816)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2024-12-12 06:39:16 +00:00
Alexander Matveev
4e11683368
[V1] VLM preprocessor hashing (#11020)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-12 00:55:30 +00:00
Cyrus Leung
d1e21a979b
[CI/Build] Split up VLM tests (#11083)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-12 06:18:16 +08:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
Cyrus Leung
2e33fe4191
[CI/Build] Check transformers v4.47 (#10991)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 05:02:02 +00:00
Mor Zusman
ffa48c9146
[Model] PP support for Mamba-like models (#10992)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-12-10 21:53:37 -05:00
Aurick Qiao
d5c5154fcf
[Misc] LoRA + Chunked Prefill (#9057) 2024-12-11 10:09:20 +08:00
Jee Jee Li
d05f88679b
[Misc][LoRA] Add PEFTHelper for LoRA (#11003)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-10 11:12:01 +00:00
youkaichao
ebf778061d
monitor metrics of tokens per step using cudagraph batchsizes (#11031)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-09 22:35:36 -08:00
Tyler Michael Smith
28b3a1c7e5
[V1] Multiprocessing Tensor Parallel Support for v1 (#9856)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-10 06:28:14 +00:00
Isotr0py
a811dd6608
[Model] merged input processor for Phi-3-Vision models (#10977)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-09 12:55:10 -08:00
Jee Jee Li
ca871491ed
[Misc][LoRA] Abstract PunicaWrapper (#10955)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-09 12:54:44 -08:00
youkaichao
fd57d2b534
[torch.compile] allow candidate compile sizes (#10984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-08 11:05:21 +00:00
zhou fan
78029b34ed
[BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None (#10928)
Signed-off-by: xffxff <1247714429@qq.com>
2024-12-08 01:21:18 +08:00
Cyrus Leung
c889d5888b
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:20:49 +00:00
Cyrus Leung
39e227c7ae
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:10:05 +00:00
Cyrus Leung
955fa9533a
[3/N] Support and implement merged input processor for LLaVA model (#10676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-07 00:50:58 -08:00
Cyrus Leung
222f5b082a
[CI/Build] Fix broken multimodal test (#10950) 2024-12-06 10:41:23 +00:00
youkaichao
9743d64e4e
[ci][build] add tests for python only compilation (#10915)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-05 08:54:47 -08:00
Isotr0py
998eeafe58
[CI/Build] Bump test transformers version (#10106)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-05 16:05:52 +00:00
Jee Jee Li
571da8fc43
[Misc][LoRA] Clean up the function interface of Punica (#10917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-05 13:22:28 +00:00
Michael Goin
8d370e91cb
[Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-05 11:14:06 +08:00
Xin Yang
01d079fd8e
[LoRA] Change lora_tokenizers capacity (#10796)
Signed-off-by: Xin Yang <xyang19@gmail.com>
2024-12-04 17:40:16 +00:00
Tyler Michael Smith
d2bd88b122
[CI/Build] Replace mean with torch.all in test_pynccl.py (#10876)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-04 03:23:21 +00:00
Alexander Matveev
3bc94cab69
[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-03 10:33:10 +00:00
Aaron Pham
9323a3153b
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-12-03 15:17:00 +08:00
youkaichao
dc5ce861bf
[torch.compile] remove compilation_context and simplify code (#10838)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-03 06:19:02 +00:00
youkaichao
21fe7b481a
[core][distributed] add pynccl broadcast (#10843)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-03 04:53:23 +00:00