2360 Commits

Author SHA1 Message Date
Cyrus Leung
18572e3384
[Bugfix] Fix HfExampleModels.find_hf_info (#12223)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:35:36 +00:00
Cyrus Leung
b37d82791e
[Model] Upgrade Aria to transformers 4.48 (#12203)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 17:58:48 +08:00
Cyrus Leung
59a0192fb9
[Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:00:59 +08:00
Isotr0py
83609791d2
[Model] Add Qwen2 PRM model support (#12202)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-20 14:59:46 +08:00
Martin Gleize
bbe5f9de7d
[Model] Support for fairseq2 Llama (#11442)
Signed-off-by: Martin Gleize <mgleize@meta.com>
Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>
2025-01-19 10:40:40 -08:00
Roger Wang
81763c58a0
[V1] Add V1 support of Qwen2-VL (#12128)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: imkero <kerorek@outlook.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-19 19:52:13 +08:00
yancong
32eb0da808
[Misc] Support register quantization method out-of-tree (#11969) 2025-01-18 16:13:16 -08:00
Isotr0py
02798ecabe
[Model] Port deepseek-vl2 processor, remove dependency (#12169)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-18 13:59:39 +08:00
youkaichao
da02cb4b27
[core] further polish memory profiling (#12126)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-18 12:25:08 +08:00
Wallas Henrique
58fd57ff1d
[Bugfix] Fix score api for missing max_model_len validation (#12119)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2025-01-17 16:24:22 +00:00
youkaichao
87a0c076af
[core] allow callable in collective_rpc (#12151)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-17 20:47:01 +08:00
Jee Jee Li
07934cc237
[Misc][LoRA] Improve the readability of LoRA error messages (#12102)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-17 19:32:28 +08:00
Chen Zhang
69d765f5a5
[V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-01-17 07:39:35 +00:00
Isotr0py
d75ab55f10
[Misc] Add deepseek_vl2 chat template (#12143)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-17 06:34:48 +00:00
Isotr0py
62b06ba23d
[Model] Add support for deepseek-vl2-tiny model (#12068)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-16 17:14:48 +00:00
Roger Wang
874f7c292a
[Bugfix] Fix max image feature size for Llava-one-vision (#12104)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-16 14:54:06 +00:00
youkaichao
bf53e0c70b
Support torchrun and SPMD-style offline inference (#12071)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-16 19:58:53 +08:00
Isotr0py
dd7c9ad870
[Bugfix] Remove hardcoded head_size=256 for Deepseek v2 and v3 (#12067)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-16 10:11:54 +00:00
Joe Runde
edce722eaa
[Bugfix] use right truncation for non-generative tasks (#12050)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-16 00:31:01 +08:00
kewang-xlnx
de0526f668
[Misc][Quark] Upstream Quark format to VLLM (#10765)
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-15 11:05:15 -05:00
RunningLeon
97eb97b5a4
[Model]: Support internlm3 (#12037) 2025-01-15 11:35:17 +00:00
wangxiyuan
3adf0ffda8
[Platform] Do not raise error if _Backend is not found (#12023)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-01-15 10:14:15 +00:00
Chen Zhang
994fc655b7
[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003) 2025-01-15 07:55:30 +00:00
youkaichao
ad34c0df0f
[core] platform agnostic executor via collective_rpc (#11256)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-15 13:45:21 +08:00
Elfie Guo
0794e7446e
[Misc] Add multipstep chunked-prefill support for FlashInfer (#10467) 2025-01-15 12:47:49 +08:00
Jee Jee Li
42f5e7c52a
[Kernel] Support MulAndSilu (#11624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-15 02:29:53 +00:00
Cyrus Leung
bb354e6b2d
[Bugfix] Fix various bugs in multi-modal processor (#12031)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-14 12:16:11 +00:00
Yangcheng Li
f7b3ba82c3
[MISC] fix typo in kv transfer send recv test (#11983) 2025-01-13 05:07:48 +00:00
Robert Shaw
619ae268c3
[V1] [2/n] Logging and Metrics - OutputProcessor Abstraction (#11973)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-13 04:54:10 +00:00
Isotr0py
d14e98d924
[Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00
Robert Shaw
9597a095f2
[V1][Core][1/n] Logging and Metrics (#11962)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-12 21:02:02 +00:00
Avshalom Manevich
263a870ee1
[Hardware][TPU] workaround fix for MoE on TPU (#11764) 2025-01-12 10:53:51 -05:00
Akshat Tripathi
8bddb73512
[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00
Isotr0py
f967e51f38
[Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00
Nicolò Lucchesi
d697dc01b4
[Bugfix] Fix RobertaModel loading (#11940)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-01-11 14:05:09 +00:00
Cyrus Leung
a991f7d508
[Doc] Basic guide for writing unit tests for new models (#11951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 21:27:24 +08:00
Cyrus Leung
7a3a83e3b8
[CI/Build] Move model-specific multi-modal processing tests (#11934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 13:50:05 +08:00
youkaichao
899136b857
[ci] fix broken distributed-tests-4-gpus (#11937)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-11 09:07:24 +08:00
Li, Jiang
aa1e77a19c
[Hardware][CPU] Support MOE models on x86 CPU (#11831)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-01-10 11:07:58 -05:00
Harry Mellor
482cdc494e
[Doc] Rename offline inference examples (#11927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 23:50:29 +08:00
youkaichao
241ad7b301
[ci] Fix sampler tests (#11922)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-10 20:45:33 +08:00
Harry Mellor
d85c47d6ad
Replace "online inference" with "online serving" (#11923)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 12:05:56 +00:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server (#11727)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
Chen Zhang
cf5f000d21
[torch.compile] Hide KV cache behind torch.compile boundary (#11677)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-10 13:14:42 +08:00
Cyrus Leung
b844b99ad3
[VLM] Enable tokenized inputs for merged multi-modal processor (#11900)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-10 03:24:00 +00:00
Cyrus Leung
9a228348d2
[Misc] Provide correct Pixtral-HF chat template (#11891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 10:19:37 -07:00
youkaichao
bd82872211
[ci]try to fix flaky multi-step tests (#11894)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-09 14:47:29 +00:00
wangxiyuan
405eb8e396
[platform] Allow platform specify attention backend (#11609)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-01-09 21:46:50 +08:00
Cyrus Leung
0bd1ff4346
[Bugfix] Override dunder methods of placeholder modules (#11882)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 09:02:53 +00:00
Maximilien de Bayser
1fe554bac3
treat do_lower_case in the same way as the sentence-transformers library (#11815)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-01-09 11:05:43 +08:00