rasmith
|
68c4421b6d
|
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (#12282)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-01-23 00:10:37 +00:00 |
|
Cody Yu
|
7206ce4ce1
|
[Core] Support reset_prefix_cache (#12284)
|
2025-01-22 18:52:27 +00:00 |
|
youkaichao
|
68ad4e3a8d
|
[Core] Support fully transparent sleep mode (#11743)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-22 14:39:32 +08:00 |
|
Kevin H. Luu
|
64ea24d0b3
|
[ci/lint] Add back default arg for pre-commit (#12279)
Signed-off-by: kevin <kevin@anyscale.com>
|
2025-01-22 01:15:27 +00:00 |
|
Cyrus Leung
|
df76e5af26
|
[VLM] Simplify post-processing of replacement info (#12269)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-21 16:48:13 -08:00 |
|
Adrian Cole
|
347eeebe3b
|
[Misc] Remove experimental dep from tracing.py (#12007)
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
|
2025-01-21 11:51:55 -08:00 |
|
Andy Lo
|
18fd4a8331
|
[Bugfix] Multi-sequence broken (#11898)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-01-21 11:51:35 -08:00 |
|
Ricky Xu
|
132a132100
|
[v1][stats][1/n] Add RequestStatsUpdate and RequestStats types (#10907)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2025-01-21 11:51:13 -08:00 |
|
Nicolò Lucchesi
|
5fe6bf29d6
|
[BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (#12230)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-01-21 12:23:14 +08:00 |
|
Cyrus Leung
|
18572e3384
|
[Bugfix] Fix HfExampleModels.find_hf_info (#12223)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-20 15:35:36 +00:00 |
|
Cyrus Leung
|
b37d82791e
|
[Model] Upgrade Aria to transformers 4.48 (#12203)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-20 17:58:48 +08:00 |
|
Cyrus Leung
|
59a0192fb9
|
[Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-20 15:00:59 +08:00 |
|
Isotr0py
|
83609791d2
|
[Model] Add Qwen2 PRM model support (#12202)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-20 14:59:46 +08:00 |
|
Martin Gleize
|
bbe5f9de7d
|
[Model] Support for fairseq2 Llama (#11442)
Signed-off-by: Martin Gleize <mgleize@meta.com>
Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>
|
2025-01-19 10:40:40 -08:00 |
|
Roger Wang
|
81763c58a0
|
[V1] Add V1 support of Qwen2-VL (#12128)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: imkero <kerorek@outlook.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-19 19:52:13 +08:00 |
|
yancong
|
32eb0da808
|
[Misc] Support register quantization method out-of-tree (#11969)
|
2025-01-18 16:13:16 -08:00 |
|
Isotr0py
|
02798ecabe
|
[Model] Port deepseek-vl2 processor, remove dependency (#12169)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-18 13:59:39 +08:00 |
|
youkaichao
|
da02cb4b27
|
[core] further polish memory profiling (#12126)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-18 12:25:08 +08:00 |
|
Wallas Henrique
|
58fd57ff1d
|
[Bugfix] Fix score api for missing max_model_len validation (#12119)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2025-01-17 16:24:22 +00:00 |
|
youkaichao
|
87a0c076af
|
[core] allow callable in collective_rpc (#12151)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-17 20:47:01 +08:00 |
|
Jee Jee Li
|
07934cc237
|
[Misc][LoRA] Improve the readability of LoRA error messages (#12102)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-17 19:32:28 +08:00 |
|
Chen Zhang
|
69d765f5a5
|
[V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-01-17 07:39:35 +00:00 |
|
Isotr0py
|
d75ab55f10
|
[Misc] Add deepseek_vl2 chat template (#12143)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-17 06:34:48 +00:00 |
|
Isotr0py
|
62b06ba23d
|
[Model] Add support for deepseek-vl2-tiny model (#12068)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-16 17:14:48 +00:00 |
|
Roger Wang
|
874f7c292a
|
[Bugfix] Fix max image feature size for Llava-one-vision (#12104)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-01-16 14:54:06 +00:00 |
|
youkaichao
|
bf53e0c70b
|
Support torchrun and SPMD-style offline inference (#12071)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-16 19:58:53 +08:00 |
|
Isotr0py
|
dd7c9ad870
|
[Bugfix] Remove hardcoded head_size=256 for Deepseek v2 and v3 (#12067)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-16 10:11:54 +00:00 |
|
Joe Runde
|
edce722eaa
|
[Bugfix] use right truncation for non-generative tasks (#12050)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-01-16 00:31:01 +08:00 |
|
kewang-xlnx
|
de0526f668
|
[Misc][Quark] Upstream Quark format to VLLM (#10765)
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-01-15 11:05:15 -05:00 |
|
RunningLeon
|
97eb97b5a4
|
[Model]: Support internlm3 (#12037)
|
2025-01-15 11:35:17 +00:00 |
|
wangxiyuan
|
3adf0ffda8
|
[Platform] Do not raise error if _Backend is not found (#12023)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
|
2025-01-15 10:14:15 +00:00 |
|
Chen Zhang
|
994fc655b7
|
[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003)
|
2025-01-15 07:55:30 +00:00 |
|
youkaichao
|
ad34c0df0f
|
[core] platform agnostic executor via collective_rpc (#11256)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-15 13:45:21 +08:00 |
|
Elfie Guo
|
0794e7446e
|
[Misc] Add multistep chunked-prefill support for FlashInfer (#10467)
|
2025-01-15 12:47:49 +08:00 |
|
Jee Jee Li
|
42f5e7c52a
|
[Kernel] Support MulAndSilu (#11624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-15 02:29:53 +00:00 |
|
Cyrus Leung
|
bb354e6b2d
|
[Bugfix] Fix various bugs in multi-modal processor (#12031)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-14 12:16:11 +00:00 |
|
Yangcheng Li
|
f7b3ba82c3
|
[MISC] fix typo in kv transfer send recv test (#11983)
|
2025-01-13 05:07:48 +00:00 |
|
Robert Shaw
|
619ae268c3
|
[V1] [2/n] Logging and Metrics - OutputProcessor Abstraction (#11973)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-01-13 04:54:10 +00:00 |
|
Isotr0py
|
d14e98d924
|
[Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-01-13 00:13:44 +00:00 |
|
Robert Shaw
|
9597a095f2
|
[V1][Core][1/n] Logging and Metrics (#11962)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-01-12 21:02:02 +00:00 |
|
Avshalom Manevich
|
263a870ee1
|
[Hardware][TPU] workaround fix for MoE on TPU (#11764)
|
2025-01-12 10:53:51 -05:00 |
|
Akshat Tripathi
|
8bddb73512
|
[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-01-12 13:01:52 +00:00 |
|
Isotr0py
|
f967e51f38
|
[Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-01-12 00:17:24 -08:00 |
|
Nicolò Lucchesi
|
d697dc01b4
|
[Bugfix] Fix RobertaModel loading (#11940)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-01-11 14:05:09 +00:00 |
|
Cyrus Leung
|
a991f7d508
|
[Doc] Basic guide for writing unit tests for new models (#11951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-11 21:27:24 +08:00 |
|
Cyrus Leung
|
7a3a83e3b8
|
[CI/Build] Move model-specific multi-modal processing tests (#11934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-11 13:50:05 +08:00 |
|
youkaichao
|
899136b857
|
[ci] fix broken distributed-tests-4-gpus (#11937)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-11 09:07:24 +08:00 |
|
Li, Jiang
|
aa1e77a19c
|
[Hardware][CPU] Support MOE models on x86 CPU (#11831)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-01-10 11:07:58 -05:00 |
|
Harry Mellor
|
482cdc494e
|
[Doc] Rename offline inference examples (#11927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-10 23:50:29 +08:00 |
|
youkaichao
|
241ad7b301
|
[ci] Fix sampler tests (#11922)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-10 20:45:33 +08:00 |
|