Isotr0py
|
d75ab55f10
|
[Misc] Add deepseek_vl2 chat template (#12143)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-17 06:34:48 +00:00 |
|
Chen Zhang
|
d1adb9b403
|
[BugFix] add more is not None check in VllmConfig.__post_init__ (#12138)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-17 05:33:22 +00:00 |
|
Yuan Tang
|
b8bfa46a18
|
[Bugfix] Fix issues in CPU build Dockerfile (#12135)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-01-17 12:54:01 +08:00 |
|
Yuan Tang
|
1475847a14
|
[Doc] Add instructions on using Podman when SELinux is active (#12136)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-01-17 04:45:36 +00:00 |
|
Kunshang Ji
|
fead53ba78
|
[CI] Add genai-perf benchmark in nightly benchmark (#10704)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-01-17 04:15:09 +00:00 |
|
Kuntai Du
|
ebc73f2828
|
[Bugfix] Fix a path bug in disaggregated prefill example script. (#12121)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2025-01-17 11:12:41 +08:00 |
|
Chen Zhang
|
d06e824006
|
[Bugfix] Set enforce_eager automatically for mllama (#12127)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-16 15:30:08 -05:00 |
|
Isotr0py
|
62b06ba23d
|
[Model] Add support for deepseek-vl2-tiny model (#12068)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-16 17:14:48 +00:00 |
|
Varun Sundar Rabindranath
|
5fd24ec02e
|
[misc] Add LoRA kernel micro benchmarks (#11579)
|
2025-01-16 15:51:40 +00:00 |
|
Roger Wang
|
874f7c292a
|
[Bugfix] Fix max image feature size for Llava-one-vision (#12104)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-01-16 14:54:06 +00:00 |
|
youkaichao
|
92e793d91a
|
[core] LLM.collective_rpc interface and RLHF example (#12084)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-16 20:19:52 +08:00 |
|
youkaichao
|
bf53e0c70b
|
Support torchrun and SPMD-style offline inference (#12071)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-16 19:58:53 +08:00 |
|
Isotr0py
|
dd7c9ad870
|
[Bugfix] Remove hardcoded head_size=256 for Deepseek v2 and v3 (#12067)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-16 10:11:54 +00:00 |
|
Michael Goin
|
9aa1519f08
|
Various cosmetic/comment fixes (#12089)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-01-16 09:59:06 +00:00 |
|
Cyrus Leung
|
f8ef146f03
|
[Doc] Add documentation for specifying model architecture (#12105)
|
2025-01-16 15:53:43 +08:00 |
|
Elfie Guo
|
fa0050db08
|
[Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-01-16 04:31:27 +00:00 |
|
tvirolai-amd
|
cd9d06fb8d
|
Allow hip sources to be directly included when compiling for rocm. (#12087)
|
2025-01-15 16:46:03 -05:00 |
|
Varun Sundar Rabindranath
|
ebd8c669ef
|
[Bugfix] Fix _get_lora_device for HQQ marlin (#12090)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-01-15 19:59:42 +00:00 |
|
Roger Wang
|
70755e819e
|
[V1][Core] Autotune encoder cache budget (#11895)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-01-15 11:29:00 -08:00 |
|
Joe Runde
|
edce722eaa
|
[Bugfix] use right truncation for non-generative tasks (#12050)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-01-16 00:31:01 +08:00 |
|
maang-h
|
57e729e874
|
[Doc]: Update OpenAI-Compatible Server documents (#12082)
|
2025-01-15 16:07:45 +00:00 |
|
kewang-xlnx
|
de0526f668
|
[Misc][Quark] Upstream Quark format to VLLM (#10765)
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-01-15 11:05:15 -05:00 |
|
Yuan
|
5ecf3e0aaf
|
Misc: allow to use proxy in HTTPConnection (#12042)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
|
2025-01-15 13:16:40 +00:00 |
|
RunningLeon
|
97eb97b5a4
|
[Model]: Support internlm3 (#12037)
|
2025-01-15 11:35:17 +00:00 |
|
wangxiyuan
|
3adf0ffda8
|
[Platform] Do not raise error if _Backend is not found (#12023)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
|
2025-01-15 10:14:15 +00:00 |
|
Keyun Tong
|
ad388d25a8
|
Type-fix: make execute_model output type optional (#12020)
|
2025-01-15 09:44:56 +00:00 |
|
Rahul Tuli
|
cbe94391eb
|
Fix: cases with empty sparsity config (#12057)
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
|
2025-01-15 17:41:24 +08:00 |
|
Chen Zhang
|
994fc655b7
|
[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003)
|
2025-01-15 07:55:30 +00:00 |
|
Kyle Sayers
|
3f9b7ab9f5
|
[Doc] Update examples to remove SparseAutoModelForCausalLM (#12062)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-01-15 06:36:01 +00:00 |
|
youkaichao
|
ad34c0df0f
|
[core] platform agnostic executor via collective_rpc (#11256)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-15 13:45:21 +08:00 |
|
Rui Qiao
|
f218f9c24d
|
[core] Turn off GPU communication overlap for Ray executor (#12051)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-01-15 05:19:55 +00:00 |
|
Elfie Guo
|
0794e7446e
|
[Misc] Add multistep chunked-prefill support for FlashInfer (#10467)
|
2025-01-15 12:47:49 +08:00 |
|
Woosuk Kwon
|
b7ee940a82
|
[V1][BugFix] Fix edge case in VLM scheduling (#12065)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-14 20:21:28 -08:00 |
|
Shanshan Shen
|
9ddac56311
|
[Platform] move current_memory_usage() into platform (#11369)
Signed-off-by: Shanshan Shen <467638484@qq.com>
|
2025-01-15 03:38:25 +00:00 |
|
Konrad Zawora
|
1a51b9f872
|
[HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py (#12046)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2025-01-15 02:59:18 +00:00 |
|
Jee Jee Li
|
42f5e7c52a
|
[Kernel] Support MulAndSilu (#11624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-15 02:29:53 +00:00 |
|
Jee Jee Li
|
a3a3ee4e6f
|
[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-15 07:49:49 +08:00 |
|
maang-h
|
87054a57ab
|
[Doc]: Update the Json Example of the Engine Arguments document (#12045)
|
2025-01-14 17:03:04 +00:00 |
|
Harry Mellor
|
c9d6ff530b
|
Explain where the engine args go when using Docker (#12041)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-14 16:05:50 +00:00 |
|
Chen Zhang
|
a2d2acb4c8
|
[Bugfix][Kernel] Give unique name to BlockSparseFlashAttention (#12040)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-14 15:45:05 +00:00 |
|
wangxiyuan
|
2e0e017610
|
[Platform] Add output for Attention Backend (#11981)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-01-14 13:27:04 +00:00 |
|
Chen Zhang
|
1f18adb245
|
[Kernel] Revert the API change of Attention.forward (#12038)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-14 20:59:32 +08:00 |
|
Cyrus Leung
|
bb354e6b2d
|
[Bugfix] Fix various bugs in multi-modal processor (#12031)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-14 12:16:11 +00:00 |
|
youkaichao
|
ff39141a49
|
[HPU][misc] add comments for explanation (#12034)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-14 19:24:06 +08:00 |
|
TJian
|
8a1f938e6f
|
[Doc] Update Quantization Hardware Support Documentation (#12025)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-01-14 04:37:52 +00:00 |
|
Konrad Zawora
|
078da31903
|
[HPU][Bugfix] set_forward_context and CI test execution (#12014)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2025-01-14 11:04:18 +08:00 |
|
Woosuk Kwon
|
1a401252b5
|
[Docs] Add Sky Computing Lab to project intro (#12019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-13 17:24:36 -08:00 |
|
Steve Luo
|
f35ec461fc
|
[Bugfix] Fix deepseekv3 gate bias error (#12002)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-01-13 13:43:51 -07:00 |
|
Yikun Jiang
|
289b5191d5
|
[Doc] Fix build from source and installation link in README.md (#12013)
Signed-off-by: Yikun <yikunkero@gmail.com>
|
2025-01-13 17:23:59 +00:00 |
|
elijah
|
c6db21313c
|
bugfix: Fix signature mismatch in benchmark's get_tokenizer function (#11982)
Signed-off-by: elijah <f1renze.142857@gmail.com>
|
2025-01-13 15:22:07 +00:00 |
|