diff --git a/docs/source/assets/kernel/v_vec.png b/docs/source/assets/kernel/v_vec.png
index bac3c10949f6c..75d344ab933f2 100644
Binary files a/docs/source/assets/kernel/v_vec.png and b/docs/source/assets/kernel/v_vec.png differ
diff --git a/docs/source/assets/kernel/value.png b/docs/source/assets/kernel/value.png
index f585c77b2e144..56b0b9e0f56df 100644
Binary files a/docs/source/assets/kernel/value.png and b/docs/source/assets/kernel/value.png differ
diff --git a/docs/source/dev/kernel/paged_attention.rst b/docs/source/dev/kernel/paged_attention.rst
index 6fcadeeec27b6..ba4f7a2718158 100644
--- a/docs/source/dev/kernel/paged_attention.rst
+++ b/docs/source/dev/kernel/paged_attention.rst
@@ -447,7 +447,7 @@ Value
   a whole block of value tokens. And each ``accs`` in each thread
   contains 8 elements that accumulated at 8 different head positions.
   For the thread 0, the ``accs`` variable will have 8 elements, which
-  are 0th, 16th … 112th elements of a value head that are accumulated
+  are 0th, 32th … 224th elements of a value head that are accumulated
   from all assigned 8 tokens.
 
 LV