diff --git a/README.md b/README.md
index e94a77f..f16f838 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@
 9. [Contact](#9-contact)

-## 1. Introduction
+1. Introduction

 We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.
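
The context paragraph touched by this hunk describes the key MoE property: of 671B total parameters, only 37B are active per token, because a router selects a small subset of experts for each token. Below is a minimal, illustrative sketch of generic top-k expert routing that conveys this idea; it is not DeepSeekMoE itself (which additionally uses shared experts and its own gating scheme), and the class and parameter names (`TopKMoE`, `n_experts`, `k`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: only k of n experts run per token,
    so the active parameter count is a small fraction of the total."""

    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts, bias=False)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)              # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                        # run only the selected experts
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 64)
print(TopKMoE(dim=64)(tokens).shape)  # torch.Size([16, 64])
```

With `k = 2` of 8 experts selected, roughly a quarter of the expert parameters participate in any one token's forward pass, which is the same total-vs-activated distinction the 671B/37B figures express at scale.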