Mirror of https://git.datalinker.icu/deepseek-ai/DeepSeek-R1.git (synced 2025-12-08 20:44:23 +08:00)

Compare commits: 9 commits, 24a51f9102...05299775a9
| Author | SHA1 | Date |
|---|---|---|
| | 05299775a9 | |
| | 0cf78561f1 | |
| | 4a6d53cac8 | |
| | f1e82facf1 | |
| | 1c10c9f677 | |
| | bb10d07b27 | |
| | 6a023be7cf | |
| | 6e59fa73e6 | |
| | c942e96852 | |
README.md (12 changed lines)
````diff
@@ -32,7 +32,7 @@
 
 We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
 DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
-With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
+With RL, DeepSeek-R1-Zero naturally exhibited numerous powerful and interesting reasoning behaviors.
 However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
 we introduce DeepSeek-R1, which incorporates cold-start data before RL.
 DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
````
````diff
@@ -187,7 +187,7 @@ python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
 
 **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:**
 
-1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
+1. Set the temperature between 0.5 and 0.7 (with 0.6 recommended) to prevent endless repetition or incoherent outputs.
 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.**
 3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
````
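To make the recommendations in the hunk above concrete, here is a minimal sketch of a request against an OpenAI-compatible endpoint such as the sglang server launched by the command in the hunk header. The port, placeholder API key, and example question are illustrative assumptions, not part of the original README.

```python
# Minimal sketch, assuming an OpenAI-compatible server (e.g. the sglang
# launch_server command referenced above) is listening locally; the port,
# placeholder API key, and sample question are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    # No system prompt: every instruction goes in the user message.
    messages=[
        {
            "role": "user",
            "content": (
                "Solve x^2 - 5x + 6 = 0. "
                "Please reason step by step, and put your final answer within \\boxed{}."
            ),
        }
    ],
    temperature=0.6,  # recommended range is 0.5-0.7, with 0.6 as the default
)
print(response.choices[0].message.content)
```

For benchmarking, such a request would be issued several times and the scores averaged, in line with point 4 above.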
````diff
@@ -257,11 +257,11 @@ When responding, please keep the following points in mind:
 This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE).
 DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
 - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from [Qwen-2.5 series](https://github.com/QwenLM/Qwen2.5), which are originally licensed under [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-1.5B/blob/main/LICENSE), and now finetuned with 800k samples curated with DeepSeek-R1.
-- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under [llama3.1 license](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/LICENSE).
-- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under [llama3.3 license](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE).
+- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under [Llama3.1 license](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/LICENSE).
+- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under [Llama3.3 license](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE).
 
 ## 8. Citation
-```
+```bibtex
 @misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
       title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
       author={DeepSeek-AI},
````
````diff
@@ -274,4 +274,4 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
 ```
 
 ## 9. Contact
-If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
+If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
````