Compare commits


9 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Garvit Singh Rathore | 05299775a9 | Merge 1c10c9f6771fadcd5c8fe52cbac7075527184661 into 0cf78561f1d51c84a21b2190626b21116d5c68bb | 2025-04-09 13:36:44 +08:00 |
| Xingkai Yu | 0cf78561f1 | Merge pull request #129 from peti562/patch-2 (fixing a typo) | 2025-04-09 13:36:23 +08:00 |
| Xingkai Yu | 4a6d53cac8 | Merge pull request #189 from eladb/patch-1 (chore: add syntax highlighting to citation) | 2025-04-09 13:34:09 +08:00 |
| Xingkai Yu | f1e82facf1 | Merge pull request #100 from aBurmeseDev/main (docs: fix contact email link README.md) | 2025-04-09 13:32:25 +08:00 |
| Garvit Singh Rathore | 1c10c9f677 | Update README.md (Behaviors are exhibited rather than emerged.) | 2025-01-30 23:17:27 +05:30 |
| Garvit Singh Rathore | bb10d07b27 | Update README.md (Used more accurate words.) | 2025-01-30 23:12:58 +05:30 |
| Elad Ben-Israel | 6a023be7cf | chore: add syntax highlighting to citation | 2025-01-30 10:06:19 +02:00 |
| Peter Makadi | 6e59fa73e6 | fixing a typo (in licences section, Llama should be capitalized) | 2025-01-28 21:52:11 +01:00 |
| John L. | c942e96852 | chore: fix contact mailto link | 2025-01-27 18:06:01 -08:00 |


@@ -32,7 +32,7 @@
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
With RL, DeepSeek-R1-Zero naturally exhibited numerous powerful and interesting reasoning behaviors.
However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
we introduce DeepSeek-R1, which incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
@@ -187,7 +187,7 @@ python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
**We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:**
1. Set the temperature between 0.5 and 0.7 (with 0.6 recommended) to prevent endless repetition or incoherent outputs.
2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.**
3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
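The recommendations in the hunk above can be sketched as a request payload for an OpenAI-compatible chat endpoint. This is a minimal illustration, not from the README itself: the `deepseek-reasoner` model name and the exact API shape are assumptions.

```python
def build_r1_request(question: str) -> dict:
    """Build a chat-completion payload following the recommended settings:
    temperature 0.6, no system prompt, and the math directive appended
    to the user turn."""
    prompt = (
        f"{question}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )
    return {
        "model": "deepseek-reasoner",  # assumed model identifier
        "temperature": 0.6,            # recommended range: 0.5-0.7
        "messages": [
            # Note: no system message; all instructions live in the user prompt.
            {"role": "user", "content": prompt},
        ],
    }

payload = build_r1_request("What is 17 * 24?")
```

For point 4, one would send this payload several times and average the scored results rather than judging a single sample.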
@@ -257,11 +257,11 @@ When responding, please keep the following points in mind:
This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE).
The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the [Qwen-2.5 series](https://github.com/QwenLM/Qwen2.5), which are originally licensed under the [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-1.5B/blob/main/LICENSE), and now finetuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the [Llama3.1 license](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/LICENSE).
- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the [Llama3.3 license](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE).
## 8. Citation
```bibtex
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
    title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
    author={DeepSeek-AI},
@@ -274,4 +274,4 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
```
## 9. Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).