DeepSeek officially released its DeepSeek-Coder-V2 code model last week. The model has 236B total parameters, of which 21B are activated per token, and delivers impressive performance: on coding benchmarks it surpasses GPT-4 Turbo and ranks just below GPT-4o, which has drawn wide attention in the code-model space.
The series is open-source and consists of two models:
The first is DeepSeek-Coder-V2, the model behind the official demo and API, with 236B total parameters. A single machine with 8×80G GPUs is enough to deploy it, and the same 8×80G setup can also handle fine-tuning, though fine-tuning at this scale demands real expertise and is not straightforward.
The other is DeepSeek-Coder-V2-Lite, with 16B total parameters and 2.4B activated, and it supports FIM (Fill-In-the-Middle) completion. Its coding ability is close to that of DeepSeek-Coder-33B (V1), which is enough for many application scenarios, and it is far cheaper to run: a single 40G card suffices for deployment, and single-machine training needs 8×80G GPUs.
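FIM means the model completes the middle of a file given the code before and after the cursor. A minimal sketch of assembling such a prompt is below; the special-token names follow those published for the DeepSeek-Coder series, but you should verify them against the tokenizer config of the exact model version you deploy.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt for a code model.

    NOTE: the sentinel tokens below are assumed from the
    DeepSeek-Coder documentation; confirm them against your
    model's tokenizer before relying on this format.
    """
    return f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"


# The model is asked to fill in the partition step of quick sort.
prompt = build_fim_prompt(
    "def quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n",
    "\n    return quick_sort(left) + [pivot] + quick_sort(right)\n",
)
```

The returned string is sent to the model as a plain completion request; the text the model generates is the code that belongs in the "hole" between prefix and suffix.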
Beyond the two models themselves, DeepSeek has also been iterating on its chat platform: it quickly shipped an Artifacts-like feature for automatic code rendering. When the model finishes outputting code, the user clicks a run button and the code is rendered as a web page or chart, so the result can be viewed directly instead of read as raw source. This makes the output much easier to inspect and noticeably improves the experience of using the model.
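The core of such a rendering feature is simple: pull the fenced code block out of the model's markdown reply and load it into a standalone page. The sketch below is a hypothetical minimal version of that pipeline, not DeepSeek's actual implementation.

```python
import re


def extract_code_block(markdown_reply: str) -> str:
    """Return the body of the first fenced code block in a model reply."""
    match = re.search(r"```[\w+-]*\n(.*?)```", markdown_reply, re.DOTALL)
    return match.group(1) if match else ""


def render_as_page(html_snippet: str) -> str:
    """Wrap an HTML snippet in a minimal standalone document
    that a sandboxed iframe (or browser tab) can display."""
    return f"<!DOCTYPE html>\n<html><body>\n{html_snippet}</body></html>"


# Simulated model reply containing a fenced HTML block.
reply = "Here is the page:\n```html\n<h1>Hello</h1>\n```\nDone."
page = render_as_page(extract_code_block(reply))
```

In a real product the resulting document would be shown in a sandboxed iframe so the generated code cannot touch the host page.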