The widely followed AI educator Li Mu had been absent from his Bilibili account for quite some time, but the account that many fans have been eagerly waiting on has finally resumed updates. The new content walks through the Llama 3.1 paper: Li Mu gives a detailed explanation of its first part, the introduction, offering a rich session for viewers interested in the development of AI technology.
At the same time, Meta has made a significant move of its own in the AI model field, releasing the new Llama 3.1 series. The series shows impressive technical strength: the largest variant reaches 405 billion parameters, which both reflects its capacity for complex tasks and provides strong support for multilingual use and tool calling. Every model in the series uses a 128K-token context window and a dense Transformer architecture (the paper explicitly opts for a standard dense design over a mixture of experts to keep training simple and stable), which makes the models more efficient and accurate at processing text and further consolidates Llama's leading position among open-source models.
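For readers who want a concrete picture of what "405 billion parameters, dense architecture" means, the sketch below lays out a minimal configuration object. The field values follow the shape settings reported in the Llama 3 paper for the 405B model (126 layers, a 16,384-dimensional hidden state, 128 attention heads with 8 grouped-query KV heads); the dataclass and the back-of-the-envelope parameter count are purely illustrative, not Meta's code.

```python
from dataclasses import dataclass

@dataclass
class DenseTransformerConfig:
    """Illustrative dense decoder-only configuration at the 405B scale.

    Field values follow the Llama 3 paper's reported 405B settings;
    the class itself is a sketch, not Meta's actual implementation.
    """
    n_layers: int = 126          # transformer blocks
    d_model: int = 16_384        # hidden (residual stream) dimension
    n_heads: int = 128           # attention heads
    n_kv_heads: int = 8          # grouped-query attention KV heads
    d_ffn: int = 53_248          # SwiGLU feed-forward inner dimension
    vocab_size: int = 128_256    # tokenizer vocabulary size
    max_seq_len: int = 131_072   # 128K-token context window

cfg = DenseTransformerConfig()
head_dim = cfg.d_model // cfg.n_heads
# Rough count: token embeddings plus per-layer attention and FFN weights
# (norms and the output head are omitted for brevity).
embed = cfg.vocab_size * cfg.d_model
attn = 2 * cfg.d_model**2 + 2 * cfg.d_model * head_dim * cfg.n_kv_heads
ffn = 3 * cfg.d_model * cfg.d_ffn  # SwiGLU uses three projections
print(f"~{(embed + cfg.n_layers * (attn + ffn)) / 1e9:.0f}B parameters")
```

Running the rough count lands close to the quoted 405 billion, a useful sanity check that the published shape figures are self-consistent.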
To build such a powerful model, the Llama team invested heavily in both people and data. The team has grown to hundreds of researchers working together on the model's development and optimization. On the data side, consistent with their stated preference for simple design, they trained on as much as 15 trillion tokens of multilingual data. This required striking a delicate balance between quantity and quality: enough data to support the model's learning and growth, while ensuring the data is of high enough quality to meet the model's needs for knowledge and semantic understanding.
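The paper's data work combines large-scale deduplication with heuristic and model-based quality filtering, which is where the quantity-versus-quality balance is actually enforced. As a rough illustration only, here is a toy heuristic filter; every rule and threshold below is invented for demonstration and does not reflect Meta's actual criteria.

```python
def passes_quality_heuristics(doc: str) -> bool:
    """Toy quality filter for pre-training text.

    Llama-style pipelines mix deduplication with heuristic and
    model-based filters; the rules and thresholds here are made up
    for illustration, not Meta's real criteria.
    """
    words = doc.split()
    if len(words) < 50:                       # drop tiny fragments
        return False
    mean_word_len = sum(map(len, words)) / len(words)
    if not 3 <= mean_word_len <= 10:          # gibberish or boilerplate
        return False
    alpha_ratio = sum(c.isalpha() for c in doc) / max(len(doc), 1)
    if alpha_ratio < 0.6:                     # mostly symbols or markup
        return False
    lines = [ln.strip() for ln in doc.splitlines() if ln.strip()]
    if lines and len(set(lines)) / len(lines) < 0.5:
        return False                          # heavily repeated lines: spam
    return True
```

Filters like this are cheap enough to run over trillions of tokens, which is why pipelines typically apply them before more expensive model-based scoring.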
The training process of the Llama model proceeds in two key stages. In pre-training, the model performs the apparently simple task of predicting the next token. Simple as it looks, this task is the foundation on which the model learns language patterns and semantic information: through repeated training on vast amounts of text, it gradually masters the basic rules of language. Post-training then focuses on practical ability, teaching the model to carry out tasks from instructions and to improve specific skills through targeted training. The Llama team uses a simple, direct post-training recipe, deliberately relying on straightforward algorithms to keep the model's complexity low; this keeps task handling efficient and avoids the overfitting and other problems that overly complex methods can introduce.

When evaluating the model, the team carried out a comprehensive and in-depth study. They examined how models of different sizes perform across a range of tasks, analyzed how the model answers different kinds of exam questions, probed how much of its training data it has memorized, and measured how different answering methods affect reported scores. This evaluation work provides an important basis for further optimizing the model.
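Concretely, "predicting the next token" is the standard causal language-modeling objective: the token sequence is shifted by one position and the model's logits are scored against the shifted targets with cross-entropy. The PyTorch sketch below shows this generic pre-training loss; the `model` callable and its input/output shapes are assumptions for illustration, not code from the paper.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    """Generic next-token-prediction (causal LM) loss.

    Assumes `model` maps [batch, seq] token ids to [batch, seq, vocab]
    logits; this is the standard pre-training objective, not code
    taken from the Llama 3 paper.
    """
    inputs = token_ids[:, :-1]   # model sees tokens 0 .. n-2
    targets = token_ids[:, 1:]   # and is scored on tokens 1 .. n-1
    logits = model(inputs)       # [batch, seq-1, vocab]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time
        targets.reshape(-1),
    )
```

Supervised fine-tuning in post-training reuses this same cross-entropy machinery on curated instruction data, which is part of why a simple recipe suffices.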
As the Llama 3.1 series drew attention, Mistral launched a competing model: Mistral Large 2, announced in a blog post titled "Large Enough," with 123 billion parameters. The company claims the model outperforms Llama 3, a claim that landed like a bombshell in the AI field and instantly set off a dispute between Mistral and Meta's Llama team. Mistral emphasizes its model's cost-effectiveness and superior performance as it tries to carve out a place in the market; Meta is strongly dissatisfied, arguing that Mistral's claim lacks sufficient basis, and has even taken steps such as updating related agreements to protect its own rights and interests. The competition and disputes between the two companies have attracted wide attention in the industry, with many AI practitioners and enthusiasts closely following how events develop. The dispute also raises expectations for the future of AI models: people hope that competition will keep driving innovation and progress, bringing more benefits to human society.