Mathematics and coding are two of the most critical domains for assessing the reasoning capabilities of large language models (LLMs).
Just last week, Google DeepMind made an eye-catching announcement: with the help of two advanced models developed specifically for mathematics, its systems solved four of the six problems in this year's International Mathematical Olympiad (IMO), performing at the level of a silver medalist.
Each of the two models has its own strengths. AlphaProof is a system that bridges natural and formal languages: it translates problems into the formal proof language Lean and trains itself with reinforcement learning, continuously accumulating knowledge until it can prove complex mathematical statements. Starting from the given conditions, it applies known theorems and techniques step by step until the desired conclusion is reached, yielding a complete, machine-checkable proof.
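To make the idea of a formal-language proof concrete, here is a toy theorem written in Lean 4, the proof language AlphaProof works in. The statement and proof below are our own minimal illustration, not output from AlphaProof; the point is that, once expressed this way, the argument can be verified mechanically by the Lean checker.

```lean
-- Toy illustration of a machine-checkable proof in Lean 4 (not AlphaProof output).
-- The statement says that addition of natural numbers is commutative; the proof
-- discharges it by invoking a lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

An olympiad problem formalized in the same language would simply be a much larger statement of this kind, and AlphaProof's task is to search for a proof term or tactic script that the checker accepts.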
The other model, AlphaGeometry 2, is a neuro-symbolic hybrid system whose language model is built on Gemini, giving it strong language understanding and processing capabilities. Its performance on geometry problems has improved markedly: it can handle both dynamic problems, in which points or objects move within a construction, and static problems involving equations of angles, ratios, or distances. For a dynamic problem, it can track how the configuration changes as an element moves and reason about the resulting positions; for a static problem, it identifies the geometric elements involved, sets up equations from the given conditions, and solves for the unknowns, as in the simple example below.
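As a deliberately simple illustration of that equation-based reasoning (an example of our own, not a problem AlphaGeometry 2 was tested on): suppose a triangle has interior angles $\alpha$, $\beta$, $\gamma$ with $\alpha = 70^\circ$ and $\beta = 60^\circ$. The known constraint is that the interior angles sum to $180^\circ$, so the unknown follows directly:

$$\alpha + \beta + \gamma = 180^\circ \quad\Longrightarrow\quad \gamma = 180^\circ - 70^\circ - 60^\circ = 50^\circ.$$

Real olympiad geometry chains many such constraints together, but the pattern is the same: translate the givens into equations, then solve.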
Additionally, an interesting discussion emerged on Twitter. A blogger posted a screenshot of Google's achievement and asked how many points an OpenAI model would score if it entered the same competition. OpenAI CEO Sam Altman responded with a playful interjection; the reply did not state a score, but it seemed to hint that OpenAI might be aiming for a gold medal.
If OpenAI really were to win an IMO gold medal, it would prompt deeper reflection on the trajectory of artificial intelligence. Paul Christiano, one of the key researchers behind reinforcement learning from human feedback (RLHF), has suggested that if LLMs could win an IMO gold medal before 2025, the arrival of artificial general intelligence (AGI) might not be far off; conversely, if that goal is missed, AGI could still be decades away.
Currently, the industry is looking forward to the new model OpenAI plans to launch at the end of the year. Many signs suggest that OpenAI is focused on improving the reasoning capabilities of LLMs. It may be experimenting with new algorithms, architectures, and training strategies to strengthen reasoning in key areas such as mathematics and coding, so that its models can perform better in future competitions and lay the groundwork for the next level of progress in artificial intelligence.