The model was trained on over 10 TB of high-quality data, including a large amount of high-quality synthetic data, to build higher-order chains of thought. It uses a hybrid device-cloud collaborative architecture with 600 billion parameters, maximizing the synergy among cloud, edge, and on-device computing, and achieves an inference speed of 109.5 characters per second.
They also released a mini-program called Vimi that can bring a photo to life, animating body movements and facial expressions with voice control; it is currently in internal testing.