The model was trained on over 10 TB of high-quality data, including a large amount of high-quality synthetic data, to build higher-order chains of thought. It uses a hybrid device-cloud collaborative architecture with 600 billion parameters, maximizing the synergy among cloud, edge, and on-device computing, and achieves an inference speed of 109.5 characters per second.
They also released a mini-program called Vimi that can bring a photo to life, animating body movements and facial expressions with voice control; it is currently in internal testing.