I. Dream Machine Model
Model Introduction
Jiaming Song works across the full stack of model training for the Dream Machine project. The model adopts the DiT architecture and is characterized by a larger range of motion. The larger motion makes the output harder to control, but it is considered important for user experience.
The large range of motion is driven mainly by model and data scale; earlier, smaller models struggled to achieve the desired effect.
Comparison with Other Models
Dream Machine differs from Pika's approach and is similar to Sora and Runway Gen-3, with the closest kinship to Sora; all three are based on the diffusion transformer architecture.
Currently it is mainly a consumer-facing (to C) product, and there is also demand for an API. The future product form depends on the model's capabilities and market feedback.
II. Video is a Better Route to 3D
Reason for Shifting from 3D to Video Generation
The choice to work on video generation was made in order to create better 4D. One route to 4D is to generate 3D from images and then extend it to 4D. The other is to build a video model directly and then convert its output to 4D, which is considered the more reliable route.
3D data is limited, so the approach needs to rely on larger models driven by more data, which video provides.
The Driving Effect of Video on 3D
Video generation models show strong 3D capability. For example, a video model can learn depth, and even for abstract pictures it can simulate the relevant information.
They can understand the reflection and refraction of light and simulate the appearance of different materials, which is an advantage over traditional NeRF.
They can simulate dynamic scenes, fabrics, and so on, but there are still imperfect cases, such as output that violates physical principles or multi-head artifacts (a subject rendered with more than one head).
Discussion on Video Model's Understanding of the World
A video generation model's understanding of the world's physical rules may emerge as the model scales up, similar to the trend seen in language models.
Many problems that are hard to solve today may be solved by paradigm shifts; the development of video models is still in its infancy.
Luma's Advantages and Differences
On the technology side, Luma focuses on generation speed and efficiency optimization, both of which affect user experience and the business model.
Luma also has expectations for controllability. Its market differs from that of companies such as Kuaishou, and it focuses more on developing new products based on where models are heading.
Cost Issues
It is uncertain why Sora has not been opened to the public, but costs are expected to fall and new forms of application are expected to emerge. Beyond adding GPUs, algorithmic innovation is also needed.
III. How Luma Defines Itself
Company Nature
The company should have the innovative capacity of a research lab and the agility of a product company. About a dozen people worked on the Dream Machine model, and engineering capability is important. People with a 3D background tend to be strong engineers, and the team has not run into major capability gaps. The main challenge is aligning internal goals.
Company Positioning
Luma does not define itself as specifically a 3D company or a video company; it believes more broadly in the scaling laws of vision or multimodality. Both research and products are important, and the company's business is defined according to user feedback and technology trends. It will not suddenly pivot to hardware.
Business Model
Paying users and ARR are in good shape, but reaching positive cash flow is not necessarily a goal at this stage. Improvements in model capability may change the business model. Whether the business goes to C or to B is not yet clear; the focus is on making the models and products better.
Development of Multimodal Models
It is believed that multimodal generation can eventually be handled by an end-to-end model, though efficiency has to be considered for different scenarios. This is relevant to Vision Pro and spatial computing; when the time is right, Luma may build related apps, which would be more closely tied to 4D. Luma is keeping an eye on Li Feifei's startup, and believes that engineering and product personnel are comparatively scarce.
Research Directions
Jiaming Song is interested in directions such as solving the sequence length problem of transformers, understanding existing models, and the scaling problem of diffusion. DiT applies the transformer architecture used in autoregressive models to diffusion training, so ideas for improving the architecture may carry over across different models, although diffusion and autoregressive models still differ.
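To make the sequence length problem concrete, the following is a rough sketch of how the token count of a video clip grows with frame count, and how the cost of full self-attention grows with the square of that count. The patch size, the latent downsampling factor, and the function names are illustrative assumptions, not details of Dream Machine or any other model mentioned here.

```python
def video_token_count(frames, height, width,
                      spatial_downsample=8, patch=2, temporal_patch=1):
    """Count transformer tokens for one clip under assumed settings.

    spatial_downsample, patch, and temporal_patch are hypothetical values
    chosen for illustration only.
    """
    latent_h = height // spatial_downsample
    latent_w = width // spatial_downsample
    tokens_per_frame = (latent_h // patch) * (latent_w // patch)
    return (frames // temporal_patch) * tokens_per_frame


def attention_pairs(tokens):
    """Full self-attention compares every token with every token."""
    return tokens * tokens


if __name__ == "__main__":
    for frames in (1, 16, 120):
        n = video_token_count(frames, 512, 512)
        print(f"frames={frames:4d} tokens={n:7d} attention pairs={attention_pairs(n)}")
```

Under these assumptions, going from 1 frame to 16 frames multiplies the token count by 16 and the number of attention pairs by 256, which is why longer or higher-resolution clips become expensive quickly.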