Tools needed:
- Image generation tool: Either Midjourney or Stable Diffusion is fine.
- AI video generation tool: Runway, Keling, Luma, and the like are all recommended.
- Hedra: a tool that generates a talking-face video from nothing more than a picture and text.
- ElevenLabs: a tool that generates spoken audio files from text.
- LivePortrait: this one runs through ComfyUI; I will explain in detail which parameters in the workflow need adjusting.
- 01 Image generation
There is not much to explain about image generation itself. Points to note:
- The face needs to occupy a large enough share of the frame and be clear enough; otherwise the model cannot recognize the facial features. A half-body portrait is recommended (see the sketch after this list for a quick programmatic check).
- It is best to use a frontal face picture; the subject should not be turned away or in profile.
- A smile is fine, but avoid a big laugh that shows a lot of teeth; otherwise the video generation model may keep the character laughing and the expression transfer will go wrong.
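If you want to check these points programmatically before spending credits, a rough sketch with OpenCV's bundled frontal-face detector can help; the file name and the 5% area threshold are my own illustrative choices, not values from the tutorial:

```python
import cv2

img = cv2.imread("portrait.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# The bundled Haar cascade only fires reliably on near-frontal faces,
# so a successful detection also loosely confirms the "frontal" point.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) == 0:
    print("No frontal face detected; the model may fail on this image.")
else:
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face found
    ratio = (w * h) / (img.shape[0] * img.shape[1])
    print(f"Face covers {ratio:.1%} of the frame.")
    if ratio < 0.05:  # arbitrary illustrative threshold, not from the tutorial
        print("Face is quite small; consider a half-body portrait instead.")
```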
- 02 Generating audio
ElevenLabs (https://elevenlabs.io/) is recommended here for generating speech audio from text. It is currently free, and there are many distinctive voices to choose from.
After entering the website, select Speech, enter the text you want to generate, pick a voice at the bottom left, and click Generate. Then find the generated audio under History and download it.
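If you prefer scripting over the web UI, something like the following should work against ElevenLabs' public REST text-to-speech endpoint; the API key and voice ID are placeholders you take from your account, and it's worth checking their docs in case the endpoint shape has changed:

```python
import requests

API_KEY = "your-elevenlabs-api-key"  # placeholder; create one in your account
VOICE_ID = "your-voice-id"           # placeholder; copy it from the Voices page

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={"text": "Hello, this line will become the talking-head audio."},
    timeout=60,
)
resp.raise_for_status()

# The response body is the audio itself (MP3 by default).
with open("speech.mp3", "wb") as f:
    f.write(resp.content)
```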
- 03 Generating talking facial video
Now that we have the audio and the character picture, we need a tool to generate the talking video. Hedra (https://www.hedra.com/) is again the recommendation here.
It can generate a talking-head video of a character from a picture plus text or audio. The limitation, however, is that only the face moves.
Of course, with LivePortrait this is no longer a problem: we can merge the generated head video with an existing video.
First, it is best to adjust the length of the audio; about 5 seconds works well, since an AI-generated video is generally about 5 seconds long.
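If you'd rather trim the audio in code than in an editor, a minimal sketch with pydub (pip install pydub; it needs ffmpeg available on your PATH) might look like this; file names are placeholders:

```python
from pydub import AudioSegment

audio = AudioSegment.from_file("speech.mp3")  # placeholder file name
clip = audio[:5000]          # pydub slices in milliseconds: first 5 seconds
clip = clip.fade_out(300)    # soft ending instead of a hard cut
clip.export("speech_5s.mp3", format="mp3")
```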
Then click Import audio to import the generated and adjusted audio.
After that, upload your picture at the Character position, and it will automatically crop out the human face.
Then click "Generate video" to generate. Note that Hedra has very strict underage review, so you can try the picture first.
The generated talking video can be downloaded and set aside as material.
- 04 AI video generation
Next, we need to prepare another piece of material: feed the picture we generated into Runway or Keling to generate a video.
There is not much to the process here: upload the picture, enter the prompt, and generate the video.
Points to note: the generated video should not contain large head turns or exaggerated expressions such as big laughs; otherwise the expression fusion will look very awkward.
- 05 LivePortrait expression transfer
Okay, we are ready for the last step: merging all our materials, i.e. the two videos generated by Hedra and Keling.
Before merging, it is best to process the two videos so that their duration and frame rate match. For example, both of my videos are 30 frames per second with a total duration of 5 seconds. This can be done with Jianying, or with ffmpeg as sketched below.
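If you don't have Jianying at hand, a rough ffmpeg-based equivalent (assuming ffmpeg is installed; file names are placeholders for the two downloads) could be:

```python
import subprocess

# File names are placeholders for the Hedra and Keling downloads.
for src, dst in [("hedra_talk.mp4", "expression_30fps_5s.mp4"),
                 ("keling_motion.mp4", "target_30fps_5s.mp4")]:
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-r", "30",   # resample to 30 frames per second
         "-t", "5",    # keep only the first 5 seconds
         dst],
        check=True,
    )
```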
For the expression transfer in ComfyUI, we will use kijai's LivePortrait plugin and the workflow from his examples. The original workflow does not quite match our needs, so I made some changes, such as raising the total frame limit so the output covers more than 5 seconds and switching the audio source to the expression video.
You can follow the official account and reply with ["expression"] to get the workflow and the materials I used.
Let's go over the parts of the workflow that need adjusting. I won't cover plugin installation or ComfyUI installation here. You can also open this workflow directly on the Lanrui Xingzhou cloud service (https://www.lanrui-ai.com/register?invitation_code=9778); they should finish the adaptation and plugin installation by tomorrow (240801), so you can use it without installing anything yourself.
First, at the video upload positions, select the video to receive the expression transfer (generated by Keling) in the upper red box and the expression video (generated by Hedra) in the lower red box.
Then click "Add prompt word queue" on the right. Your version may be in English. Anyway, the position is the same. Wait for the progress bar to finish and you can see that the red box position has been generated.
Then right-click the video and select "Save preview" to save the generated video. In fact, apart from installing the plugins and deploying ComfyUI, the whole thing takes only three clicks.
In addition, you can also use a real-shot expression/talking video to drive the transfer; the realism is higher and the effect is better. All you need to do is crop the face in your real-shot video into a square video, which can also be done with Jianying, or with ffmpeg as sketched below.
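For the square crop without Jianying, a centered square crop with ffmpeg (assuming ffmpeg is installed and the face sits roughly in the middle of the frame; the file names are placeholders) could look like this:

```python
import subprocess

# crop=w:h defaults to a centered crop; min(iw\,ih) keeps it square
# regardless of whether the source is landscape or portrait.
subprocess.run(
    ["ffmpeg", "-y", "-i", "real_shot.mp4",        # placeholder file name
     "-vf", "crop=min(iw\\,ih):min(iw\\,ih)",
     "square_face.mp4"],
    check=True,
)
```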
- 06 Obtaining materials and workflows
Finally, to emphasize once more how to get the materials and workflows:
You can follow the official account and reply with ["expression"] to get the workflow and the materials I used.
You can also open the workflow directly on the Lanrui Xingzhou cloud service (https://www.lanrui-ai.com/register?invitation_code=9778); they should finish the adaptation and plugin installation by tomorrow (240801).
Exploring the process and writing this tutorial took a long time. If you found it helpful, please like it or share it with friends who need it. 🙏