SD3 2B Model Open Source Release

The SD3 2B model has finally been open-sourced as promised, but testing has revealed significant problems with human anatomy, especially in images of people lying down. The model also responds noticeably worse to short prompts, which has sparked discussion within the community. My own tests matched the community's findings: when prompts are well written and avoid hands, both image quality and prompt understanding are up to standard.

Emad, the former CEO of Stability AI, confirmed that these issues stem primarily from safety alignment. DALL-E and Google's image models have similar problems, but because SD3 is open source, these issues can be fixed, and the community and SD3 trainers are already actively looking for solutions.

The community is also optimistic about ecosystem support: LoRA training code has been released, and the InstantX team has released several ControlNet models adapted for SD3. Note that this open-source release is licensed for non-commercial use only, and the terms governing fine-tuned models are vague, so take care before deploying it.

Advantages of SD3 (as listed by Stability AI):

Photorealism: Overcomes common artifacts in hands and faces, providing high-quality images without complex workflows.
Prompt adherence: Understands complex prompts involving spatial relationships, compositional elements, actions, and styles.
Typography: Thanks to its Diffusion Transformer (DiT) architecture, it achieves unprecedented results in rendering text without artifacts or spelling errors.
High resource efficiency: Due to its low VRAM usage, it is suitable for running on standard consumer GPUs without performance degradation.
Fine-Tuning: Able to absorb subtle details from small datasets, making it ideal for customization.
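The low-VRAM claim can be sanity-checked with simple arithmetic. The sketch below is a minimal estimate, assuming "2B" means roughly 2 billion MMDiT parameters and counting only the weights themselves; activations, the VAE, and the text encoders are deliberately excluded, and the helper name `weight_memory_gib` is hypothetical.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: "2B" ~= 2 billion MMDiT parameters. Activations, the VAE
# and the text encoders are NOT included in this figure.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the size of the weights in GiB for a given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in ("fp32", "fp16", "fp8"):
        print(f"{dtype}: {weight_memory_gib(2e9, dtype):.2f} GiB")
```

For 2e9 parameters this prints about 7.45 GiB (fp32), 3.73 GiB (fp16), and 1.86 GiB (fp8). The text encoders, T5XXL in particular, add to this, which is part of why a checkpoint with an fp8 T5XXL is offered.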

The model files mainly consist of the following parts:

sd3_medium.safetensors includes MMDiT and VAE weights, but does not include any text encoders.

sd3_medium_incl_clips_t5xxlfp8.safetensors contains all necessary weights, including the fp8 version of the T5XXL text encoder, providing a balance between quality and resource requirements.

sd3_medium_incl_clips.safetensors includes all necessary weights except the T5XXL text encoder. It has the lowest resource requirements, but output quality will differ without T5XXL and may be weaker, particularly for complex prompts and rendered text.
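The three files differ only in which components they bundle, so choosing one amounts to deciding what you are willing to load separately. This sketch encodes the descriptions above as data; the component labels (`mmdit`, `vae`, `clip`, `t5xxl`) are informal, hypothetical names, not the actual tensor key prefixes inside the files, and `clip` stands for the CLIP text encoders bundled in the "incl_clips" files.

```python
from typing import Dict, FrozenSet

# Informal component labels, NOT the real tensor key prefixes in the files.
CHECKPOINTS: Dict[str, FrozenSet[str]] = {
    "sd3_medium.safetensors": frozenset({"mmdit", "vae"}),
    "sd3_medium_incl_clips.safetensors": frozenset({"mmdit", "vae", "clip"}),
    # This file's T5XXL text encoder is stored in fp8.
    "sd3_medium_incl_clips_t5xxlfp8.safetensors": frozenset(
        {"mmdit", "vae", "clip", "t5xxl"}
    ),
}


def missing_components(filename: str, use_t5: bool = True) -> FrozenSet[str]:
    """Return the components that must be loaded separately for this file."""
    required = {"mmdit", "vae", "clip"}
    if use_t5:
        required.add("t5xxl")
    return frozenset(required - CHECKPOINTS[filename])


for name in CHECKPOINTS:
    print(name, sorted(missing_components(name)))
```

With T5XXL wanted, `sd3_medium.safetensors` still needs both `clip` and `t5xxl`, `sd3_medium_incl_clips.safetensors` needs only `t5xxl`, and the fp8 bundle needs nothing extra.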

The example_workflows folder contains example ComfyUI workflows.
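When experimenting with these workflows, it can be convenient to change sampler settings in a script instead of the UI. The sketch below assumes the workflow has been exported in ComfyUI's API format, a JSON object mapping node IDs to `{"class_type": ..., "inputs": {...}}`; the example files may be saved in the regular UI format instead, which has a different structure. The function names are hypothetical, and nodes such as `KSamplerAdvanced` or `SamplerCustom` are deliberately ignored.

```python
import json
from typing import Any, Dict


def patch_ksampler(workflow: Dict[str, Any], **overrides: Any) -> int:
    """Override inputs on every KSampler node of an API-format workflow.

    Returns the number of nodes that were patched.
    """
    patched = 0
    for node in workflow.values():
        if isinstance(node, dict) and node.get("class_type") == "KSampler":
            node["inputs"].update(overrides)
            patched += 1
    return patched


def patch_file(src: str, dst: str, **overrides: Any) -> int:
    """Load an API-format workflow from src, patch it, and write it to dst."""
    with open(src, encoding="utf-8") as f:
        workflow = json.load(f)
    count = patch_ksampler(workflow, **overrides)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(workflow, f, indent=2)
    return count


# Tiny in-memory workflow standing in for an exported file.
demo = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20, "cfg": 8.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "9": {"class_type": "SaveImage", "inputs": {"filename_prefix": "sd3"}},
}
n = patch_ksampler(demo, steps=28, cfg=3.5, sampler_name="dpmpp_2m")
print(n, demo["3"]["inputs"])
```

Here `dpmpp_2m` is ComfyUI's sampler name for DPM++ 2M; the step and CFG values match the recommended parameters. Other inputs, such as the scheduler, are left untouched.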

Recommended parameters:

fofr noted that the images generated by SD3 look high quality and shared the settings he used: 28 steps, CFG 3.5, a resolution of 896x1088, and the sd3_medium_incl_clips_t5xxlfp8.safetensors checkpoint. Emad had earlier recommended DPM++ 2M as the sampler for SD3.
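To see why 896x1088 is a sensible size, it helps to work out what the model actually processes. This sketch collects the settings above in a hypothetical `Recipe` dataclass and computes the latent shape and image-token count. The architecture constants are assumptions drawn from the SD3 design rather than from this post: a VAE that downsamples each side by 8 into 16 latent channels, and an MMDiT that turns each 2x2 latent patch into one token, so width and height should be multiples of 16.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed SD3 architecture constants (not stated in the post above):
VAE_FACTOR = 8        # VAE downsamples each spatial side by 8
LATENT_CHANNELS = 16  # SD3's VAE produces 16 latent channels
PATCH_SIZE = 2        # MMDiT turns each 2x2 latent patch into one token


@dataclass(frozen=True)
class Recipe:
    """fofr's shared settings plus Emad's recommended sampler."""
    steps: int = 28
    cfg: float = 3.5
    width: int = 896
    height: int = 1088
    sampler: str = "DPM++ 2M"
    checkpoint: str = "sd3_medium_incl_clips_t5xxlfp8.safetensors"


def latent_shape(width: int, height: int) -> Tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for an image size."""
    step = VAE_FACTOR * PATCH_SIZE
    if width % step or height % step:
        raise ValueError(f"width and height should be multiples of {step}")
    return (LATENT_CHANNELS, height // VAE_FACTOR, width // VAE_FACTOR)


def image_tokens(width: int, height: int) -> int:
    """Number of image patch tokens the MMDiT processes per step."""
    _, h, w = latent_shape(width, height)
    return (h // PATCH_SIZE) * (w // PATCH_SIZE)


recipe = Recipe()
print(latent_shape(recipe.width, recipe.height))  # (16, 136, 112)
print(image_tokens(recipe.width, recipe.height))  # 3808
# Pixel budget relative to a 1024x1024 image: 0.9296875
print(recipe.width * recipe.height / (1024 * 1024))
```

Under these assumptions, 896x1088 yields a 136x112 latent and 3,808 image tokens per step, about 93% of the pixel budget of a 1024x1024 image, just in a portrait aspect ratio.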