Advantages of SD3:
Photorealism: Overcomes common artifacts in hands and faces, providing high-quality images without complex workflows.
Prompt adherence: Understands complex prompts involving spatial relationships, compositional elements, actions, and styles.
Typography: Thanks to its Diffusion Transformer (DiT) architecture, it renders text with unprecedented accuracy, avoiding artifacts and spelling errors.
High resource efficiency: Due to its low VRAM usage, it is suitable for running on standard consumer GPUs without performance degradation.
Fine-Tuning: Able to absorb subtle details from small datasets, making it ideal for customization.
The model release mainly consists of the following files:
sd3_medium.safetensors includes MMDiT and VAE weights, but does not include any text encoders.
sd3_medium_incl_clips_t5xxlfp8.safetensors contains all necessary weights, including the fp8 version of the T5XXL text encoder, providing a balance between quality and resource requirements.
sd3_medium_incl_clips.safetensors includes all necessary weights, except for the T5XXL text encoder. It requires the least amount of resources, but the model's performance will differ without the T5XXL text encoder.
The example_workflows folder contains example ComfyUI workflows.
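The trade-off between the three checkpoint files can be summarized as a small decision helper. This is only an illustrative sketch: the function name pick_checkpoint and its two flags are hypothetical and not part of any SD3 tooling; only the file names come from the list above.

```python
def pick_checkpoint(bundle_text_encoders: bool, want_t5xxl: bool = True) -> str:
    """Return the SD3 Medium file that matches the described trade-offs.

    Hypothetical helper for illustration; only the file names are from the release.
    """
    if not bundle_text_encoders:
        # MMDiT + VAE weights only; text encoders must be supplied separately.
        return "sd3_medium.safetensors"
    if want_t5xxl:
        # All weights, with T5XXL in fp8: balance of quality and resources.
        return "sd3_medium_incl_clips_t5xxlfp8.safetensors"
    # All weights except T5XXL: lowest resource use, different output behavior.
    return "sd3_medium_incl_clips.safetensors"


if __name__ == "__main__":
    print(pick_checkpoint(bundle_text_encoders=True, want_t5xxl=False))
```

In practice the choice mostly comes down to available memory: dropping T5XXL saves the most resources, at the cost of the differences in model behavior noted above.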
Recommended parameters:
fofr reported that the images generated by SD3 look high quality and shared the parameters he used: 28 steps, CFG 3.5, 896x1088 resolution, with sd3_medium_incl_clips_t5xxlfp8.safetensors. Earlier, Emad had named DPM++ 2M as the recommended sampler for SD3.
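For reuse, the reported settings can be collected in one place. The sketch below is a minimal example, not an official configuration: the SD3Settings class is hypothetical, and the keyword names in as_pipeline_kwargs assume the Hugging Face diffusers calling convention (num_inference_steps, guidance_scale, width, height), which the source does not mention. The sampler is kept as a plain label because how it is selected depends on the toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SD3Settings:
    """Settings reported by fofr, plus the sampler attributed to Emad."""
    steps: int = 28
    cfg: float = 3.5
    width: int = 896
    height: int = 1088
    sampler: str = "DPM++ 2M"  # label only; not mapped to any toolkit's API
    checkpoint: str = "sd3_medium_incl_clips_t5xxlfp8.safetensors"

    def megapixels(self) -> float:
        # Total pixel count of the output image, in millions.
        return self.width * self.height / 1_000_000

    def as_pipeline_kwargs(self) -> dict:
        # Assumption: diffusers-style keyword names; adapt for other toolkits.
        return {
            "num_inference_steps": self.steps,
            "guidance_scale": self.cfg,
            "width": self.width,
            "height": self.height,
        }


settings = SD3Settings()

if __name__ == "__main__":
    print(settings.as_pipeline_kwargs())
    print(f"{settings.megapixels():.2f} MP")
```

The 896x1088 resolution works out to just under one megapixel, so it stays close to the pixel budget of a 1024x1024 image while using a portrait aspect ratio.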