In this paper, we propose Flash Diffusion: an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models.
The method reaches state-of-the-art performance in terms of FID and CLIP-Score for few-step image generation on the COCO2014 and COCO2017 datasets, while requiring only several GPU hours of training and fewer trainable parameters than existing approaches.
Beyond its efficiency, the versatility of the method is demonstrated across several tasks, such as text-to-image generation, inpainting, face swapping, and super-resolution, and across different backbones, including UNet-based denoisers (SD1.5, SDXL) and DiT-based ones (Pixart-α), as well as adapters.
In all cases, the method drastically reduces the number of sampling steps while maintaining very high-quality image generation.
The official implementation can be found at https://github.com/gojasper/flash-diffusion.