
Flash Diffusion: Accelerating Any Conditional Diffusion Model

This method enables pre-trained diffusion models to generate images faster and more efficiently. Like a teacher (a large, complex model) showing a student (a lighter model) how to draw quickly, Flash Diffusion lets the student model learn to generate high-quality images in just a few steps. The approach is not only fast but also versatile, covering tasks such as text-to-image generation, inpainting (repairing damaged images), face swapping, and super-resolution (enhancing image clarity). The authors also provide official implementation code for easy use and research.
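To make the teacher-student framing concrete, here is a minimal PyTorch sketch of the general distillation idea: a frozen teacher denoises a noised image, and a trainable student is pushed to match it so that it can later generate in far fewer steps. This is an illustration only, not the actual Flash Diffusion objective (which combines several loss terms; see the paper); `teacher`, `student`, and the scheduler interface are hypothetical placeholders.

```python
# Rough sketch of teacher-student distillation for a diffusion denoiser.
# NOTE: `teacher`, `student`, and `scheduler` (with `num_train_timesteps`
# and `add_noise`) are assumed interfaces, not the official Flash Diffusion API.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, scheduler, x0):
    """One training step: push the student's prediction toward the frozen teacher's."""
    noise = torch.randn_like(x0)
    t = torch.randint(0, scheduler.num_train_timesteps, (x0.size(0),), device=x0.device)
    x_t = scheduler.add_noise(x0, noise, t)    # diffuse clean images to timestep t

    with torch.no_grad():                      # the teacher stays frozen
        teacher_pred = teacher(x_t, t)

    student_pred = student(x_t, t)             # the student is trainable
    loss = F.mse_loss(student_pred, teacher_pred)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```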


In this paper, we propose an efficient, fast, and general distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion.

This method achieves state-of-the-art FID and CLIP-Score on the COCO2014 and COCO2017 datasets using only a few denoising steps, while requiring only a few GPU-hours of training and fewer trainable parameters than existing methods.

Beyond efficiency, the method's versatility is demonstrated across multiple tasks, such as text-to-image generation, inpainting, face swapping, and super-resolution, and across different backbones, such as UNet-based denoisers (SD1.5, SDXL) or DiT-based ones (Pixart-α), as well as adapters.

In all cases, the method significantly reduces the number of sampling steps while maintaining very high-quality image generation.

The official implementation can be found at https://github.com/gojasper/flash-diffusion.
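For orientation only, below is a hedged sketch of how a distilled few-step model might be used with the `diffusers` library: load a base model, attach the distilled weights as a LoRA, and sample in a handful of steps. The Hub id `jasperai/flash-sdxl`, the scheduler choice, and the sampling settings are assumptions; check the repository above for the exact identifiers and recommended configuration.

```python
# Hedged usage sketch: few-step sampling with a distilled LoRA on top of SDXL.
# The LoRA repo id and sampling settings below are assumptions, not verified defaults.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Few-step sampling is typically paired with a matching scheduler.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("jasperai/flash-sdxl")  # assumed Hub id for the distilled LoRA

image = pipe(
    "a photo of a red fox in the snow",
    num_inference_steps=4,   # a handful of steps instead of the usual 25-50
    guidance_scale=0.0,      # distilled few-step models often skip classifier-free guidance
).images[0]
image.save("fox.png")
```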
