I. UltraEdit Overview
Research Background and Objectives
UltraEdit is proposed to address the shortcomings of existing image editing datasets such as InstructPix2Pix and MagicBrush. It provides a systematic approach for producing a large number of high-quality image editing samples.
Main Advantages
- Rich editing instructions: combining the creativity of Large Language Models (LLMs) with in-context editing examples from human annotators yields a broader range of editing instructions.
- Diverse data sources: built on real images (including photographs and artworks), it offers higher diversity and less bias than datasets generated solely by text-to-image models.
- Supports region-based editing: high-quality, automatically produced region annotations strengthen the ability to perform region-based edits.
II. Construction of UltraEdit
Instruction and Caption Generation
Editing instructions and target captions are generated from the captions of collected images, using LLMs together with in-context examples, as sketched below.
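A minimal sketch of this step, assuming an OpenAI-style chat API; the prompt wording, model name, and in-context example here are illustrative placeholders, not the exact ones used by UltraEdit.

```python
# Sketch: generate an editing instruction and a target caption from a source caption
# using an LLM with in-context examples. Prompt text and model choice are placeholders.
from openai import OpenAI

client = OpenAI()

IN_CONTEXT_EXAMPLES = (
    "Caption: a red car parked on a quiet street\n"
    "Instruction: make the car blue\n"
    "Target caption: a blue car parked on a quiet street\n"
)

def generate_instruction(source_caption: str) -> str:
    """Ask the LLM for one editing instruction and the resulting target caption."""
    prompt = (
        "Given an image caption, propose one concrete editing instruction and the "
        "caption of the edited image, following the format of the examples.\n\n"
        f"{IN_CONTEXT_EXAMPLES}\n"
        f"Caption: {source_caption}\n"
        "Instruction:"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(generate_instruction("a wooden cabin beside a frozen lake"))
```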
Free-form Data Generation
Using the collected images as anchors, a regular diffusion pass is applied first, and prompt-to-prompt (P2P) control is then used to generate the source and target images (see the sketch below).
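The following sketch only illustrates the idea of producing a paired (source, target) image from a source caption and an edited caption. Real P2P control shares cross-attention maps between the two denoising runs; this simplified stand-in merely reuses the initial noise via a fixed seed and omits the anchoring on the collected real image.

```python
# Sketch: produce a (source, target) image pair from a source caption and an edited
# caption. NOTE: real prompt-to-prompt (P2P) injects cross-attention maps between the
# two generations; here only the initial latent noise is shared via a fixed seed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def make_pair(source_caption: str, target_caption: str, seed: int = 0):
    generator = torch.Generator("cuda").manual_seed(seed)
    source = pipe(source_caption, generator=generator).images[0]
    generator = torch.Generator("cuda").manual_seed(seed)  # reuse the same noise
    target = pipe(target_caption, generator=generator).images[0]
    return source, target

src, tgt = make_pair(
    "a wooden cabin beside a frozen lake",
    "a wooden cabin beside a frozen lake under the northern lights",
)
```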
Region-Based Data Generation
Editing regions are produced from the instructions, and a modified inpainting diffusion pipeline is then invoked to generate the images, as in the sketch below.
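A rough sketch of region-based generation, assuming an off-the-shelf diffusers inpainting pipeline and a precomputed mask; UltraEdit's own region annotations and modified inpainting pipeline are not reproduced here.

```python
# Sketch: region-based data generation with a standard inpainting pipeline.
# The mask file and checkpoint stand in for UltraEdit's automatic region
# annotations and modified pipeline.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

source = Image.open("source.png").convert("RGB")
mask = Image.open("edit_region_mask.png").convert("L")  # white = region to edit

# The target caption describes the image after the edit; only the masked region changes.
target = pipe(
    prompt="a wooden cabin beside a frozen lake under the northern lights",
    image=source,
    mask_image=mask,
).images[0]
target.save("target.png")
```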
III. Comparison with Other Datasets
Comparison Overview
EditBench and MagicBrush are manually annotated but limited in scale; InstructPix2Pix and HQ-Edit are large datasets automatically generated with text-to-image models but carry their biases. UltraEdit provides large-scale samples with rich editing tasks and less bias.
Types of Editing Instructions
Examples include addition, global change, local change, color change, transformation, replacement, rotation, and more.
IV. Experimental Results and Analysis
Quantitative Evaluation
Free-form and region-based editing data are evaluated with metrics such as CLIP image similarity (CLIPimg), SSIM, and DINOv2 feature similarity (see the sketch below). The number of instances, the number of unique instructions, and their ratio also vary across the different types of editing instructions.
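A sketch of how a CLIP image-image similarity score of this kind can be computed; the exact CLIP variant and preprocessing behind the reported numbers are assumptions here.

```python
# Sketch: CLIP image-image similarity (CLIPimg-style) between an edited image and a
# reference image, using the openai/clip-vit-large-patch14 encoder as an example.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def clip_image_similarity(path_a: str, path_b: str) -> float:
    images = [Image.open(path_a).convert("RGB"), Image.open(path_b).convert("RGB")]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # cosine similarity
    return float(feats[0] @ feats[1])

print(clip_image_similarity("edited.png", "reference.png"))
```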
Qualitative Evaluation
Models trained on the UltraEdit dataset are evaluated qualitatively on the MagicBrush and Emu Test benchmarks, covering aspects such as consistency, instruction alignment, and image quality.
V. Editing Examples
Examples of edits generated by Stable Diffusion 3 trained on the UltraEdit dataset, supporting both free-form and region-based edits, such as adding UFOs, moons, or cherry blossoms and changing a person's outfit.
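For illustration, instruction-based editing models of this kind expose a simple image-plus-instruction interface. The sketch below uses the public InstructPix2Pix pipeline and checkpoint from diffusers purely to show that interface; the SD3-based UltraEdit checkpoints are loaded differently and are not shown here.

```python
# Sketch: applying an instruction-tuned editing model to an input image.
# Checkpoint and scale values are illustrative, not the UltraEdit release.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("beach.png").convert("RGB")
edited = pipe(
    "add a UFO hovering over the water",
    image=image,
    num_inference_steps=30,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
).images[0]
edited.save("beach_with_ufo.png")
```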
VI. Model Performance Evaluation
Evaluation on Different Benchmarks
Diffusion models trained on the UltraEdit dataset are evaluated on different instruction-based image editing benchmarks. For a fair comparison, the same diffusion models are trained with equal amounts of training data.
Results Under Different Settings
In both single-turn and multi-turn settings, the compared methods show varying performance on L1, L2, CLIP-I, DINO, and other metrics, with models trained on UltraEdit performing better.
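For reference, the L1 and L2 numbers in such comparisons are per-pixel errors between the edited output and the ground-truth target. The sketch below assumes images of equal size and pixel values normalized to [0, 1]; the exact normalization and benchmark-level averaging are assumptions.

```python
# Sketch: per-image L1 and L2 distances between an edited image and the ground truth.
# Both images are assumed to have the same resolution; lower values are better.
import numpy as np
from PIL import Image

def l1_l2(edited_path: str, target_path: str) -> tuple[float, float]:
    a = np.asarray(Image.open(edited_path).convert("RGB"), dtype=np.float32) / 255.0
    b = np.asarray(Image.open(target_path).convert("RGB"), dtype=np.float32) / 255.0
    l1 = float(np.mean(np.abs(a - b)))  # mean absolute error
    l2 = float(np.mean((a - b) ** 2))   # mean squared error
    return l1, l2

print(l1_l2("edited.png", "ground_truth.png"))
```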