Microsoft has open-sourced a text encoder named Glyph-ByT5-v2. It renders accurate visual text in ten languages, which makes it easy for users from different linguistic backgrounds to create images containing text in their own language.
In addition, Microsoft has paired this text encoder with an SDXL model. The combined model can generate posters with Chinese text and rich content directly from a prompt. In the published demonstrations, the layouts are impressive, with professional and aesthetically pleasing choices of fonts, layout, and color scheme.
During development, Microsoft built a high-quality multilingual glyph-text and graphic design dataset. It contains more than 1 million glyph-text pairs and 10 million graphic design image-text pairs and covers nine languages besides English, giving researchers a large resource for studying glyph rendering and graphic design in a multilingual setting.
Microsoft also constructed a multilingual visual paragraph benchmark of 1,000 prompts, 100 for each language. It is designed to measure multilingual visual spelling accuracy, providing a common benchmark for improving the quality of multilingual visual text rendering.
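With 100 prompts per language, the 1,000-prompt benchmark spans ten languages. As a rough illustration of what a per-language spelling-accuracy score could look like, here is a minimal sketch. It assumes, hypothetically, that each generated image is read back with an OCR system and that accuracy is the fraction of target words recovered exactly. The function name `spelling_accuracy` and its input format are illustrative only, not the benchmark's actual evaluation code.

```python
from collections import defaultdict


def spelling_accuracy(results):
    """Per-language word-level spelling accuracy (illustrative sketch).

    results: iterable of (language, target_words, ocr_words) tuples, where
    target_words are the words the prompt asked to render and ocr_words are
    the words an (assumed) OCR system read back from the generated image.
    Returns {language: fraction of target words matched exactly}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lang, target, ocr in results:
        remaining = list(ocr)
        for word in target:
            totals[lang] += 1
            if word in remaining:
                hits[lang] += 1
                # Each OCR word may match at most one target word.
                remaining.remove(word)
    return {lang: hits[lang] / totals[lang] for lang in totals}


results = [
    ("en", ["Grand", "Opening"], ["Grand", "Openng"]),  # one misspelling
    ("zh", ["开", "业"], ["开", "业"]),                  # fully correct
]
print(spelling_accuracy(results))
```

Exact matching is deliberately strict: a single wrong letter counts the whole word as misspelled, which is what a spelling benchmark is meant to catch.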
On the training side, Microsoft applied the recent step-aware preference learning method to improve visual aesthetic quality, making the generated images and posters more polished and visually appealing.