
The Dumpling and the 1000 Days of Stable Diffusion

This article looks back in detail at the contributions of Dango233 (Dumpling) and huoju to the open-source AI community. Three years ago, out of curiosity, Dumpling joined the EleutherAI Discord community, started experimenting with CLIP+VQGAN, and gradually took part in deeper discussions about image generation technology. Over time he contributed to several projects, including Disco Diffusion and Majesty Diffusion, and was invited to join StabilityAI, a company founded by Emad Mostaque and dedicated to open-source AI. Facing a crossroads in his career, Dumpling ultimately chose to take part in the development of Stable Diffusion, a decision that greatly changed his professional trajectory.

If you followed last night's on-schedule open-source release of Stable Diffusion 3 Medium, you may have noticed three names on the Model Card, two of them familiar to the Chinese community.

Dango233 and huoju are both developers who personally took part in training SD3. They have been working on image generation since the Disco Diffusion era (2021) and have witnessed the community's magnificent journey over more than 1000 days.

We won't go into the twists and turns that led to SD3's successful open-sourcing, or the relentless effort it took, but in short, these two are true enthusiasts and practitioners of open source. We at Diffuseum are also very grateful to them as founding members, for the technical support and confidence they have given us and the Chinese community.

What follows is the full text Dango233 and huoju wrote on the eve of the SD3 open-source release, recalling how they joined this exciting global technological revolution out of sheer interest. We have compiled it here as a keepsake, and we are happy for them ❤️

(Note: Dango = Dumpling)

Packaging and uploading are done. While there's still a little time before the release, let me say a few words:

Three years ago, around this time, after half a year of being teased by DALL·E, I came across the open-source CLIP+VQGAN and joined EleutherAI's Discord out of interest (and because I had a gap between projects).

That was my first contact with the open-source AI community. I mainly hung out in the #Art channel, which was very different from the technical communities I had known before: every day it was full of lively energy, with all kinds of people showing off their strange creations.

At that time I was still a novice, learning Python (yes, back then I could only write R), asking the community's experts for advice, and waiting for my abstract pieces to slowly render.

After a few months, with the release of Katherine's (RiversHaveWings) Guided Diffusion, I gradually began to participate in deeper discussions and contributed some code back to the community, receiving a lot of positive feedback.

Image generation was still in its wild-west era and everything was hard to do, but looking back, tinkering with all kinds of things alongside the community's experts, and earning the title "Artist De Neuro", made me really happy.

"trending on artstation" was probably the phrase I typed most often during that period.

Two years ago, in 2022, also around this time of year, I went to Yunnan to eat mushrooms.

In the end I didn't see the dancing little people, but I did see a crossroads in my career.

On one hand, the client I was serving at the time made me an offer that was a good fit.

On the other hand, the spores I scattered in the community out of interest seemed to have grown some mushrooms.

Disco Diffusion, which I had contributed to, was still in the limelight;

Majesty Diffusion, which I built on top of Latent Diffusion together with Apolinario (MultimodalArt, now at Hugging Face), was gaining some attention. It was also one of the community's earliest projects to explore multi-stage fine control over the image generation process;

Several startups approached me, asking if I wanted to join them on their adventure.

I am actually a very risk-averse person.

Back then I was still an ML novice who could barely walk steadily. Me, running with a startup?

I might as well continue to be a strategic designer/product manager.

Among the adventurers who approached me then was a guy who seemed a bit free-floating. His name was Emad.

I had known him in the community for a while. My impression then was that he was a fun person whose interests were similar to mine; we both liked making all sorts of portraits.

Oh, right, and a bit of a homebody.

One day he suddenly messaged me privately, asking if I wanted to join a newly established company called Stability.

I was still working as a consultant, so I did some due diligence: I opened Google and searched.

The official website had only one line: "AI by the People, for the People"

Searching for Emad turned up his commentary on international oil price trends from a few years earlier...

...Had I just run into some kind of scam?

Of course not, but that's what I thought at the time.

Then in June, right on my birthday, Emad sent me (and a few core community developers) in-training weights of Stable Diffusion.

It blew my mind. This was something that would change the world.

Or at least make it a little weirder.

I can't say how much it has changed the world, but these mushrooms did change my career trajectory.

So, now, I'm working on SD3.

I'm probably the person at my company who has been working on image generation the longest.

↓ This is the little person I didn't get to see after eating those mushrooms, generated with an early in-training version of Stable Diffusion and then pushed through the Majesty Diffusion pipeline.

A year ago, around the same time of year.

SDXL, ComfyUI, Generative-models

UCAN, WAIC, AGI Playground

One foot in coding, the other in running events.

Very busy.

Also very confused.

For various reasons, some I can talk about and some I can't, some personal and some external, my public participation in the community dwindled and finally stopped in the second half of last year. I feel like I missed a lot.

On the other hand, I felt stretched too thin. I wrote and built a lot of things; some were never released, and others were released first by fellow community members. It seemed better to focus.

After that, playing with models, optimizing them, and building workflows... was still a lot of fun. After all, it's what I love doing.

Strategy, proposal writing, drawing, firefighting... were easy enough to handle too. After all, three years in consulting had toughened me up.

It's just... the results never felt satisfying.

I don't know if I have made the world a better place.

The whole AI landscape was also growing more and more complicated. Open source? Closed source? Model? Product? America? China?

In every sense, it was a turbulent year.

Now it's 20:30 on June 12, 2024, and there's a little time left before SD3 Medium is released.

After another year of twists and turns, the baton for training and open-sourcing SD3 has been passed to me.

It's a really heavy baton...

Anyway, more on that later. I'm off to finish up; hang on a moment...

I am honored to have completed training the open-source sd3-medium model together with Yizhou and Lykon. Going from being obsessed with Disco Diffusion to taking part in training SD3 is a personal milestone for me. Although only the 2B weights have been released so far, I personally prefer the 2B version for a simple reason: DiT is still a relatively new architecture, and the more people who can use it at low cost, the more likely it is that interesting things will come out of it. The 2B model also strikes a good balance between capability and resource consumption on consumer hardware. I hope you like it as much as I do.