Stable Diffusion goes public and the internet goes crazy

Welcome to The Long View, where we go through the week’s news and boil it down to the essentials. Let’s work out what really matters.

This week: a Stable Diffusion special

Unless you’ve been living under a rock for the past week, you’ll have seen something about Stable Diffusion. It’s the new open source machine learning model for creating images from text, and from other images too.

Analysis: Open source is the key

Like DALL-E and Midjourney, you give it a textual “prompt” and it generates amazing images (or sometimes outright garbage). Unlike those other models, it’s open source, so we’re already seeing an explosion of innovation.

Mark Hachman calls it “the new killer app”:

Perfect your algorithmic art
AI art is fascinating. Enter a prompt and the algorithm will generate an image according to your specifications. Generally, all this happens on the web, with algorithms like DALL-E. [But] Stability.Ai and its Stable Diffusion model broke the mold … with a model that is publicly available and can run on consumer GPUs.

For now, Stability.Ai recommends having a GPU with at least 6.9GB of video RAM. Unfortunately, only Nvidia GPUs are currently supported. [But] if you own a powerful PC, you can spend as long as you like perfecting your algorithmic craft and coming up with something truly impressive.

From the horse’s mouth, it’s Emad Mostaque: Stable Diffusion public release

Use it ethically, morally and legally
It is our pleasure to announce the public release of Stable Diffusion. … We have all been overwhelmed by the response over the last few weeks and have been working hard to ensure a safe and ethical release, incorporating data from our beta model tests and community for developers to act on.

Since these models have been trained on image-text pairs from a broad internet scrape, the model may reproduce some societal biases and produce unsafe content, so open mitigation strategies and an open discussion of those biases can bring everyone to this conversation. … We hope everyone uses it ethically, morally and legally and contributes to both the community and the discourse around it.

Yeah, sure. Have you ever been on the internet? Kyle Wiggers sounds worried: Deepfakes for all

90% are of women
Stable Diffusion … is now used by art generator services such as Artbreeder and others. But the unfiltered nature of the model means that not all of the use has been completely above board.

Other AI art generation systems, such as OpenAI’s DALL-E 2, have implemented strict filters for pornography. … Also, many lack the ability to create art of public figures. … Women, unfortunately, are by far the most likely victims of this. A study conducted in 2019 revealed that, of the 90% to 95% of deepfakes that are non-consensual, approximately 90% are of women.

Why is this a big deal? Just ask Simon Willison:

Science fiction is real
Stable Diffusion is a really big deal. If you haven’t been paying attention to what’s going on … you really should be. … It’s similar to models like OpenAI’s DALL-E, but with one crucial difference: they released everything.

In just a few days, there was an explosion of innovation around it. The things people are building are absolutely amazing. … Generating images from text is one thing, but generating images from other images is a whole new game. … Imagine having an on-demand concept artist who can generate anything you can imagine and who can iterate with you towards your ideal outcome.

Science fiction is real now. Generative machine learning models are here, and the speed at which they are improving is unreal. It is worth paying close attention.

How does it compare to DALL-E? Just ask Beyond:

Personally, Stable Diffusion is better. … OpenAI makes it look like they’ve created the holy grail of image generation models, but their images don’t impress anyone who has used Stable Diffusion.

@fabianstelzer did a series of comparative tests:

These image synthesizers are like instruments: it’s amazing that we will have so many of them, each with a unique “sound”. … DALL-E is really great for facial expressions. [Midjourney] wipes the floor with the others when it comes to … prompts aiming for textural details. … DALL-E is usually my favorite for scenes involving 2 or more clear “actors”. … DALL-E and SD are better at photos … Stable Diffusion can do amazing photos … but you have to be careful not to “overload” the scene.

The moment you put “art” into a prompt, Midjourney freaks out. … DALL-E’s imperfections look very digital, unlike MJ’s. … When it comes to copying specific styles, SD absolutely is it 🤯🤌 [but] DALL-E won’t let you do a Botticelli painting of Trump.

And what about the training data? Here’s Andy Baio:

One of the biggest frustrations of text-to-image AI models is that they feel like a black box. We know they were trained on images taken from the web, but which ones? … The team behind Stable Diffusion have been very transparent about how their model is trained. Since its public release last week, Stable Diffusion has exploded in popularity, largely due to its free and permissive licensing.

Simon Willison [and I] collected data for more than 12 million images used to train Stable Diffusion. [It] was trained on three huge datasets collected by LAION. … All LAION image datasets are based on Common Crawl, [which] scrapes billions of web pages monthly and releases them as huge datasets. … Nearly half of the images, about 47%, came from just 100 domains, with the largest number of images coming from Pinterest. … WordPress-hosted blogs … accounted for … 6.8% of all images. Other photo, art, and blog sites included … Smugmug … Blogspot … Flickr … DeviantArt … Wikimedia … 500px and … Tumblr.

Meanwhile, how does it work? Letitia Parcalabescu (easy for her to say) explains:

How do latent diffusion models work? If you want answers to these questions, we’ve got you covered!

The moral of the story:
What fools these mortals be

You have been reading The Long View by Richi Jennings. You can contact him at @RiCHi.

Image: Stable Diffusion, via Andy Baio (Creative ML OpenRAIL-M; leveled and cropped)