Dall-E 2, Midjourney, and Stable Diffusion produce impressive images on command, but perfecting them requires patient, skilled tending.
Generated by Marshall Smith via Midjourney
A quartet of dragons materializes before my eyes. They each have golden skin, intricate scales and spikes, and prodigious fangs. Some face right, some left. One pair has three horns on its head, another pair has two. One has a horn on its snout.
And they were conjured by a spell—not in Elvish or Elder, but in plain English. I’ll share it with you: “Chinese dragon made from glossy reflective gold, with oversized details, ultra-realistic 3D render, rim lighting, warm light, cool shadows, soft ambient occlusion, digital painting, 8K HDR.”
With just those words, and about 10 seconds to think, an AI chatbot called Midjourney paints four digital images—each a unique interpretation of that description. Repeat the spell, and you’ll get four more variations. And again, and again.
A Fundamental Shift
What ChatGPT does for writing, Midjourney does for images. And it’s been doing it longer. In summer 2022, it burst into the graphics world along with several other so-called generative AI apps, including Dall-E 2 (by ChatGPT’s maker OpenAI) and the open-source (free to use) program Stable Diffusion.
“Dall-E 2 is certainly the first time people who are not following [this technology], were like, ‘Oh, wow, this is something,’” says Marshall Smith, a veteran video game designer who worked on pop culture sensations like Zynga’s FarmVille and Words with Friends.
That’s when these apps crossed the uncanny valley, going from creepily inept to appealing and, for creators, even inspiring. Detailed, vivid images that once required experienced designers with sophisticated software could now emerge from mere words.
But is it art?
That’s not just a philosophical question. It’s a business and even legal consideration.
Impressive as Midjourney’s dragons might be to a casual viewer, none of them are ready to go straight into a video game. Getting there will take several rounds of dialogue with the AI—in fact, with several AIs—as well as pulling in traditional tools like Adobe Photoshop.
“It’s an iteration with some of these things,” says Smith. “So I think, ‘Oh, these are cool, but this is not at all what I want.’”
Generative AI makes it easier to talk to computers, but (so far) they still can’t read minds. Getting from a rough idea to a professional artwork with artificial intelligence requires a lot of human intelligence. Let’s walk through how that process could go with the golden dragon.
How We Created This Dragon
Step 1: Ideate
Smith’s current employer, Big Run Studios, has just developed a new mobile slot machine game called Blackout Slots. Though it’s been finished, Smith takes me through how he might create components for it from scratch using a host of generative AI tools and traditional apps.
We start with one that probably everyone knows: OpenAI’s ChatGPT. “List top 20 slot machine themes,” he types. Almost instantly, ChatGPT names and describes a score of options, including Egyptian, Fruit, Jungle Adventure, and Chinese Culture. For the final one, it says, “These games often have symbols like dragons, lanterns, and coins.”
Going with that, Smith asks the chatbot to brainstorm a hierarchy of symbols with different values in the game. The results include “Lantern,” “Diamond-encrusted Lotus,” and, for the top “Jackpot” tier, “Golden Dragon.” He then instructs ChatGPT, “For each symbol, I need an image prompt. This is a literal visual description of the symbol image.” He also provides examples of terms that he knows will resonate with Midjourney, such as “ultra realistic 3D render,” “cool shadows,” “soft ambient occlusion,” and “digital painting.” ChatGPT cheerily creates a spreadsheet with prompts for eight symbols, including the golden dragon.
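To make the structure of such an image prompt concrete, here is a minimal sketch in Python. It simply joins a literal visual description with a shared list of style terms. The style terms are the ones quoted above; the function name, the dictionary, and the lantern description are hypothetical, made up for illustration, and this is not how ChatGPT or Midjourney work internally.

```python
# Style terms quoted in the article as ones that resonate with Midjourney.
STYLE_TERMS = [
    "ultra realistic 3D render",
    "cool shadows",
    "soft ambient occlusion",
    "digital painting",
]


def build_prompt(description, style_terms=STYLE_TERMS):
    """Join a literal visual description with comma-separated style terms."""
    parts = [description.strip()] + [term.strip() for term in style_terms]
    return ", ".join(part for part in parts if part)


# Symbol names come from the article; the lantern description is hypothetical.
symbols = {
    "Golden Dragon": "Chinese dragon made from glossy reflective gold",
    "Lantern": "red paper lantern with gold tassels",
}

prompts = {name: build_prompt(desc) for name, desc in symbols.items()}

for name, prompt in prompts.items():
    print(f"{name}: {prompt}")
```

Keeping the style terms in one shared list is a design choice that helps every symbol in a set come back with a consistent look.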
Step 2: Iterate
Smith copies the image prompt from ChatGPT, pastes it into a messaging app called Discord—where Midjourney’s chatbot lives—and does not get a finished product. Instead, we see four low-resolution mockups to choose from.
Picking one, we can then remix it by clicking on a number of buttons below the image—for instance, specifying how much artistic freedom (stylization) Midjourney can take when interpreting our prompt. We can also modify the initial prompt text and re-run the whole process. The sky’s the limit here: 100-plus word prompts aren’t uncommon. But Smith is pretty happy with the latest iteration of the dragon, and simply removes the background scenery by adding the text “no background on white” to the prompt.
Sometimes the dialogues are trickier, because words are open to interpretation—as Smith found while creating characters for a western-themed game. “I was talking about having a snow-capped mountain,” he says. “So, then the AI kind of grabbed onto the idea that it was going to be snowy. So, the character was now wearing a fur lining on his coat.” That’s not quite what Smith was thinking, so he had to finesse. “I had to split some ideas up a little bit [in the prompting]. So like, okay, well, the mountains are snowpack, but in the foreground, it’s a warm, sunny day in Montana,” he says.
This “prompt engineering” process has become an art form in itself, and it could have legal implications. In March, the U.S. Copyright Office issued a rule seeming to say that the output of generative AI can’t be copyrighted because “users do not exercise ultimate creative control over how such systems interpret prompts and generate material.” But, quoting federal law on “compilation” artworks, it went on to say, “a human may select or arrange AI-generated material in a sufficiently creative way that ‘the resulting work as a whole constitutes an original work of authorship.’”
Would complex rounds of prompting qualify as a copyrightable compilation? “Whether prompts can receive copyright protection depends on the facts of the case, so it will likely be a case-by-case analysis instead of a general rule,” says Mehtab Khan, a resident fellow at Yale Law School who covers technology and intellectual property.
(According to Midjourney’s terms of service, users with a paid membership own the rights to their creations, but Midjourney also has the right to use and remix those creations.)
Step 3: De-pixelate
Once you’ve gotten the image as far as you think Midjourney can take it, the app can output a higher-resolution version. But it’s not that high-res, currently capped at 1024 by 1024 pixels. (As with all things in AI, that figure may change by the time you read this.)
That’s too low for Smith’s purposes, so he turns to another app, Photo AI from Topaz Labs.
It ingests low-res images and reasons out what the missing details might be. Smith demonstrates this by dragging a slider across his original image to show how Photo AI refines it. Pixelated swathes of fur on the dragon’s head are transformed into rich, layered tufts of fine filaments. The app is not just smoothing out jagged lines, it’s creating entirely new features.
In the process, the dragon goes from a roughly one-megapixel pic to a more than 37-megapixel behemoth. This is a relatively quick step, but an essential one. Will this capability get incorporated into Midjourney or other apps? Very possible—maybe even by the time you read this. (It’s already offered by rival Stable Diffusion.)
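The jump in size is easy to check with a little arithmetic. The sketch below assumes a square output and a 6x enlargement on each side; the article gives only the megapixel figures, so the scale factor is an inference, not a documented Photo AI setting.

```python
def megapixels(width, height):
    """Return the pixel count of an image in millions of pixels."""
    return width * height / 1_000_000


source_side = 1024   # Midjourney's cap per side, as stated in the article
scale = 6            # assumed enlargement per side (not stated in the article)
output_side = source_side * scale

print(f"source: {megapixels(source_side, source_side):.2f} MP")
print(f"output: {megapixels(output_side, output_side):.2f} MP")
print(f"pixels multiplied by: {scale ** 2}")
```

Under that assumption, the upscaled image has 36 times as many pixels as the original, which means most of what appears in the final picture was filled in by the upscaler and was not present in Midjourney's output.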
Step 4: Manually Create
An AI-generated and refined work would satisfy many creators and purposes. But for people with the technical skills, it’s still easier to do the final touches on their own than to cajole a machine to do it. And for some finer details, manual is still the only way.
So Smith moves his AI-created dragon into Adobe Photoshop, an app he’s been using for over two decades. AI is helping here, too, though.
To get the dragon ready to place on the digital slot machine, Smith first must cut it out from the background. This has always been a core Photoshop capability, but getting it perfect required some manual tweaking.
There’s much less of that since Photoshop began incorporating generative AI in May. It’s now much better at recognizing the outline of an object—even the dragon’s intricate jumble of fur, scales, horns, and fangs. Cutting out the image is a one-click process for Smith (at least, sometimes, he says).
Photoshop is adding more-ambitious generative tools in the spirit of Midjourney, but these are still in the “beta” or experimental phase. To demonstrate, Smith tries adding a flame that emerges from the dragon’s mouth, typing in the prompt, “vibrant purple flame.” A cartoonish blaze appears, but the process also smooshes the dragon’s head and turns its eye purple.
But many of Photoshop’s traditional tools are still superior. Smith uses them to adjust color, for instance. “I don’t like my game art to have shadows that are black,” he says. “So it’s kind of cool to have something that has a purple shadow, but a yellow highlight.” Smith can also adjust contrast, lighting, and exposure. He can thicken parts of the image and do much, much more. “You definitely are doing the last steps in Photoshop,” he says.
Incorporating Photoshop further bolsters the case for copyright. “Information alone cannot be copyrighted but order, arrangement, presentation etc. may be creative enough to receive protection,” says Khan. “So, using Photoshop may help qualify a work for protection.”
Is There a Future For Artists?
Artificial intelligence still can’t completely replace a skilled artist for producing high-end work. But the tools keep improving. “They have been innovating like crazy,” says Smith about Midjourney, though that could apply to any of these apps. “They have new features all the time that have continued to drastically improve the product [with] higher resolution, higher fidelity.” And the level of sophistication generative AI has brought to two-dimensional imagery could someday—perhaps someday soon—come to 3D animation and even filmmaking.
It’s already replacing some mundane but lucrative jobs, such as creating seamless background textures, as in fabrics, wallpaper, or wrapping paper. These textures are essentially endless grids of repeating images, or tiles, and a lot of highly paid work goes into blending the boundaries between tiles to create a seamless look. Now apps like Midjourney can do it instantly.
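For readers curious what “seamless” means in practice, here is a toy sketch in Python. It treats a tile as a small grid of brightness values, repeats it into a larger grid, and measures the biggest jump where one copy meets the next. The function names and sample tiles are hypothetical, and this is only an illustration of the idea, not how Midjourney generates tiles.

```python
def tile(image, rows, cols):
    """Repeat a 2D grid of pixel values `rows` times down and `cols` times across."""
    return [row * cols for _ in range(rows) for row in image]


def seam_jump(image):
    """Largest brightness jump where the tile's opposite edges meet when repeated."""
    horizontal = max(abs(row[-1] - row[0]) for row in image)
    vertical = max(abs(a - b) for a, b in zip(image[-1], image[0]))
    return max(horizontal, vertical)


# A left-to-right ramp: bright right edge lands next to a dark left edge.
gradient = [[0, 50, 100],
            [0, 50, 100]]

# Edges match, so repeated copies blend into each other.
wave = [[0, 50, 0],
        [0, 50, 0]]

print(tile(gradient, 2, 2)[0])
print("gradient seam:", seam_jump(gradient))
print("wave seam:", seam_jump(wave))
```

A tile with a large seam value shows visible lines when repeated; the blending work described above amounts to driving that value down without ruining the pattern.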
There may also be less work for artists creating concept art. Instead, AI can generate oodles of mockups for designers to consider before commissioning an artist to create a high-end image. That, for instance, allows designers like Smith to add more features to games—even entire new characters—that they simply wouldn’t have time for in the past.
Whether in games, fabrics, or any other creations, the ultimate result of generative AI will be more artworks, but perhaps produced by fewer artists. And staying employed means staying on top of these fast-moving technologies so they boost a professional’s skills, rather than supersede them.