99% Character Consistency with DALL-E 3
Character consistency in AI-generated artwork is no joke, especially in illustrations for novels, comics, and more. We’re talking about the art of keeping characters looking uniformly themselves from one scene to the next. Sounds simple? It’s trickier than you’d think! Here’s a fun fact: with DALL-E 3, even tiny tweaks to a prompt can result in whopping image changes.
Now, I’ve heard folks suggest that slapping on long character descriptions, using specific names, or dialing in a ‘seed’ number can fix things. But, spoiler alert: these tweaks might just be a drop in the ocean.
The challenge is even more intense when we’re dealing with images of real people. Recognize Brad Pitt from a slightly raised eyebrow? Yep, we’re on that level of complexity. But fret not, dear reader! We’ll dive deep, beginning with live-action character consistency and then tiptoeing into the animated territory.
Here’s something that might surprise you: achieving a whopping 99% character consistency is currently possible only within a single image. So the trick is to ask for several scenes of the same character in one image, then let the magic happen with some crafty cropping and zooming. Sounds fancy, doesn’t it? And the best part? Implementation isn’t rocket science. It all comes down to nailing the prompts.
Take a look at this:
Prompt: Photo montage of a middle-aged man with short hair. Top-left shows him laughing in casual attire. Top-right portrays him reading a book in glasses and a sweater. Bottom-left captures him jogging in sportswear with determination. Bottom-right depicts him playing guitar in a relaxed environment.
Notice something? That’s a single image! Unlike the default 4 from DALL-E 3. The prompt above can be summarized into a template: [medium] [layout] [top left description] [top right description] [bottom left description] [bottom right description].
- Medium: Think photo, watercolor, cartoon…
- Layout: By using keywords related to layout, we can make DALL-E 3 generate a collage of multiple images. The keyword used above is montage, but you can also use other keywords such as grid, arrangement, collage, quad-diptych, storyboard, panorama, split-screen, mosaic, film strip, comic strip, and so on.
- Description: This is your canvas to paint the picture’s narrative.
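If you write a lot of these prompts, the template is easy to wrap in a small helper. Here’s a minimal sketch in Python. It isn’t part of any DALL-E tooling: the function name `build_collage_prompt` and its parameters are made up for illustration, and it assumes a `subject` slot right after the layout keyword, since that’s where the example prompt describes the character.

```python
def build_collage_prompt(medium, layout, subject, scenes):
    """Fill the [medium] [layout] [four descriptions] template.

    `scenes` holds four short descriptions, ordered top-left, top-right,
    bottom-left, bottom-right. All names here are illustrative.
    """
    positions = ["Top-left", "Top-right", "Bottom-left", "Bottom-right"]
    if len(scenes) != len(positions):
        raise ValueError("expected exactly four scene descriptions")
    # Opening sentence: medium, layout keyword, then the character.
    header = f"{medium.capitalize()} {layout} of {subject}."
    # One sentence per quadrant, e.g. "Top-left: painting in an apron."
    panels = [f"{pos}: {scene.rstrip('.')}." for pos, scene in zip(positions, scenes)]
    return " ".join([header] + panels)


prompt = build_collage_prompt(
    medium="photo",
    layout="grid",
    subject="a young woman with curly hair",
    scenes=[
        "painting in an apron",
        "dancing in a red dress",
        "cooking in a chef's hat",
        "cycling in sportswear",
    ],
)
print(prompt)
```

Swap the `medium` and `layout` arguments for any of the keywords listed above and keep the character and scenes unchanged.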
Here are more examples:
Prompt: Photo grid of a young woman with curly hair. Top-left captures her painting in an apron. Top-right shows her dancing in a red dress. Bottom-left illustrates her cooking in a chef’s hat. Bottom-right presents her cycling in sportswear.
Prompt: Photo montage of an elderly gentleman with a beard. Top-left showcases him playing chess in a suit. Top-right has him gardening in overalls. Bottom-left captures him fishing in a hat and vest. Bottom-right shows him playing the piano in a cozy room.
Prompt: Photo panorama capturing a woman in her 30s with a pixie cut. Top-left: practicing martial arts in a dojo. Top-right: sipping coffee in a cafe. Bottom-left: biking in a park. Bottom-right: reading in a library corner.
Prompt: Wide photo grid of a girl in her teens with braided hair. Top-left captures her studying with books. Top-right depicts her playing violin. Bottom-left illustrates her swimming with goggles. Bottom-right shows her dancing in a studio.
Before you get too excited, let me level with you: DALL-E 3 isn’t flawless. Not yet, anyway. Some images might sport a disjointed hand, while others… well, let’s just say they go a bit overboard with the collage count. My personal sweet spot? 4 images. More could spell disaster, but hey, if you’re feeling adventurous and sticking to simple changes like poses, 6 might just do the trick.
Prompt: Wide photo arrangement featuring 6 frames of a 20-year-old Australian woman. She has a platinum bob with dark roots. In each frame, she maintains consistent features but showcases different poses. Each frame is uniformly sized and evenly spaced for hassle-free cropping.
Prompt: Wide photo arrangement featuring 6 frames of a 20-year-old Chinese woman. She has a tight, high ponytail. In each frame, she maintains consistent features but showcases different expressions, such as joyful, angry, sad, playful, worried, etc. Each frame is uniformly sized and evenly spaced for hassle-free cropping.
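Once the frames really are uniformly sized, the cropping step is just arithmetic. Here’s a rough sketch that works out the crop rectangle for each cell of a grid. The helper name `grid_crop_boxes` is my own invention, the 1024x1024 size is only an example, and the code assumes the frames fill the image edge to edge with no borders or gutters. Real outputs often have both, so expect to nudge the numbers by hand.

```python
def grid_crop_boxes(width, height, rows, cols):
    """Return (left, upper, right, lower) boxes for an evenly divided grid.

    Boxes are listed row by row, starting at the top-left cell.
    Assumes the frames fill the image edge to edge with no borders.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps every edge on a whole pixel.
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


# Example: a square image holding a 2x2 collage.
for box in grid_crop_boxes(1024, 1024, rows=2, cols=2):
    print(box)
```

Each tuple uses the (left, upper, right, lower) order that Pillow’s `Image.crop` expects, so you can pass a box straight to it if Pillow is your tool of choice. For a 6-frame image, change `rows` and `cols` to match whatever layout DALL-E 3 actually gave you.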
Now, let’s talk mediums. Though our examples flaunted the photo medium, DALL-E 3 doesn’t play favorites. Sub out ‘photo’ in our handy template, and voila! See for yourself:
Prompt: Cartoon mosaic of a middle-aged man with short hair. Top-left shows him laughing in casual attire. Top-right portrays him reading a book in glasses and a sweater. Bottom-left captures him jogging in sportswear with determination. Bottom-right depicts him playing guitar in a relaxed environment.
Prompt: Comic strip of a young woman with curly hair. Top-left, she’s in professional attire at the office, top-right, she’s dressed for a glamorous night out, bottom-left, she’s in casual loungewear at home, and bottom-right, she’s in comfy pajamas ready for bed.
Prompt: Illustration montage featuring a Chinese woman’s artistic pursuits: top-left, she’s sculpting clay, top-right, she’s playing the violin in a concert hall, bottom-left, she’s acting on a theater stage, and bottom-right, she’s writing at a cozy desk with a typewriter.
Prompt: Watercolor panorama of a woman pursuing various careers: top-left, she’s in a lab coat as a scientist, top-right, she’s wearing a business suit as a CEO, bottom-left, she’s in a police uniform as a detective, and bottom-right, she’s dressed as a chef in a restaurant kitchen.
Prompt: Storyboard depicting a woman’s adventures in travel: top-left, she’s exploring ancient ruins, top-right, she’s riding a gondola in Venice, bottom-left, she’s hiking in a lush forest, and bottom-right, she’s on a safari observing wildlife.
Prompt: Cartoon montage of a cheetah track athlete. Top-left showcases him stretching before a race. Top-right has him sprinting. Bottom-left portrays him crossing the finish line, victory evident. Bottom-right shows him hydrating with a sports drink.
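If you want to try one scene set across several mediums, the swap itself can be scripted. This is only a sketch: `swap_medium_and_layout` is a hypothetical helper, and it assumes the prompt opens with exactly two words, a medium followed by a layout, as in "Photo montage of…". Prompts with a leading modifier such as "Wide photo grid…" would need extra handling.

```python
def swap_medium_and_layout(prompt, medium, layout):
    """Replace the first two words of a template-style prompt.

    Assumes the prompt starts with "<medium> <layout> ...".
    The rest of the prompt is left untouched.
    """
    parts = prompt.split(" ", 2)
    if len(parts) < 3:
        raise ValueError("prompt is too short to follow the template")
    return f"{medium.capitalize()} {layout} {parts[2]}"


base = (
    "Photo montage of a middle-aged man with short hair. "
    "Top-left shows him laughing in casual attire. "
    "Top-right portrays him reading a book in glasses and a sweater. "
    "Bottom-left captures him jogging in sportswear with determination. "
    "Bottom-right depicts him playing guitar in a relaxed environment."
)

# Try the same four scenes in a couple of different mediums and layouts.
for medium, layout in [("cartoon", "mosaic"), ("watercolor", "grid")]:
    print(swap_medium_and_layout(base, medium, layout))
```

The first variant printed is the cartoon mosaic prompt shown earlier in this section.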
Alright, here are a few quick hits you might find handy when working with DALL-E 3:
- Aspect Ratios: So, I’ve been cooking up images in a square format because DALL-E 3 and character consistency… well, they’re a bit like cats and water right now. You could give other ratios a whirl, but a heads up—the error rate? Might tick up a notch.
- Stability Woes: If DALL-E 3’s acting a tad finicky, keep at it. Sometimes, it behaves best when asked for a single image. Here’s a handy prompt for those times that you can put in the custom instructions: Prompt: Always generate only one image in DALL-E 3.
- Pesky Prompt Tweaks: Ever noticed how DALL-E 3 sometimes gives your prompt a makeover? When that happens, I use this prompt in custom instructions, derived from the wisdom of Twitter users:
  Prompt: “@DM” means: do not in any circumstance modify my prompt, please create image using this prompt:
  So, next time, just slap on an “@DM” at the start of your prompt. Easy, right?
- The Prompt Paradox: Now, here’s a quirk—these prompts can sometimes be, well, unpredictable in custom instructions. By default, DALL-E 3 jazzes up your prompt and pops out 4 images. But, believe it or not, custom instructions can occasionally work wonders, from upping image quality to dodging copyright issues. Hungry for the nitty-gritty? Dive into my earlier post: Instantly Perfecting DALL-E 3 Imagery with Custom Instructions.
In a nutshell? DALL-E 3’s a champ at collages and capturing the essence of characters—something even Midjourney can’t match. I’ve got this gut feeling—the next iteration of DALL-E 3? It’s gonna be epic.
Got a genius workaround or tip up your sleeve? Slide into my comment section. Let’s brainstorm!