OpenAI’s latest release, the DALL-E 3 model, is now accessible for ChatGPT Plus users. This means that if you’re a Plus user, you can now create AI-driven drawings within ChatGPT.
What’s even cooler? Bing offers free access to DALL-E 3! Head over to bing.com/create and newbies can whip up 100 images without spending a dime. Need more? It’s available, just at a slower pace.
Personally, I was eager to dig into the differences between DALL-E 3 and Midjourney. So, I embarked on a deep-dive analysis covering 8 features, including human poses, composition, and surreal effects. Here’s the fun part: it turns out Midjourney struggles significantly in these areas. And the images I’ll unveil below? Well, they speak volumes about the chasm between the two tools.
Let’s jump right in, shall we?
Breaking it Down:
1. Human Poses
My past encounters with Midjourney highlighted a recurring issue: it struggles to capture the essence of human gestures. Here’s a telling example:
The Ballet Scenario:
I meticulously crafted a prompt detailing a ballet dancer’s graceful pose.
Prompt: A ballet dancer showcasing exceptional athleticism and grace. En pointe on her right foot, her left leg extends seamlessly upward in a perfect straight line, toes elegantly pointed. Her torso and head gracefully lean back, face tilted upward as if reaching for an unseen light. Arms outstretched and upward, reflecting the fluidity and poise of her dance.
- Midjourney’s Take (1st image): The result was underwhelming. The image depicted a dancer, but her attire was far from professional. Instead of capturing the light, airy grace of a ballerina, her stance felt stiff and uninspired.
- DALL-E 3’s Interpretation (2nd image): In stark contrast, DALL-E 3 delivered brilliantly. The image showcased a quintessential ballerina, her pose echoing my prompt to perfection. Moreover, the impeccable lighting accentuated her elegance.
The Yoga Challenge: Switching gears, I tested the waters with yoga poses.
Prompt: a young woman gracefully balancing on a city rooftop at sunrise with the warrior 1 yoga pose
- Midjourney’s Attempt (1st image): Of the several images, the best had the arms correctly extended upwards. But the rest? The figures either stood idly or twisted oddly. Far from a basic yoga stance.
- DALL-E 3’s Version (2nd image): Here, DALL-E 3 showcased its strength yet again. The yoga poses were on point, even though the attire details weren’t as refined as in Midjourney’s image.
2. Multi-Person Compositions
Generating images with several individuals, especially when their positions are vital, can be a real challenge. Allow me to break down my experiences with both tools:
Group of Four Scenario:
Using the prompt below, I hoped to create an image featuring four individuals.
Prompt: sunlit ballet studio, poised 20-year-old ballerina on tiptoes, perfect ballet posture, extended arms in a soft curve, joyful expression, three 5-year-old girls sitting on polished wooden floor, gazing up in wonder and admiration
- Midjourney’s Approach (1st and 2nd images): I played around with two different aspect ratios, but neither truly nailed my vision. Neither composition included all four individuals as I’d described.
- DALL-E 3’s Execution (3rd image): The outcome? Remarkably accurate. DALL-E 3 captured the essence and placement of my prompt, producing a compelling representation.
Trio Pose Scenario:
For my next test, the prompt revolved around a scene featuring three individuals.
- Midjourney’s Creation (1st image): Despite several attempts, I couldn’t get the desired scene where a woman sits atop the shoulders of two men. The resulting image was close, but not quite right.
- DALL-E 3’s Depiction (2nd image): Though the image felt slightly off—particularly the eerie glow in their eyes—the overall composition and postures of the characters were spot-on, aligning well with my initial vision.
3. Foreground vs. Background: A Test of Clarity
One of the challenges with Midjourney is that it often fails to clearly separate foreground elements from the background. Let me walk you through some examples to shed light on this:
The Surreal Teacup Scene:
You may have noticed this in Midjourney’s creations already: the main subject and its backdrop bleed into each other. My first test was a simple one:
Prompt: dainty porcelain teacup, whimsical cloudscape background
- Midjourney’s Interpretation (1st image): The result was visually appealing, yet flawed. The clouds encroached upon the teacup, causing an unintended merge.
- DALL-E 3’s Creation (2nd image): Here, the distinction between the teacup and the clouds was clear. However, there was a slight hiccup: the sky’s hue closely resembled the teacup’s color. Despite this, DALL-E 3’s adherence to my prompt was evident, even if Midjourney’s artistry was slightly superior.
The Marbled Dial Scenario:
Prompt: luxury wristwatch, intricate dial, leather strap, background with marble texture
- Midjourney’s Outcome (1st image): A close inspection revealed the marbling extended onto the dial itself, a deviation from my original intent.
- DALL-E 3’s Depiction (2nd image): This rendition was more in line with my vision. The dial was pristine and intricate, standing out clearly against the marbled background.
To be fair to Midjourney, it isn’t always prone to merging foreground and background. With a bit of finesse in the prompt or leveraging features like inpainting, one can achieve the desired distinction. For instance, by elaborating on the dial’s details in my prompt, I managed to get a Midjourney image where the dial remained untainted by the marbled pattern.
Prompt: luxury wristwatch boasting an intricate dial featuring delicate filigree arabesque designs gracefully intertwine, placing on the background with marble texture
4. Text Generation
We all know that when it comes to embedding legible text within images, Midjourney faces some stumbling blocks. Let’s dive into a couple of my tests:
The Magazine Prompt:
I sought an image depicting a magazine page.
Prompt: Create an elegant perfume ad on a magazine page with a woman in a flowy dress amidst roses, script font saying “Elegance in Every Scent”.
- Midjourney’s Attempt (1st image): Rather than producing the desired magazine layout, the result showcased a perfume bottle sprinkled with illegible text.
- DALL-E 3’s Creation (2nd image): Contrasting sharply, DALL-E 3 nailed the brief. The image echoed the aesthetics of a magazine, complete with the title text almost mirroring my original prompt.
A Further Dive:
Prompt: Whimsical illustration of a cat wearing aviator goggles, piloting a tiny plane, title text saying “Adventure awaits in every corner”
- Midjourney’s Output (1st image): Unfortunately, it seemed as though the tool had decided to craft text from another planet—utterly unrecognizable.
- DALL-E 3’s Render (2nd image): Staying true to form, DALL-E 3 delivered an image with text aligning perfectly with my instructions.
5. The Levitating Act
The ability to depict levitating objects adds a touch of surrealism, and it’s especially magical in domains like food photography. However, achieving this floating magic with Midjourney has been a challenge. Let me walk you through a couple of instances:
Yogi in Flight Scenario:
Wishing for an image of a yogi hovering mid-air, I turned to both tools for their interpretations.
Prompt: Yogi meditating mid-air amidst serene mountain scenery, levitating effortlessly, enveloped by the tranquility of nature
- Midjourney’s Attempt (1st image): Sadly, the yogi remained earthbound. There was no sign of the desired levitation.
- DALL-E 3’s Creation (2nd image): Success! The figure floated gracefully, although it somewhat missed the mark on capturing the yogic essence.
Floating Bananas Scene:
Next, envisioning levitating banana slices over a pristine white plate, I was curious about the results.
Prompt: low angle shot of uniformly cut and evenly spaced banana slices suspending in mid-air, floating banana slices in perfect symmetry above a shallow white dish
- Midjourney’s Representation (1st image): While it managed to depict some degree of floating, the key detail of the white plate went amiss.
- DALL-E 3’s Artistry (2nd image): Nailed it! Not only did the banana slices hover perfectly, but their symmetrical arrangement around the plate was also spot on.
From these tests, it’s clear that when it comes to defying gravity, DALL-E 3 seems to have a better grip on the levitation magic. What are your thoughts on this floating spectacle?
6. Perfecting Layouts
One edge AI drawing tools have over conventional methods is their ability to craft precise arrangements based on simple prompts. Achieving such intricate layouts manually would be a daunting task. Let’s dissect a couple of my layout experiments:
The Lipstick Heart Arrangement:
Here, I envisioned lipsticks forming a heart pattern.
Prompt: bird’s eye view of heart shape arrangement of standing lipsticks
- Midjourney’s Design (1st image): While it did capture the heart shape, there were hiccups. Instead of upright lipsticks, the top section oddly resembled a blend of bullets and stones.
- DALL-E 3’s Representation (2nd image): Precision at its finest! The lipsticks were perfectly arranged as envisioned. If I had to nitpick, the artistic rendering was a tad underwhelming compared to Midjourney.
Honeycomb Chocolate Display:
Venturing into something more intricate, I prompted for chocolates set in a honeycomb layout.
Prompt: bird’s eye view of gourmet chocolates neatly arranged on a rustic wooden background with a geometric honeycomb pattern
- Midjourney’s Creation (1st image): Interesting twist—each chocolate took on a hexagonal shape, embodying the honeycomb aesthetic.
- DALL-E 3’s Craftsmanship (2nd image): DALL-E 3 delivered an array of chocolates, each with varying shapes, beautifully capturing the essence of the prompt.
7. Conveying Motion
When it comes to encapsulating movement, the distinction between Midjourney and DALL-E 3 is palpable.
The Motion Blur Experiment:
I set out with a prompt that specifically asked for an image illustrating motion blur.
Prompt: a skateboarder’s sneakers in mid-air, motion blur
- Midjourney’s Interpretation (1st image): Instead of the expected blur, the tool generated an image with a more ‘dusty’ feel, which didn’t convey the sense of speed I was aiming for.
- DALL-E 3’s Depiction (2nd image): DALL-E 3 got closer to the mark, showcasing the motion through both blur and distinct lines. However, while it nailed the concept, the overall artistic impression didn’t quite rival that of Midjourney.
Interestingly, Midjourney isn’t inherently inept at depicting motion. A few prompt tweaks, such as integrating terms like “speed line”, can nudge it in the right direction. And, true to form, when I incorporated such tweaks, Midjourney produced an image that brilliantly captured the essence of motion blur.
Prompt: speed lines of a mechanical gaming chair in motion blur, gaming setup background, ergonomic design, immersive gaming experience, close-up shot
8. The Realm of Surrealism
For those who’ve dabbled with DALL-E 3, its capabilities might seem almost boundless. It crafts images that feel like they’re plucked straight from a dream, bending reality in the most imaginative ways. On the flip side, Midjourney tends to anchor its creations in realism, producing images that resonate more with our everyday world.
To put this into perspective, let’s consider a couple of examples:
Prompt: a photo of a camel running at full speed through a desert landscape, kicking up swirls of dust, a woman sitting on its back typing on the keyboard of a laptop open in front of her
Prompt: a photograph of a young boy joyfully riding an electric guitar like a surfboard as it soars through a blue sky, musical notes floating around him
The Fusion Experiment: Marrying Midjourney and DALL-E 3
From our previous deep-dives, it’s evident: while Midjourney brings artistic realism to the table, DALL-E 3 flaunts an uncanny ability to translate language nuances into imagery. So, the burning question: can we blend the two, combining Midjourney’s artistry with DALL-E 3’s linguistic prowess?
Intrigued, I initiated an experiment. I took images crafted by DALL-E 3 and fed them into Midjourney as reference images, using identical prompts. The goal was to check whether Midjourney could refine DALL-E 3’s creations, enhancing their artistry.
But here’s the twist: the results were less than stellar. The vast gulf in the capabilities of the two tools became more evident. Only a handful of images showed improvement:
- The banana slices: more of them floated in the air, and the plate turned a pristine white.
- Another surprise? The camel rider, initially depicted as a woman, turned into a male figure after several tweaks and rounds of inpainting.
It seems combining the strengths of both tools isn’t as straightforward as hoped. But then again, each tool, in its own right, brings a unique touch to the AI imagery canvas.
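If you want to repeat the reference-image experiment, here is a minimal sketch of how such a prompt can be put together. It assumes Midjourney’s usual Discord syntax, where an image URL placed at the start of the prompt acts as the reference, `--iw` sets how much weight the image gets, and `--ar` sets the aspect ratio. The helper name `build_midjourney_prompt` and the example URL are made up for illustration, and the allowed `--iw` range depends on the model version, so treat the value below as a placeholder.

```python
def build_midjourney_prompt(text, image_url=None, image_weight=None, aspect_ratio=None):
    """Assemble the text you would paste after `/imagine prompt:` in Discord.

    Hypothetical helper: it only builds a string and does not talk to Midjourney.
    """
    parts = []
    if image_url:
        # Reference images go before the text part of the prompt.
        parts.append(image_url)
    parts.append(text.strip())
    if image_weight is not None:
        # --iw: how strongly the reference image counts relative to the text.
        parts.append(f"--iw {image_weight}")
    if aspect_ratio:
        # --ar: width:height ratio of the result.
        parts.append(f"--ar {aspect_ratio}")
    return " ".join(parts)


# Same text prompt as the teacup test, plus a DALL-E 3 image as reference.
prompt = build_midjourney_prompt(
    "dainty porcelain teacup, whimsical cloudscape background",
    image_url="https://example.com/dalle3-teacup.png",  # placeholder URL
    image_weight=1.5,  # placeholder value
    aspect_ratio="16:9",
)
print(prompt)
```

Keeping the text prompt identical and changing only the reference image and its weight makes it easier to see what the reference actually contributes.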
Delving into the Why
Ever wonder why there’s such a stark contrast between Midjourney and DALL-E 3? Going by what is publicly known, the answer lies less in the basic image-generation recipe and more in how each model has been trained to read your prompt.
Midjourney’s Mechanism: Midjourney hasn’t published its architecture, but it is widely understood to be a diffusion model. You know the process: you start with a hazy, indistinct image, and as time goes on, it evolves, becoming clearer with each step. This process involves gradually sculpting random noise into coherent shapes and scenes. As the image sharpens, the model factors in textual descriptions, refining details to achieve a realistic and artistic outcome. It’s like watching a painter gradually bring a canvas to life. The catch? This method takes many steps, which demands time and computational grunt.
DALL-E 3’s Magic: It’s tempting to file DALL-E 3 under Transformers, since the original DALL-E was a Transformer that generated images token by token. But OpenAI describes DALL-E 3 as a diffusion model too, so its images are also built up step by step rather than instantaneously. Its real strength? Deciphering natural human language with finesse. According to OpenAI, the model was trained on far more detailed, machine-written image captions, and inside ChatGPT your prompt is expanded into a fuller description before it reaches the image model. The result is visuals that align closely with textual prompts and a knack for fusing diverse concepts, styles, and attributes in unique ways. The trade-off, at least in my tests: the images sometimes lack the realism and polish of Midjourney’s.
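To make the step-by-step denoising idea above a bit more concrete, here is a toy sketch. It is not how either tool actually works: a real diffusion model uses a trained neural network to guess which noise to remove at each step, guided by the text prompt, and it never sees the finished image in advance. In this sketch a known `target` list of numbers stands in for "the image the prompt describes", and the function name `toy_denoise` is made up for illustration.

```python
import random


def mean_abs_error(a, b):
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and move a little closer to `target` at every step.

    Toy stand-in for diffusion sampling: the update below cheats by using the
    target directly, where a real model would use a learned denoiser.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # the "hazy" starting point
    history = []
    for t in range(steps):
        remaining = steps - t
        # Close 1/remaining of the gap, so the final step lands on the target.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
        history.append(mean_abs_error(x, target))
    return x, history


target = [0.0, 0.25, 0.5, 0.75, 1.0]  # pretend these are pixel brightness values
result, history = toy_denoise(target, steps=10)
for step, err in enumerate(history, start=1):
    print(f"step {step:2d}: distance from target = {err:.4f}")
```

The printed distance shrinks at every step, which is the "canvas slowly coming to life" effect in miniature. More steps mean a smoother path but more computation, which is why diffusion-based tools take a moment to render.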
For the tech aficionados out there, I’m no expert. If you spot any nuances I’ve missed, your insights are always welcome!
In my opinion, while DALL-E 3 shines in user-friendliness and integration with platforms like ChatGPT, it’s not about to overshadow Midjourney. Midjourney, with its rich artistic touch, caters to more professional and aesthetic-intensive projects. Though it has a steeper learning curve, its quality is undeniable.
So, where does your allegiance lie in this AI duel? I’d love to hear your thoughts!