As we nestled into the holiday season, the unexpected release of the Midjourney V6 Beta took the AI community by surprise.
This move, possibly spurred by rising competition in the AI realm, follows a period in which Midjourney was soaring high, with rivals like Adobe Firefly and DALL-E 3 nowhere in sight.
Today, however, even giants like Google and Meta are swiftly bridging the gap in the text-to-graphics arena.
The V6 update marks a significant leap, especially for a beta release. According to the release notes, V6 processes prompts very differently from V5, so users may need to relearn the art of prompt crafting.
Midjourney’s latest iteration focuses on three core enhancements: 1) elevating photorealism, 2) enhancing semantic comprehension, and 3) bolstering text generation capabilities.
In this exploration, I dive into two key areas: photorealistic portraits and text generation. Many of my prompts, crafted in natural language, offer a window into V6’s proficiency in understanding everyday speech.
A common critique of V5.2 was its tendency to produce portraits with unnaturally smooth skin, lacking the textured realism often seen in Stable Diffusion’s outputs. My tests with V6 suggest a noteworthy advancement in realism. Let’s delve into a few examples:
Prompt: lensbaby shot of young woman in a meadow, swirling bokeh background, ethereal lighting casting soft shadows on her face, dreamlike atmosphere --s 750 --style raw
Prompt: outdoor portrait of a woman in a wheat field at sunset --s 250 --style raw
Prompt: Polaroid camera photo of femme fatale, red lipstick, black veil, dim-lit bar, mysterious and dangerous aura --s 750 --style raw
In these prompts, all of which describe a portrait within a scene, the two versions differ starkly.
While V5.2 tends to give more weight to the scene, V6 leans towards a more headshot-centric approach. However, V6’s realism sometimes feels overdone, as seen in the excessive freckles on the subjects.
Now, examining a close-up image:
Prompt: close up woman face portrait, glossy blue eyes, side ligting, haute couture, ultra detailed, tilt shift --s 750 --style raw
Here, the differences are striking. In the V6 images, even minute details like bloodshot eyes are vividly captured.
In summary, while V6 excels in character detail, it arguably strays into overkill territory.
The shift in prompt interpretation is also significant, with V5.2 and V6 rendering the same prompts with notable differences.
V6’s hyper-realistic approach seems to compromise aesthetic nuances, like the specific effects of lensbaby and Polaroid cameras I mentioned in the prompts. The over-emphasized blood in the eye in the third image also falls short of expectations.
It’s worth noting that V6 is still in beta, or rather alpha, and remains a work in progress. Midjourney has indicated that further updates could arrive unannounced, potentially addressing the issues I’ve highlighted.
Despite these critiques, V6 has its bright spots, particularly in text generation. Since DALL-E 3’s debut, I’ve often preferred it for creating featured images of my articles, primarily due to its superior text rendering accuracy. With V6, Midjourney shows promise in this domain.
V6’s text generation capabilities have seen substantial improvements. To achieve more precise text rendering, Midjourney recommends using a lower stylize value, such as --s 50, together with --style raw, though for this test I left the stylize value at its default.
Consider these examples:
Prompt: Pastel Drawing: A soft pastel rendering of a field of flowers, their delicate petals swaying in a gentle breeze, with the text “Serenity” overlaid, prominent and aligns with the image’s style. --style raw
Prompt: Photo of a cyberpunk street scene with futuristic neon advertisements and flying cars. The text “Future” inside a semi-transparent box is prominent and aligns with the image’s style. --style raw
Prompt: Craft a fantasy world filled with floating islands, each inhabited by a different mythical creature. The text “fantasy” inside a bubble is prominent. --style raw
Prompt: Design a tall book cover about a space battle. The metal text “epic” is prominent. --ar 2:3 --style raw
Prompt: Create a heartwarming tall greeting card featuring a fluffy teddy bear hugging a bouquet of colorful balloons in a garden filled with vibrant flowers. The embossed text “thank you” is prominent. --ar 2:3 --style raw
The prompt I’ve used here is borrowed from one of my previous articles, focusing on the unique text effects achievable with DALL-E.
In this instance, the language carries a distinct DALL-E flavor, yet Midjourney demonstrates a commendable understanding. It effectively renders the text, even incorporating some of the special effects I specified. However, it’s important to note that while Midjourney makes significant strides, it doesn’t quite reach the nuanced proficiency of DALL-E just yet.
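If you want to rerun prompts like these with different settings, it can help to assemble the parameter suffix programmatically. The following is only a minimal sketch: build_prompt is a hypothetical helper of my own, not part of any Midjourney tooling, and it simply produces a string to paste into Midjourney. The --s, --style and --ar flags are the ones used in the prompts above; the --v flag for selecting a model version is an addition I am assuming here for side-by-side comparisons.

```python
def build_prompt(description, stylize=None, style=None,
                 aspect_ratio=None, version=None):
    """Append Midjourney-style parameters to a plain-text description.

    Hypothetical helper for illustration only. Parameters left as None
    are omitted, so Midjourney falls back to its own defaults.
    """
    parts = [description.strip()]
    if aspect_ratio is not None:
        parts.append(f"--ar {aspect_ratio}")
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if style is not None:
        parts.append(f"--style {style}")
    if version is not None:
        # Assumption: --v selects the model version (not shown in the prompts above).
        parts.append(f"--v {version}")
    return " ".join(parts)


description = ("Design a tall book cover about a space battle. "
               "The metal text “epic” is prominent.")

# The recommended text-rendering settings: low stylize plus raw style.
text_prompt = build_prompt(description, stylize=50, style="raw",
                           aspect_ratio="2:3")
print(text_prompt)

# The same description targeted at two versions for comparison.
for v in ("5.2", "6"):
    print(build_prompt(description, style="raw", aspect_ratio="2:3", version=v))
```

Keeping the description separate from the parameters makes it easy to hold the wording fixed while varying only the stylize value or the version, which is how the comparisons in this article were framed.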
The transition from V5.2 to V6 underscores Midjourney’s commitment to continuous evolution and improvement. As we eagerly anticipate its stabilization and the advent of a web version, it’s clear that Midjourney is more than ready to meet the challenges of an ever-advancing AI landscape.