Midjourney vs DALL-E vs Stable Diffusion: Which One Nails the Human Pose?
The recent release of Midjourney V6 has sparked a wave of excitement, with hyper-realistic images that almost make traditional photography seem obsolete.
However, anyone who has dabbled in the realm of AI art creation knows a persistent shortcoming that isn’t disappearing anytime soon: human anatomy, and complex poses in particular.
Today, let’s embark on an exploratory project, putting the leading AI image generators (Midjourney, DALL-E, and Stable Diffusion) to the test with a specific challenge: replicating a yoga pose from a photograph sourced from the Unsplash image library.
DALL-E’s Attempt
Kicking off this experiment, we turn to DALL-E. To capture every nuance of the yoga pose, I crafted a detailed prompt:
Prompt: Wide photos of an athletic Asian female performing a one-legged wheel yoga pose with one leg extended up towards the ceiling. Her back is deeply arched, forming a semi-circular wheel shape. Both her arms are straight with hands planted firmly on the ground beneath the shoulders, fingers spread wide for stability. One leg is extended straight upwards, pointing towards the ceiling, while the foot of the other leg remains on the ground, helping to maintain balance. Her hips are lifted high, contributing to the overall curvature of the body. Her head is gently dropped back, in line with the arch of the back, without straining the neck. She is wearing a fitted blue outfit in a spacious, white, minimalist corridor with a series of arches receding into the distance. Each arch boasts a clean, semi-circular design that rises from a strong rectangular base. The neoclassical style of the arches is unadorned, with no visible embellishments or intricate moldings.
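For anyone who prefers to reproduce this step in code rather than through a chat interface, a minimal sketch using the official OpenAI Python SDK might look like the following. The model name, size, and quality flag are illustrative choices, not necessarily the exact settings I used:

```python
# Minimal sketch: running the pose prompt through the OpenAI Images API.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY env variable;
# model/size/quality below are illustrative, not necessarily what I used.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

prompt = (
    "Wide photo of an athletic Asian female performing a one-legged wheel "
    "yoga pose with one leg extended up towards the ceiling, back deeply "
    "arched into a semi-circular wheel shape, fitted blue outfit, white "
    "minimalist corridor with receding neoclassical arches"
)

result = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    size="1792x1024",  # wide aspect ratio to match the framing
    quality="hd",
    n=1,               # dall-e-3 accepts one image per request
)
print(result.data[0].url)  # URL of the generated image
```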
After a few tries, DALL-E’s results were remarkable, replicating the pose with near-perfect accuracy.
Yet, the overall image lacked depth and texture, and the subject’s movement felt somewhat artificial.
Midjourney’s Interpretation
Next, let’s see how Midjourney V6 fares:
Repeated attempts with Midjourney produced distorted limbs and motion, a problem far less pronounced in DALL-E’s output. Even with the Unsplash image supplied as a reference, Midjourney struggled to match the accuracy of the original pose.
However, Midjourney excelled in capturing the scene’s essence and imbued the lighting with a captivating ambiance.
Stable Diffusion’s Approach
Stable Diffusion, admittedly weaker at understanding natural language, requires prompts closer in style to those for Midjourney V5.2. Relying solely on text prompts for yoga pose images leads to even more pronounced distortions than with Midjourney.
However, Stable Diffusion boasts a saving grace: the ControlNet extension, which lets me feed the Unsplash photo in as a reference. Combining the Canny and Depth models gives precise control over the composition and the spatial depth, respectively; a sketch of this setup appears below.
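For readers who drive Stable Diffusion through the diffusers library rather than a web UI, a minimal sketch of the Canny-plus-Depth stack might look like this. The checkpoint IDs, file names, and conditioning weights are illustrative assumptions, not my exact settings:

```python
# Minimal sketch: stacking Canny + Depth ControlNets with diffusers.
# Assumes the diffusers, controlnet_aux, opencv-python, and torch packages;
# checkpoint IDs and file names are illustrative, not my exact setup.
import cv2
import numpy as np
import torch
from PIL import Image
from controlnet_aux import MidasDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Reference photo; dimensions should be divisible by 8 for SD 1.5.
reference = Image.open("unsplash_yoga.jpg").convert("RGB").resize((768, 512))

# Canny edges pin down the pose and composition.
edges = cv2.Canny(np.array(reference), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# A MiDaS depth map preserves the spatial layering of the corridor.
midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = midas(reference)

controlnets = [
    ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt=(
        "athletic Asian woman in a fitted blue outfit, one-legged wheel "
        "yoga pose, white minimalist corridor with receding arches, "
        "photorealistic"
    ),
    image=[canny_image, depth_image],
    controlnet_conditioning_scale=[1.0, 0.7],  # weight edges over depth
    num_inference_steps=30,
).images[0]
image.save("controlnet_result.png")
```

Weighting Canny higher keeps the limbs locked to the reference pose, while the lighter Depth weight leaves the model some freedom in rendering the background.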
With some tweaks to the prompt, I altered the color of the yoga outfit and transformed the concrete floor into wood. Some inpainting was necessary to refine the subject’s facial features, though minor distortions remained.
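If you are scripting the same workflow, the inpainting pass can be reproduced with diffusers’ inpainting pipeline. This is a hedged sketch: the mask file is assumed to be hand-painted (white marks the region to repaint), and the checkpoint ID is illustrative:

```python
# Minimal sketch: repainting the face region with an inpainting pipeline.
# Assumes a hand-painted mask (white = repaint); checkpoint ID is illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init = Image.open("controlnet_result.png").convert("RGB").resize((512, 512))
mask = Image.open("face_mask.png").convert("RGB").resize((512, 512))

fixed = pipe(
    prompt="natural, detailed face of an athletic Asian woman, photorealistic",
    image=init,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
fixed.save("inpainted.png")
```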
For a more natural look, I merged the face and hands from the original Unsplash photo using Photoshop, culminating in the following image.
This final image has a slight reddish tint and could benefit from further color correction.
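If you’d rather not round-trip through Photoshop again for that, a simple per-channel gain is often enough; the 5% red reduction below is an illustrative value to tune by eye:

```python
# Minimal sketch: taming a reddish cast with a per-channel gain.
# The 0.95 factor is an illustrative value; tune it by eye or by histogram.
from PIL import Image

img = Image.open("final_composite.png").convert("RGB")
r, g, b = img.split()
r = r.point(lambda v: int(v * 0.95))  # pull the red channel down ~5%
Image.merge("RGB", (r, g, b)).save("color_corrected.png")
```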
Concluding Thoughts
While Midjourney and Stable Diffusion show potential for mimicking reality, they still require auxiliary tools like Photoshop to iron out imperfections.
Nonetheless, the rapid evolution of AI in this field is staggering, with numerous successful applications already in motion. It’s an exciting time to delve into the possibilities of AI art creation.