Midjourney vs DALL-E vs Stable Diffusion: Which One Nails the Human Pose?

The recent release of Midjourney V6 has sparked a wave of excitement, its hyper-realistic images almost making traditional photography seem obsolete.

However, anyone who has dabbled in AI art creation knows a persistent shortcoming that isn’t disappearing anytime soon: these tools still struggle to render complex human poses accurately.

Today, let’s embark on an exploratory project, putting the leading AI image generators (Midjourney, DALL-E, and Stable Diffusion) to the test with a specific challenge: replicating a particular yoga pose from a photograph sourced from the Unsplash image library.

DALL-E’s Attempt

Kicking off this experiment, we turn to DALL-E. To capture every nuance of the yoga pose, I crafted a detailed prompt:

Prompt: Wide photos of an athletic Asian female performing a one-legged wheel yoga pose with one leg extended up towards the ceiling. Her back is deeply arched, forming a semi-circular wheel shape. Both her arms are straight with hands planted firmly on the ground beneath the shoulders, fingers spread wide for stability. One leg is extended straight upwards, pointing towards the ceiling, while the foot of the other leg remains on the ground, helping to maintain balance. Her hips are lifted high, contributing to the overall curvature of the body. Her head is gently dropped back, in line with the arch of the back, without straining the neck. She is wearing a fitted blue outfit in a spacious, white, minimalist corridor with a series of arches receding into the distance. Each arch boasts a clean, semi-circular design that rises from a strong rectangular base. The neoclassical style of the arches is unadorned, with no visible embellishments or intricate moldings.

After a few tries, DALL-E replicated the pose with near-perfect accuracy.

Yet, the overall image lacked depth and texture, and the subject’s movement felt somewhat artificial.

Midjourney’s Interpretation

Next, let’s see how Midjourney V6 fares:

Repeated attempts with Midjourney resulted in motion distortion, a problem less pronounced in DALL-E’s output. Even when given the Unsplash image as a reference, Midjourney struggled to match the reference pose.

However, Midjourney excelled in capturing the scene’s essence and imbued the lighting with a captivating ambiance.

Stable Diffusion’s Approach

Stable Diffusion, admittedly weaker at understanding natural language, requires prompts akin to those for Midjourney V5.2. Relying solely on text prompts for yoga pose images leads to even more pronounced distortions than with Midjourney.

However, Stable Diffusion boasts a saving grace: the ControlNet extension. This feature lets me use the Unsplash image as a reference within Stable Diffusion. Combining two ControlNet models, Canny (which follows the edges of the reference) and Depth (which follows its depth map), allows for more precise control over the composition and the spatial depth, respectively.
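For readers who prefer scripting to a web UI, the same Canny-plus-Depth idea can be sketched with Hugging Face's diffusers library. This is a minimal sketch under assumptions, not the workflow used for the images in this post: the weights (1.0 and 0.8) are illustrative, the names ControlUnit, CONTROL_UNITS, and generate are hypothetical, the two ControlNet checkpoints are the publicly hosted SD 1.5 ones, and you supply your own SD 1.5 base model along with an edge map and a depth map already extracted from the reference photo. It also assumes a CUDA GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlUnit:
    """One ControlNet condition (hypothetical helper for this sketch)."""
    name: str
    model_id: str
    weight: float  # how strongly this condition steers the result


# Illustrative weights, not values taken from the article.
CONTROL_UNITS = [
    ControlUnit("canny", "lllyasviel/sd-controlnet-canny", 1.0),
    ControlUnit("depth", "lllyasviel/sd-controlnet-depth", 0.8),
]


def generate(prompt, control_images, base_model, units=CONTROL_UNITS):
    """Run SD 1.5 with several ControlNets at once.

    control_images: preprocessed PIL images, one per unit and in the same
    order (here: the Canny edge map, then the depth map).
    base_model: a Stable Diffusion 1.5 checkpoint id or local path.
    """
    # Heavy dependencies are imported lazily so the config above can be
    # inspected on a machine without a GPU stack installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnets = [
        ControlNetModel.from_pretrained(u.model_id, torch_dtype=torch.float16)
        for u in units
    ]
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model, controlnet=controlnets, torch_dtype=torch.float16
    ).to("cuda")

    result = pipe(
        prompt,
        image=control_images,
        controlnet_conditioning_scale=[u.weight for u in units],
        num_inference_steps=30,
    )
    return result.images[0]
```

Lowering the Depth weight relative to Canny, as in this sketch, is one way to keep the outline of the pose strict while giving the model more freedom with the background.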

With some tweaks to the prompt, I altered the color of the yoga outfit and transformed the concrete floor into wood. Some inpainting was necessary to refine the subject’s facial features, though minor distortions remained.

For a more natural look, I merged the face and hands from the original Unsplash photo using Photoshop, culminating in the following image.

Image: yoga pose by Stable Diffusion and Photoshop

This final image has a slightly reddish tint and could benefit from further color correction for added detail.
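One simple way to reduce a color cast like this is gray-world white balancing, which scales each channel so that the red, green, and blue averages match. The snippet below is a rough sketch of that idea in plain Python, not the correction I would necessarily apply in Photoshop; the function name gray_world_balance is hypothetical, and it assumes the image has been flattened into a list of (r, g, b) tuples with values from 0 to 255.

```python
def gray_world_balance(pixels):
    """Equalize channel averages to reduce a global color cast.

    pixels: list of (r, g, b) tuples with integer values in 0..255.
    Returns a new list of corrected (r, g, b) tuples.
    """
    n = len(pixels)
    if n == 0:
        return []

    # Average of each channel across the whole image.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3

    # Per-channel gain; leave an all-zero channel untouched
    # to avoid dividing by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]

    return [
        tuple(min(255, max(0, round(p[c] * gains[c]))) for c in range(3))
        for p in pixels
    ]
```

On a reddish image the red average is the highest, so the red gain falls below 1 while green and blue are boosted. Gray-world assumes the scene averages out to neutral gray, so it can overcorrect a photo that is legitimately warm, such as one with a wooden floor.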

Concluding Thoughts

While Midjourney and Stable Diffusion show potential for mimicking reality, they still require auxiliary tools like Photoshop to iron out imperfections.

Nonetheless, the rapid evolution of AI in this field is staggering, with numerous successful applications already in motion. It’s an exciting time to delve into the possibilities of AI art creation.
