FLUX vs. Midjourney: A Deep Dive into Typography, Anatomy, Prompt Following and more

The landscape of generative art is constantly evolving, and the recent release of FLUX by Black Forest Labs has sparked significant interest among creators. FLUX is an open-source text-to-image suite of models that, according to official evaluations, outperforms many of the leading models in the field, both open-source and proprietary—including the popular Midjourney.

As someone who has extensively explored and documented the capabilities of Midjourney, I was intrigued by the claims surrounding FLUX. With a healthy dose of skepticism, I decided to put FLUX to the test, comparing it directly with Midjourney across several key dimensions. This article details my findings, offering a comprehensive comparison of these two powerful tools in the world of generative art.

Text Generation

Let’s begin by examining the text generation capabilities of FLUX and Midjourney. To illustrate the differences, consider two sets of images. In each set, the image on the left was generated by FLUX Dev and the one on the right by Midjourney V6.1.

1️⃣ Text Handling and Texture

Both models demonstrate proficiency in handling simple word spelling. However, when it comes to textural detail, FLUX appears to have the upper hand. For example, the word “HEAL” generated by Midjourney displays a cookie-like texture, which doesn’t quite align with the intended fruity appearance. In contrast, FLUX delivers a more appropriate texture, making the text look more realistic and true to the prompt.

The difference in quality becomes even more apparent when we look at images involving text overlays, such as the example with ice cubes. The text generated by FLUX stands out for its clarity—the outline around the word “Cubes” is sharp and visually appealing, whereas Midjourney’s version lacks this level of precision.

2️⃣ Aspect Ratio Flexibility

Another significant advantage of FLUX is its support for a variety of aspect ratios, which gives it more flexibility than Midjourney in these comparisons. For instance, at a 1:1 aspect ratio, FLUX generates smaller text with tighter spacing, so all five letters are fully visible from a front-facing view. Midjourney struggles here: its letters appear crowded and are not shown from a straight-on perspective.
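To make the aspect-ratio point concrete, here is a minimal Python sketch of how one might turn a ratio such as 1:1 or 16:9 into pixel dimensions before requesting an image from a locally hosted model. The function name, the roughly one-megapixel budget, and the rounding to multiples of 16 are illustrative assumptions of mine, not settings taken from the FLUX or Midjourney documentation.

```python
import math
from typing import Tuple


def dims_for_aspect_ratio(
    ratio: str,
    target_pixels: int = 1024 * 1024,  # assumed pixel budget, about 1 megapixel
    multiple: int = 16,                # assumed size granularity
) -> Tuple[int, int]:
    """Return (width, height) that approximates `ratio` within the pixel budget."""
    w_part, h_part = (int(part) for part in ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError("both sides of the ratio must be positive")

    # Scale the ratio so that width * height is close to the pixel budget.
    scale = math.sqrt(target_pixels / (w_part * h_part))

    # Snap each side to the nearest multiple, never going below one multiple.
    width = max(multiple, round(w_part * scale / multiple) * multiple)
    height = max(multiple, round(h_part * scale / multiple) * multiple)
    return width, height


if __name__ == "__main__":
    for ratio in ("1:1", "16:9", "2:3"):
        print(ratio, dims_for_aspect_ratio(ratio))
```

Because each side is snapped to a multiple of 16, the resulting ratio is only approximate for some inputs. If exact proportions matter for your layout, check the returned dimensions before generating.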

Conclusion

Overall, FLUX demonstrates stronger typography than Midjourney in these tests. Its handling of different aspect ratios and its crisp, well-textured lettering make it the more versatile tool for rendering text in generative art.

Anatomy

Next, let’s explore how FLUX and Midjourney handle the complex task of generating human bodies—a challenge that has long plagued AI models with issues like distorted limbs and unnatural body parts.

1️⃣ Anatomical Accuracy

One of the significant improvements seen in FLUX is its ability to generate human figures without the glaring errors often found in earlier models like Stable Diffusion 3. In the images generated by FLUX, the human body is depicted with a high level of accuracy, with no major anatomical errors. However, there are still some areas where the AI’s influence is noticeable, such as the overly pronounced muscles. These bulging muscles can appear slightly unrealistic, and those familiar with muscle structure can easily identify that the image is AI-generated.

Midjourney, on the other hand, produces muscle structures that appear slightly more realistic, particularly when details like sweat are added. This gives the bodybuilder in the Midjourney image a more lifelike appearance, contributing to the overall realism.

2️⃣ Spatial Relationships

While Midjourney may have an edge in muscle realism, it struggles significantly with spatial relationships. For example, in one image, the barbell appears to pass through the woman’s head—an obvious flaw that breaks the immersion of the generated scene. FLUX, in contrast, excels in this area. Thanks to its flexible aspect ratio support, FLUX accurately portrays spatial relationships, allowing the entire barbell to be shown without crowding the image or introducing awkward errors.

3️⃣ Movement and Poses: Ballerinas and Yoga

When comparing how these models handle dynamic poses, such as those of ballerinas and yoga practitioners, both FLUX and Midjourney perform admirably, though with some differences.

  • Ballerinas: The images of ballerinas generated by both models are quite similar, with only minor issues in each. However, Midjourney’s version, while slightly more flawed, does offer a more dramatic aesthetic, particularly through its use of light and shadows.
  • Yoga Poses: Moving on to yoga poses, FLUX demonstrates exceptional accuracy, nailing the complex positions almost perfectly. Midjourney also does a solid job.

Conclusion

In terms of generating human bodies, FLUX and Midjourney are nearly neck and neck. FLUX excels in spatial relationships and the accurate depiction of dynamic poses, while Midjourney offers a slight advantage in rendering realistic muscle structures and dramatic lighting. Overall, both models show significant strengths, making them formidable tools for generating human figures in AI art.

Interaction/Prompt Following

Next, let’s delve into the concept of “interaction”—the ability of these models to depict natural interactions between people and objects, or how accurately they follow the given prompts. This is a critical aspect of generative art, especially when dealing with complex scenes that require a nuanced understanding of spatial relationships and human emotions.

1️⃣ Handling Complex Angles and Natural Interactions

Consider a scenario where a little boy is looking over his shoulder—a challenging angle for any model to capture. In this instance, FLUX outperforms Midjourney. While the butterfly in FLUX’s image doesn’t land exactly on the boy’s shoulder as the prompt might suggest, it does land on his arm, creating a more natural and believable interaction than what Midjourney produces. FLUX also excels in capturing the little boy’s gaze, perfectly conveying a sense of wonder and curiosity.

2️⃣ Prompt Adherence and Expression

In another set of images, Midjourney demonstrates a stronger adherence to the prompt, almost replicating it exactly. In contrast, FLUX falls slightly short. For instance, in a scene where a man is supposed to express surprise, the man’s expression in FLUX’s image lacks the intensity or clarity of emotion that you would expect from the prompt. Here, Midjourney’s attention to detail, particularly in facial expressions, results in a more convincing and prompt-aligned image.

3️⃣ Image Quality and Realism

When it comes to overall image quality, Midjourney generally produces more realistic results, especially in terms of skin texture and detail. The skin in Midjourney’s images tends to look more natural, with subtle details that enhance realism. On the other hand, FLUX sometimes produces skin that has a slightly plastic-like appearance, detracting from the overall realism of the image.

Conclusion

Despite some of its shortcomings, FLUX manages to outperform Midjourney in certain instances, particularly in creating natural interactions and capturing complex angles. While Midjourney might win in terms of image quality and prompt adherence in specific cases, FLUX’s ability to handle challenging prompts with nuanced interaction is quite impressive. This makes FLUX a compelling option for scenarios where capturing the essence of interaction is critical, even if it doesn’t always win on every front.

Hands

Let’s discuss one of the most notorious challenges in generative art: creating realistic hands. Many AI models struggle with this, often producing hands that look distorted or unnatural. Let’s examine how FLUX and Midjourney perform in this area.

1️⃣ Realism in Hand Generation

FLUX stands out for its ability to generate hands that look impressively realistic. In particular, when dealing with challenging angles, such as a left hand positioned at an unusual angle, FLUX manages to maintain anatomical accuracy and natural appearance. This level of detail is crucial for creating convincing images.

On the other hand, Midjourney has some noticeable issues with hand generation. Even in some of its better examples, it struggles with details like fingernails, especially on the ring and pinky fingers. These inaccuracies can be distracting and reduce the overall realism of the image.

2️⃣ Consistency Across Scenarios

The problem with Midjourney’s hand generation is not just a one-off occurrence; it’s a consistent issue. For example, in a set of images depicting someone playing the piano—a scenario where hand accuracy is critical—Midjourney once again falls short. The hands it generates often appear awkward or incorrect, disrupting the visual integrity of the scene. In contrast, FLUX handles this task with greater precision, producing hands that look appropriate and natural within the context of the image.

Conclusion

When it comes to generating hands, FLUX clearly outperforms Midjourney. Whether it’s the anatomical accuracy at unusual angles or the consistency across different scenarios, FLUX shows a superior capability in handling one of the most challenging aspects of AI-generated images. For creators who need reliable and realistic hand depictions, FLUX is the better choice.

Face Generation

Generating realistic faces is another critical test for AI models, and it’s an area where subtle details like skin texture and tone can make all the difference. Let’s see how FLUX and Midjourney stack up in this department.

1️⃣ Skin Texture and Realism

When it comes to skin texture, Midjourney has a clear advantage. The faces it generates tend to look more realistic, with skin textures that capture a lifelike quality. This realism is further enhanced by Midjourney’s ability to reflect warm tones from the prompt, creating a more natural and convincing appearance.

In contrast, FLUX struggles in this area. The skin in FLUX-generated faces often appears greasy or plastic-like, detracting from the overall realism of the image. This issue is particularly noticeable when compared to Midjourney and even Stable Diffusion 3 Medium. Remarkably, Stable Diffusion 3 Medium can sometimes surpass both FLUX and Midjourney in terms of skin texture quality, producing faces with more nuanced and natural textures.

2️⃣ Handling Tears and Emotional Details

While FLUX falls short in skin texture, it does manage to keep up with Midjourney in some aspects of facial detail, such as generating tears. Both FLUX and Midjourney are capable of showing tears when prompted—an achievement not all models can claim.

For instance, in comparisons with Juggernaut XL and Stable Diffusion 3 Medium, neither of those models successfully captured tears, highlighting an area where FLUX and Midjourney do perform well.

Conclusion

In the realm of face generation, Midjourney outshines FLUX primarily due to its superior skin texture and ability to reflect warm tones more accurately from the prompt. FLUX, while capable of handling certain facial details like tears, consistently falls behind in delivering realistic and natural-looking faces. For creators who prioritize facial realism, especially in terms of skin texture, Midjourney remains the stronger choice.

Final Thoughts

After thoroughly reviewing these images, it’s clear that FLUX, as an open-source model, is making significant strides in generative art. It surpasses Midjourney in several areas, such as typography, natural interactions, and hand generation, demonstrating the potential of open-source models. While FLUX currently falls short in overall image quality, particularly skin texture, its open-source nature allows the community to build on it and combine it with other models, potentially leading to rapid improvements.

The buzz surrounding FLUX suggests that we can expect to see even more refined models based on this technology in the near future. I’ll continue to update you with tutorials on FLUX, so if you haven’t followed me yet, now is a great time to do so!

Stay tuned, and keep creating!
