GPT-4 vs. Google Gemini Pro: An In-Depth Comparison!

Google has just dropped their latest tech marvel, the Gemini Pro, and it’s making waves! Hold tight, because the even more powerful Gemini Ultra is on the horizon.

Let me tell you, the tech world is buzzing with comparisons. First, Claude 2 outdid GPT-4, then Google Bard, and now, Google Gemini seems to be leading the pack. This got me thinking – is Gemini really outperforming GPT-4?

Bloggers are abuzz, pointing at the table below, claiming Gemini has outdone GPT-4. It’s based on a benchmark that’s widely respected, and the numbers don’t lie. Check it out:


But here’s a little secret about benchmarks – they’re not foolproof. Imagine if a language model is trained with the same data it’s tested on. It’s like having a sneak peek at the exam questions. If you study just those, you’re bound to do well, right? That doesn’t mean you’re smarter, just better prepared.

So, while I’m not accusing Google of any tricks, we should think twice before taking these benchmarks at face value.

Now, let’s get down to business and compare Google’s Gemini Pro with GPT-4, looking at various aspects. And hey, if you haven’t tried Gemini Pro yet, it’s free! Check it out here: Plus, it hooks right into Google Bard. Let’s see how they stack up!

Token Limit

Token capacity is a big deal in the world of large language models. Simply put, it’s how many words a model can remember during a conversation. The higher the token limit, the longer and more detailed a conversation can be.

Imagine asking a model to summarize a book. The more words it can handle at once, the more comprehensive the summary. In ongoing conversations, a higher token limit means the model remembers better, making the chat more coherent.

In my tests, Gemini Pro’s token limit was notably less than GPT-4’s. For instance, I asked it to turn a 45-minute talk from Sam Altman at the OpenAI DevDay into an article. Gemini Pro tapped out at the 30-minute mark.

google gemini token limit 2

The total word count from Gemini Pro was around 5327 words.

google gemini token limit 1

Plus, it offers a cool feature where you get three different responses to choose from, pushing the total word count to about 6000.

google gemini other drafts

ChatGPT-4, however, handled it like a champ. I fed it subtitles from two 45-minute videos, and it converted them without breaking a sweat, hitting around 17,408 words.

gpt4 token limit 1

And remember, GPT-4 Turbo can handle up to an incredible 128k in its context window!

Long Text Summarization

When it comes to text summarization, let’s see how Gemini Pro stacks up against GPT-4.

My task was to turn YouTube subtitles into an article. Gemini Pro opted for a straightforward summary of the subtitles, as shown in the image below.

google gemini summary

On the other hand, ChatGPT-4’s output, while shorter, boasted a clear, well-structured format.

gpt4 summary

Here’s another twist: Gemini Pro can actually summarize YouTube video content directly. Not too surprising, considering Google owns YouTube. The summary quality is pretty decent.

google gemini sumarize youtube video

But GPT-4, especially with its VoxScript plugin, seems to have an edge in summarizing YouTube videos. Its summaries have a beginning, a structured middle, and an end, making them more cohesive.

gpt4 summarize youtube video

Pulling out essential information from lengthy texts is a real test for any LLM. So far, Gemini Pro lags a bit behind GPT-4 in this arena. That said, Gemini Pro’s knack for summarizing YouTube content is a handy feature, something that might remain unique due to OpenAI’s lack of direct access to YouTube.

Knowledge Update

In November, OpenAI gave ChatGPT a significant update, extending its training data up to April 2023. They’ve also planned to keep this data fresh with ongoing updates. When I quizzed Gemini Pro about its last update, it played coy, labeling it “confidential.” google gemini training data last updated

Curiosity led me to test their knowledge on a newer topic: creating your own GPTs in ChatGPT. Gemini Pro responded well, but it left me wondering whether it drew from its own training data or relied on data from the Internet.

ask google gemini what is custom gpt

In contrast, GPT-4’s training data, as the below image shows, doesn’t cover how to create GPTs.

ask gpt 4 what is custom gpts 1

The game changer? Networking. Once I enabled GPT-4’s networking feature, it delivered a more detailed and accurate response to the same question, outperforming Gemini Pro in this aspect.

ask gpt 4 what is custom gpts 2


Evaluating a large language model’s human-like thinking starts with logic problems. Take this one, for example:

Can Jack’s move from Class A to Class B lead to an increase in the average IQ of both classes? Explain why or why not.

The solution? Just consider the case where Jack’s IQ is lower than class A’s average but higher than class B’s. The answer, quite simply, is yes.

Now, let’s see how ChatGPT-4 fared:

gpt 4 logic test 1 GPT-4 hit the nail on the head with a perfect answer.

Gemini Pro’s attempts, however, missed the mark. Here’s one of its better tries:

google gemini logic question 2

Moving on to general knowledge, drawn from Yejin Choi’s TED talk, we find another illustrative example. The question is straightforward:

I have a 12-liter jug and a six-liter jug, and I want to measure six liters. How do I do it?

The solution is just as simple: you can make 6 liters in a 6-liter pot.

GPT-4’s response might be a bit wordy, but it gets there:

gpt 4 logic test 2

Gemini Pro, however, takes a longer route:

google gemeni logic test 2

Here’s another general knowledge question:

I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes.

The answer is commonly known – it takes only 5 hours.

Gemini Pro’s response was off, even using Python to do the math.

google gemeni logic test 3

ChatGPT-4, on the other hand, provided the correct answer:

gpt4 logic test 3

This comparison reveals a notable gap in reasoning and common sense between Gemini Pro and GPT-4.

Math Skills

Math skills are a crucial part of an LLM’s reasoning capability. Take this indefinite calculus question, for instance. Originating from the University of California, Berkeley, this question has four correct answers.

Gemini Pro’s attempts varied. Its first draft missed one answer.

google gemini math test 1

Then, Gemini Pro Draft 2 interestingly included an extra answer.

google gemini math test 2

However, its third attempt was entirely off the mark.

In contrast, GPT-4 showcased its prowess by providing the correct answer.

gpt4 math test 1

This exercise clearly demonstrates GPT-4’s superior mathematical and reasoning abilities compared to Gemini Pro.

Networked Search Capabilities

Gemini Pro really impressed me with its skills in complex networked search tasks! I put it to the test with a specific request:

Prompt: Research sustainable packaging solutions for a fictitious consumer goods company. Identify the top 3 sustainable packaging suppliers and provide a report on their offerings, pricing, and environmental impact.

And Gemini Pro nailed it.

google gemini networking capability

What stood out was its ability to summarize information into tables that can be easily downloaded to Google Sheets. Super handy!

google gemini export to google sheet

On the other hand, GPT-4’s native networking abilities fell short in this area. Even with the WebPilot plugin, GPT-4 couldn’t match Gemini Pro’s performance, lacking in table generation and access to certain data like prices.


This aspect of Gemini Pro, powered by Google’s robust search engine, poses a real challenge for GPT-4. It’s a wake-up call for Microsoft and OpenAI to up their game in networked search capabilities.

Vision Capabilities

Gemini Pro, like GPT-4, is a multimodal model with impressive capabilities, including image recognition. I put this to the test using images and questions from the Paper “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).”

When presented with a particular image and asked about its humorous aspect, Gemini Pro’s response was off the mark.

gemini pro image recognition 1

It incorrectly identified a rhino horn as a turtle, missing the actual humor in the image.

In contrast, GPT-4 demonstrated a precise understanding of the image’s content and meaning.

gpt 4 image recognition 1

I also tested their ability to interpret another image, asking them to describe its content, the Chinese characters, and identify the food’s city of origin.

Gemini Pro’s answer was:

gemini image recognition 2

While GPT-4 provided this response:

gpt4 image recognition 2

This comparison suggests that while Gemini Pro may have stronger multilingual capabilities, GPT-4 shows a slightly better grasp of graphic literacy, understanding and interpreting images more accurately.


Gemini Pro marks a significant leap forward from its predecessor, Bard. While Bard was hardly my go-to choice (except for the occasional map reading), Gemini Pro has brought much more to the table. Although it doesn’t quite reach the heights of GPT-4, it certainly outperforms GPT 3.5 in several key areas.

The burning question is: how will Gemini Ultra stack up against GPT-4? My bet is that it’ll still trail slightly behind GPT-4.

For users, the availability of a free tool like Gemini, alongside ChatGPT and Claude 2, is definitely a win. It’s an exciting time in the world of AI, with each new development offering more versatility and utility.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *