Google Gemini: An Overview of the New Google AI

About this guide

On December 6, 2023, Google unveiled its long-awaited new artificial intelligence, Gemini. In doing so, it added a serious competitor about a year after the successful release of ChatGPT and of GPT-4, the “brain” behind ChatGPT. In this article, we take a look at the AI Google presented, explain what is new, and discuss how Google Gemini will change the chatbot landscape in the long term. We also compare the capabilities of Google Gemini with those of the latest OpenAI GPT-4 version.


What is behind Google Gemini?

Google Gemini comprises a family of multimodal large language models that can understand and generate text, images, videos, and programming code. Two terms in this definition deserve a closer look so that you can better understand Google Gemini.

The term Large Language Model (LLM) refers, in the field of artificial intelligence, primarily to neural networks that are able to understand, process, and generate human language in various ways. The word “large” describes the fact that these models are trained on vast amounts of data and have several billion parameters that capture the underlying structures in the text.
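To make the notion of “parameters” concrete: a single fully connected layer mapping 1,000 inputs to 1,000 outputs already contains about a million parameters (weights plus biases); LLMs stack many such layers, with much wider dimensions, to reach the billions. A minimal sketch:

```python
# Parameter count of one fully connected (linear) layer: one weight per
# input-output pair, plus one bias per output.
def linear_layer_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out

print(linear_layer_params(1000, 1000))  # 1001000 -- about a million already
```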

Multimodal models are a branch of machine learning comprising architectures that can process several kinds of data, so-called modalities. Until now, most models could only handle a single type of data, such as text or images. Multimodal models, on the other hand, are able to ingest and process various formats.

Just like GPT-4, Google Gemini is multimodal, meaning it can process various types of input, such as text, images, or programming code, and also produce them as output. In contrast to GPT-4, however, Gemini was built to be multimodal from the ground up and does not use separate models for the different input types. It remains to be seen which architecture will ultimately prevail.

What is new about Google Gemini is not only the ability to process text, audio, video, images, and even programming code, but also to draw its own conclusions from them. Reasoning in fields such as mathematics or physics should no longer be a problem. In Google's examples, for instance, errors are found in a math calculation, and the corrected solution is produced and explained.

What can Google Gemini do?

Google Gemini was unveiled for the first time at a virtual press conference on December 6, 2023. At the same time, both the Google blog and the website of the AI company Google DeepMind published articles describing the functionality of the new AI family.

According to these reports and the accompanying YouTube videos, the following applications, for example, should be possible:

Google Gemini should be able to create programming code simply from an image of the finished application. This allows websites to be recreated, for example, from nothing more than a screenshot of the current page. This was already possible with GPT-4 and Google Bard, but the capabilities have been improved again. Nevertheless, no big leaps should be expected here, as much of the complexity of a website or computer program cannot be conveyed in a screenshot. It can, however, be a good starting point for further programming.

Examples are also shown in which two images are combined to form a new image and a corresponding text is written. In the example from Google, the AI is asked what the user can do with two balls of yarn. As an additional input, an image of the two differently colored balls is shown. The model provides a finished image of a woolen octopus that can be made from the two balls.

By far the most impressive application is interesting not only for pupils, students, and parents, as you might expect at first glance. The video shows how Gemini is used to correct physics homework. It not only determines which tasks were solved correctly and which incorrectly, but can also explain which mistakes were made and how to fix them. Such reasoning is a remarkable achievement for a language model.

Just a few days after the initial presentation, some users discovered important information hidden in the descriptions of the YouTube videos. Google had embellished its presentation videos, for example by working with still images and text inputs in the scene where the model was supposedly recognizing a game of rock-paper-scissors from video. This approach drew criticism, as the presentation in Google's blog suggested significantly more capability than the model was actually able to demonstrate.

A new feature was introduced in early September 2024: Gemini Live. With Gemini Live, Android users can experience real-time conversations with Google's AI. This allows them to have conversations without typing, with Gemini responding verbally. Google recently announced that Gemini Live is now available in over 45 languages. This allows users who speak different languages to communicate seamlessly via a single device.

And even more news: With Google AI Pro (formerly Gemini Advanced), users can now configure the AI to remember conversations, such as hobbies mentioned or specific life circumstances. However, this feature should be used with caution, as personal data is collected and potentially disclosed to third parties.

In addition, Gemini Deep Research is now running on the new 2.5 Pro model. According to Google, this upgrade improves Gemini's ability to analyze relevant information and generate comprehensive, multi-page reports in minutes.

Since the end of April 2025, Google AI Pro (formerly Gemini Advanced) users worldwide have also been able to generate high-quality 8-second videos using the new Veo 2 feature – simply by entering text. The AI creates dynamic videos based on a short description, without requiring any video editing skills. The clips can be downloaded as MP4 files, shared publicly, or sent directly (on mobile devices). For users of the higher-end Google AI Ultra subscription, the updated Veo 3 model has recently become available. According to Google, this model offers even better video quality, more realistic sounds, and expanded creative possibilities.

Which versions of Gemini are there?

Gemini 1.0

At the start, Google Gemini will be available in three different variants, which have been optimized for different devices.

Gemini 1.0 Ultra is the largest and most powerful model and is intended for the most demanding applications. Since it is very computationally intensive, it will only be available on powerful devices, i.e. not on mobile devices such as cell phones or tablets. It is currently still undergoing internal security tests to prevent misuse. In performance, this variant is comparable to GPT-4 and beats the OpenAI competitor in reasoning, programming, and mathematics in most benchmarks. However, OpenAI's successor GPT-4 Turbo is already in the starting blocks, so it will be interesting to see how that model compares to Gemini Ultra.

Gemini 1.0 Pro is the all-rounder of the AI family and is intended for a wide range of applications. However, Google leaves some questions unanswered as to what exactly it will be capable of. It is already being used in Google's chatbot Bard, though it is set to be replaced there by Gemini Ultra in 2024. In terms of performance, this variant is comparable to GPT-3.5, which currently powers ChatGPT.

Gemini Nano, finally, has been optimized for computations that run on the device itself. This allows Gemini to be used on Android devices and lets developers build apps that benefit directly from Google Gemini. The advantage is that no connection to Google's servers is required, so even sensitive data, such as messages, can be processed. Google is presenting a genuine innovation here: the model is completely self-sufficient, needing no connection to a server or the internet, yet is efficient enough to run on mobile devices, which are usually less powerful than computers or notebooks.

Gemini 1.5

Just a short time after Google released the three versions of Gemini 1.0 Ultra, Pro, and Nano, the company announced the updated, more powerful version, Gemini 1.5, in early 2024.

Gemini 1.5 Pro is expected to deliver results comparable to Gemini 1.0 Ultra while requiring less processing power, and it boasts impressive capabilities in understanding particularly long contexts and various types of audio (music, speech, and video soundtracks). Gemini 1.5 Pro is expected to be able to process:

  • one hour of video
  • 11 hours of audio
  • 30,000 lines of code
  • and 700,000 words
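To put these figures in perspective, a rough back-of-the-envelope calculation helps. The tokens-per-word ratio below is an illustrative rule of thumb for English text (not an official Google figure), and the one-million-token context window is the size announced for Gemini 1.5 Pro:

```python
# Rough estimate of how 700,000 words relate to a 1M-token context window.
# TOKENS_PER_WORD is an assumed average for English text, not an official figure.
TOKENS_PER_WORD = 1.3

def words_to_tokens(words: int, ratio: float = TOKENS_PER_WORD) -> int:
    """Rough token estimate for a given word count."""
    return round(words * ratio)

context_window = 1_000_000  # context size announced for Gemini 1.5 Pro
estimate = words_to_tokens(700_000)
print(estimate)                    # 910000 -- roughly fills the window
print(estimate <= context_window)  # True
```

Under this estimate, 700,000 words come out to roughly 910,000 tokens, which is consistent with a context window on the order of one million tokens.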

Compared to Gemini 1.5 Pro, Gemini 1.5 Flash is a lighter version, optimized for speed and efficiency, and more cost-effective to deploy. This version has also been used for the free use of the Gemini AI chatbot since the end of July 2024.

Since the end of August 2024, there has been a new addition to the Gemini 1.5 family. Logan Kilpatrick, product lead of Google AI Studio, announced on X (formerly Twitter) on August 27, 2024, that the company has released three new variants of Gemini: a smaller model, Gemini 1.5 Flash-8B, a "more powerful" model, Gemini 1.5 Pro, and the "significantly improved" Gemini 1.5 Flash – however, these versions are currently only experimental.


However, since 2025, Gemini 1.0 and 1.5 have been considered legacy and are no longer actively used in Gemini products.

Gemini 2.0

Gemini 2.0 was introduced in December 2024 and not only brings exciting innovations but also demonstrates how versatile modern AI can be:

A particular focus is on proactive support: With so-called autonomous agents, Gemini 2.0 plans ahead and acts independently – always under human supervision, of course. For example, when planning a trip, Gemini could independently suggest suitable flights, hotels, or activities that perfectly match the user's profile.

There are four different versions of Gemini 2.0:

  • Gemini 2.0 Flash
  • Gemini 2.0 Flash Lite
  • Gemini 2.0 Flash Thinking (experimental)
  • Gemini 2.0 Pro (experimental)

The Flash version of Gemini 2.0 has been generally available since January 2025. What's special about the new version is that it's twice as fast as its predecessor and supports multimodal output such as images and audio in addition to text. At the same time, Google has integrated Gemini 2.0 Flash into products like Google Search to enable even more precise answers to complex questions. Gemini 2.0 Flash Lite has similar features to the regular Flash version and, according to Google, is the most cost-effective model to date.

Furthermore, Gemini 2.0 is being tested in innovative prototypes, including Project Astra, a versatile assistant with enhanced dialog capabilities, and Project Mariner, a smart browser extension. Gemini 2.0 is also demonstrating the versatile uses of AI in the world of gaming and robotics – from supporting gamers to applications involving spatial reasoning.

In addition, Gemini users can test the experimental models 2.0 Flash Thinking and 2.0 Pro:

  • 2.0 Flash Thinking is a reasoning model optimized for speed and shows the model's thought process to provide more precise answers. It also supports apps like YouTube and Google Maps for complex, multi-step questions.
  • 2.0 Pro is aimed at Google AI Pro (formerly Gemini Advanced) subscribers and helps with complex tasks such as programming and math.

This model series has now been replaced by Gemini 2.5.

Gemini 2.5 Flash

Gemini 2.5 Flash has been generally available (no longer experimental) since mid-2025. The model excels particularly at tasks that require precise logical reasoning and deeper context processing. It builds on the strengths of previous Flash versions, but offers additional improvements in reasoning, data analysis, and document summarization. The model now processes even larger amounts of information with an expanded context window of up to two million tokens – ideal for complex use cases such as legal analysis or scientific evaluations. Currently, users of the Gemini app (web or mobile) can try out the Gemini 2.5 Flash model – even without a paid subscription.

Gemini 2.5 Pro

As part of the ongoing development of Gemini 2.5, the Pro version received a major update for programming tasks. 2.5 Pro has been fully available since June 2025 and is no longer classified as experimental. The model now understands coding queries even more intuitively and delivers stronger results, enabling users, for example, to create compelling web applications with Canvas more quickly. According to Google, this upgrade builds on the consistently positive feedback for the original 2.5 Pro version, particularly regarding its programming capabilities and multimodal reasoning.

Gemini 2.5 Pro is available via both the web app at gemini.google.com and the mobile app for Android and iOS. Previously, access to this model was limited to subscribers of the paid Google AI Pro (formerly Gemini Advanced) plan. However, it is now offered in both the paid and free versions for testing.

Despite all the progress, these models remain error-prone, and users should be cautious when handling the output.

How can Google Gemini be used?

Access to Google Gemini 2.5 Flash and 2.5 Pro is now available to all users via the Gemini app on desktop and mobile devices. For developers and enterprises, Gemini 2.5 Flash is available via the Gemini API in Google AI Studio and Vertex AI. Additionally, Google AI Pro (formerly Gemini Advanced) subscribers can use 2.5 Pro without a usage limit. Non-paying users also have access to 2.5 Pro, albeit with usage caps. Once a non-subscriber reaches their limit, they automatically fall back to the next lower model, usually Gemini 2.5 Flash.
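The described fallback behavior can be sketched in a few lines. This is purely illustrative logic under assumed rules: the model names match Google's, but the free-tier limit and the selection function are hypothetical, not Google's actual implementation.

```python
# Hypothetical sketch of the fallback described above -- the free_limit value
# and this selection logic are illustrative assumptions, not Google's code.
def choose_model(is_subscriber: bool, requests_used: int, free_limit: int = 5) -> str:
    """Return the model a request would be served with."""
    if is_subscriber or requests_used < free_limit:
        return "gemini-2.5-pro"
    return "gemini-2.5-flash"  # non-subscribers fall back once the limit is hit

print(choose_model(is_subscriber=True, requests_used=100))  # gemini-2.5-pro
print(choose_model(is_subscriber=False, requests_used=7))   # gemini-2.5-flash
```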

Gemini 2.5 Flash is now also used in the free version of Google's own chatbot, Google Gemini (formerly Bard). This chatbot is part of the Google search engine and can also be used there. The older models, Gemini 1.5 Flash and 1.5 Pro, have now been replaced by 2.5 models.

Google Gemini can also be used on Google's new Android smartphones, where Google is replacing the pre-installed Google Assistant with Gemini as the new default AI assistant. Gemini Nano is used there in various versions, which can interact multimodally via text, images, or voice. Something suitable for iOS users also launched in November 2024: the Gemini app, which now makes it even easier for all Apple fans to use Gemini.

Gemini is also integrated with other Google apps, such as Google Calendar and Gmail, to further enhance the user experience. Google explains this feature as follows:

'Have Gemini find the lasagna recipe from your Gmail account and ask the AI assistant to add the ingredients to your shopping list in Keep.'

The Gemini assistant is now also integrated within Google Maps. Users can simply ask for inspiring activities or places within the app itself, and thanks to Gemini, they receive personalized recommendations with summarized reviews and specific details about the destination – all in real time and without having to search for them themselves.

And it goes even further: Gemini also replaces the Google Assistant on Google TV, and a new feature allows Gemini to control smart home devices from the lock screen, allowing users to conveniently control things like lights, heating, or cameras without unlocking their phone.

Google Bard, OpenAI GPT-4 or GPT-4o?

When OpenAI launched the ChatGPT application and the underlying GPT-3.5 model in November 2022, the hype was huge, and the expected answer from Google was a long time coming.

It took until March 2023 for Google's chatbot Bard to be released. However, it initially drew attention mainly for incorrect or amusing answers. The race has since become much closer, as Google Bard received a real boost from Gemini.

Especially on X, formerly Twitter, numerous posts showed the sometimes funny, sometimes alarming errors that were very common in the earlier version of Google Bard – for example, Bard's take on the monopoly lawsuit against Google, its problems with simple math problems, and its handling of typos.
In an article by Business Insider, ten questions are posed both to ChatGPT, using the underlying GPT-4 model, and to Google Gemini (formerly Bard), using Gemini Pro. The article notes that Google Gemini responds very cautiously to questions about sexuality and politics, likely to avoid unpleasant missteps like those in the past. Furthermore, the answers from Google Gemini (formerly Bard) sometimes come across as somewhat more reserved and rational, while ChatGPT also uses emojis and emotional responses.

Technically, Gemini is said to currently perform better than GPT-4 in image, video, and audio benchmarks, but GPT-4 is said to be stronger in the area of logical reasoning.

The choice of the 'better' model seems difficult overall and likely depends largely on the specific use case: GPT-4 impresses with high accuracy and detailed answers, while GPT-4o excels with speed and efficiency. For enhanced contextual understanding and fast response times, Gemini 2.0 appears to be a compelling solution.

Besides these leading players, however, other competing chatbot systems and large language models should not be overlooked. Some of these models also offer compelling advantages, for example, by providing more up-to-date information. Therefore, we have prepared a detailed article for you that presents interesting alternatives to ChatGPT and Google Bard (now Google Gemini).

Conclusion

Google Gemini is an interesting innovation from Google, particularly impressive due to its ability to handle a wide variety of formats. The first applications demonstrated are particularly interesting due to their ability to draw targeted conclusions and justify their answers.

Try moinAI now and experience the future of customer communication in a secure, efficient, and user-friendly way. In just four simple steps, you can create a chatbot prototype and get a first impression of the technology – completely free of charge and without obligation.


Happier customers through faster answers.

See for yourself and create your own chatbot. Free of charge and without obligation.