This article introduces some of the best-known and most powerful LLMs and explains the differences between them. While GPT-3 and BERT were still the focus in 2020–2022, today the landscape is dominated by the latest updates to models such as GPT-5 and GPT-4o, Gemini 2.5, Claude 3.5, Llama 3 and 4, and the Mistral family. At the same time, the importance of open-source models and multimodal systems is growing. The following is an overview of the current status in 2025, as well as trends and areas of application for the most important LLMs today.
What is an LLM and how does it work?
A large language model is a subcategory of machine learning models trained to understand, process, and even generate human language. In most cases, these architectures have billions of learnable parameters, which is what makes the model “large,” i.e., comprehensive, and enables it to learn complex structures in the data. In addition, huge amounts of text are used to train the model to understand language with all its peculiarities, such as grammar and synonyms.
A model reads text in the form of “tokens,” i.e., the smallest units into which a text is broken down before processing. The context length (e.g., “128k tokens” in ChatGPT) indicates how many such units a model can keep “in memory” at the same time. API usage costs are often billed per token.
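To make this concrete, here is a minimal sketch using OpenAI's open-source tiktoken library; the o200k_base encoding (used by GPT-4o) serves as the example here, and token counts differ between model families:

```python
import tiktoken

# Load a published encoding; o200k_base is the tokenizer used by GPT-4o.
encoding = tiktoken.get_encoding("o200k_base")

text = "Large language models read text as tokens, not words."
tokens = encoding.encode(text)

print(len(tokens), "tokens:", tokens)   # the token count drives context limits and API cost
print(encoding.decode(tokens))          # decoding round-trips to the original text
```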
Nowadays, it is standard for models to be “multimodal,” meaning they are capable of processing audio, video, and other file formats in addition to plain text. As a result, the term “large language model” is often supplemented by terms such as “foundation model” or “transformer model,” as these systems no longer process language exclusively and are based on a broad knowledge base.
Key LLM Innovations in 2025
LLMs are evolving into increasingly powerful, versatile systems and form the central building block of modern AI. The sections below provide an overview of the most important developments in large language models in 2025.
What are the most important LLMs?
Since OpenAI released ChatGPT in November 2022, there have been many developments in the field of large language models, and other well-known tech companies have released their own models. In this section, we will look at the most important LLMs and their characteristics.
1. OpenAI GPT
(Current model: GPT-5, August 2025)
GPT-5, released by OpenAI in August 2025, is the latest model in the Generative Pretrained Transformer (GPT for short) series. It is a “unified system” that automatically decides, depending on the task, whether to respond quickly or to apply deep thinking, known as “reasoning.” Plus and Pro users get access to variants with extended reasoning capacity, and after registration, ChatGPT allows limited use of the latest model in the free version. GPT-5 builds on the multimodality of GPT-4o with an advanced architecture.
GPT-4o, with the “o” at the end standing for “omni,” combines audio, image, and text processing while performing significantly better and more efficiently than previous versions. The interesting thing about this architecture is that it is reportedly not a single, large model but a multitude of “smaller” models that work together in a targeted manner, an approach known as “Mixture of Experts” (MoE). Although OpenAI keeps the exact architecture under wraps, it is assumed that there are a total of 16 such expert models trained for different sub-areas; for each prediction, two of these are activated and provide the output. (Hackernoon, 2023)
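The following toy sketch shows the core idea of such top-2 routing in PyTorch. It is illustrative only (sizes and layer types are made up for the example) and not OpenAI's actual, unpublished implementation:

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: route each input to the top 2 of 16 experts."""
    def __init__(self, dim=32, num_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)  # produces one score per expert
        self.top_k = top_k

    def forward(self, x):
        scores = self.router(x)                         # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the best 2 experts
        weights = weights.softmax(dim=-1)               # normalize their mixing weights
        out = torch.zeros_like(x)
        for b in range(x.size(0)):                      # weighted sum of chosen experts
            for k in range(self.top_k):
                expert = self.experts[int(idx[b, k])]
                out[b] += weights[b, k] * expert(x[b])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```

Because only two of the 16 experts run per input, the compute per prediction stays far below that of a monolithic model with the same total parameter count.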
In July 2024, a smaller version of GPT-4o called GPT-4o mini was also introduced, with a smaller architecture and fewer parameters. This offers advantages for many use cases that do not require the highest output quality: significantly less computing capacity is needed, which reduces costs, and the models can also run on weaker devices such as smartphones or tablets. These models are particularly suitable for real-time applications where response time matters more than peak performance. Despite its smaller architecture, GPT-4o mini still manages to outperform larger models in individual benchmarks. For example, it performs better in programming and mathematics benchmarks than the Llama 3 model with eight billion parameters or Mistral Large.
We have compiled more detailed information on the various GPT models from OpenAI in our comprehensive article on GPT-4 here.
2. Mistral/Mixtral
(Last update: September 2025)
Mistral AI is a French startup specializing in the development of powerful large language models. It was founded by former Google and Meta employees, among others, and has well-known investors such as Microsoft. One difference from many other providers is that some of Mistral's models are open source, meaning they can be used and customized free of charge. Mistral's goal is to make the development of artificial intelligence more transparent and comprehensible. Particularly noteworthy: all data remains in Europe and is subject to the EU AI Act, so conversations with Le Chat, for example, are not transferred to US servers, as can happen with other providers' models. This offers both increased data security and legal reliability for companies and users.
These freely accessible models include:
- Mistral 7B: This model has around seven billion parameters (7B) and is the smallest model in the Mistral family. Although it has fewer parameters than comparable LLMs, it can still compete with larger models. It impresses with fast predictions and low computational requirements but is limited in its applications and is particularly suitable for English language processing or programming tasks.
- Mixtral 8x7B: This model is based on the mixture-of-experts approach, in which eight individual expert models work together. It is efficient in its use of resources and versatile in its applications. Mixtral 8x7B is fluent in several languages, including English, French, Spanish, Italian, and German, and in some benchmarks it even outperforms GPT-3.5.
- Mixtral 8x22B: This is the most advanced open-source variant of Mistral. It consists of eight expert models, each with 22 billion parameters. This size allows it to handle significantly more complex tasks, such as summarizing long texts or generating large amounts of text. The model can process up to 64,000 tokens at a time, which is equivalent to approximately 48,000 words.
Designations such as 7B or 22B indicate the number of model parameters in billions (e.g., “7B” = 7 billion parameters). These classic Mistral models are purely text-based. However, Mistral has also introduced multimodality: models such as Pixtral 12B process images in addition to text, thereby significantly expanding their range of applications.
In addition to open-source models, Mistral also offers commercial models:
- Mistral Large: One of Mistral's most powerful models, ranking just behind GPT-4 in benchmarks. It can be used for text generation in various languages and for programming tasks.
- Mistral Small: Optimized for fast, resource-efficient predictions, such as in customer support for classifications or short text responses. For more complex tasks such as data extraction or extensive text summaries, the use of larger models is recommended.
- Mistral Embed: This model generates numerical embeddings from English text that can be used for machine processing and semantic analysis, as the sketch below shows.
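Here is a minimal sketch of semantic similarity with Mistral Embed, assuming the official mistralai Python client (v1.x) and an API key in the MISTRAL_API_KEY environment variable; model name and response fields follow Mistral's published API:

```python
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Request embeddings for two sentences in a single call.
resp = client.embeddings.create(
    model="mistral-embed",
    inputs=["How do I reset my password?", "I forgot my login credentials."],
)
a, b = (item.embedding for item in resp.data)

# Cosine similarity: values close to 1.0 indicate semantically similar sentences.
dot = sum(x * y for x, y in zip(a, b))
norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
print(f"cosine similarity: {dot / norm:.3f}")
```

Similarity scores like this are the building block for semantic search and for the retrieval step in chatbots.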
Mistral AI also offers Le Chat, an AI chatbot that, similar to ChatGPT, can be used for entertainment, text generation, and interactive applications.
New in 2025 are the integration of Super-RAGs for improved access to external knowledge sources and multimodal variants such as Pixtral, which can process text and images simultaneously.
3. Llama Model Family
(Last update: September 2025)
In February 2023, Facebook's parent company, Meta, also entered the world of large language models and introduced its LLM Meta AI, or Llama for short. The release built on Meta's earlier advances in natural language processing (NLP), which began in 2019 with the LASER tool, which could map sentences and their content in different languages into a shared vector space.
Since the introduction of the large language model, the focus has been on presenting the best possible foundation model that can be adapted for various natural language applications. To promote research in this area, Meta decided to make the programming code for the model family publicly available. Meta plans to continue improving the Llama models until the end of 2025, and future versions are expected to offer expanded language support and optimized performance metrics.
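Because the code and weights are openly available (under Meta's license), the models can also be run locally. Below is a minimal sketch using the Hugging Face transformers library; the model identifier is an assumption for illustration, gated repositories require prior access approval, and an 8B model needs a correspondingly capable GPU:

```python
from transformers import pipeline

# Assumed repo id for illustration; access must be granted on Hugging Face first.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
)

prompt = "Explain in one sentence what a large language model is."
result = generator(prompt, max_new_tokens=60)
print(result[0]["generated_text"])
```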
Since its initial release in 2023, several model families have been introduced. The current model version is the Llama-4 family (April 2025), which is based on a mixture-of-experts architecture. The models are multimodal (support for text and image data) and multilingual (support for 12 languages). The variants include:
- Scout: The model with 17 billion active parameters and 16 experts has a context window of 10 million tokens.
- Maverick: The model has 17 billion active parameters and 128 experts, as well as a context window of 1 million tokens.
- Behemoth (in development): The largest model with 288 billion active parameters and a total of approximately 2 trillion parameters, which outperforms models such as GPT-4.5 and Claude Sonnet 3.7 in certain benchmarks.
The Llama 4 models are integrated into Meta's AI Assistant and accessible via platforms such as WhatsApp, Messenger, Instagram, and the web. Previous versions of the Llama family are as follows:
- Llama 3: After just under another year, in April 2024, Meta released the third version of Llama in variants with eight and 70 billion parameters. Compared to Llama 2, several improvements were made, including a new tokenizer that converts natural language into tokens much more efficiently and has a larger vocabulary of 128,000 tokens. According to Meta, the 70-billion-parameter model outperforms other models such as GPT-3.5 and Mistral Medium.
- Llama 2: The Llama 2 variant (July 2023) contained three different models with seven, 13, and 70 billion parameters, trained on a significantly larger dataset of two trillion tokens. As a result, Llama 2 with 70 billion parameters performed significantly better in many benchmarks than its predecessor.
- Llama (1): The original variant of the model with 65 billion parameters. It was offered in different sizes, designed so that even smaller infrastructures with less computing power could train the model.
4. Google Models
(Last update: September 2025)
In 2025, Google is one of the leading providers of multimodal large language models with its Gemini series. Google addresses both complex scientific and technical tasks and interactive applications in education and research, as the models offer large context windows (up to 2 million tokens), multimodality, and an efficient mixture-of-experts architecture. Google DeepMind developed the Gemini series as the successor to the LaMDA and PaLM models.
The current version, Gemini 2.5 Flash-Lite (June 2025), is the cost-effective, faster member of the Gemini 2.5 series, aimed at users with limited resources who need fast, inexpensive responses. Gemini 2.5 itself was released in May 2025, further improved computing power and multimodality, and offers advanced capabilities for complex tasks based primarily on Deep Think. In addition, Gemini Robotics-ER 1.5 was released in March 2025; it was developed specifically for robot control and enables seamless interaction via voice and visual signals. The earlier Gemini 1.5 already featured a large context window of up to two million tokens and a mixture-of-experts architecture that allows efficient use of computing resources.
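For illustration, here is a minimal sketch of calling a Gemini model via Google's google-genai Python SDK; the model name and API key handling are assumptions based on Google's published quickstart:

```python
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])  # key from Google AI Studio

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # assumed model id; see Google's current model list
    contents="Summarize the transformer architecture in two sentences.",
)
print(response.text)
```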
In our article on Google Gemini 3, we took another detailed look at all the models. However, Google's research department had already delivered its first large language models in 2018; these were based on the Transformer approach from 2017 and marked remarkable progress.
Google already set important milestones in the field of natural language processing with BERT (2018) and T5 (2019). BERT enabled a better understanding of the context between words through bidirectional processing and opened up new fields of application such as question-answering systems and sentiment analysis. T5 introduced the text-to-text principle, in which different tasks such as translation or summarization are controlled through the same text-based input format, as the following sketch illustrates.
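A small illustration of the text-to-text principle, using the publicly available t5-small checkpoint via Hugging Face transformers; the task is selected purely by a text prefix:

```python
from transformers import pipeline

t5 = pipeline("text2text-generation", model="t5-small")

# Task 1: translation, triggered by a prefix in the input text.
print(t5("translate English to German: The weather is nice today.")[0]["generated_text"])

# Task 2: summarization with the same model; only the prefix changes.
text = ("summarize: Large language models are trained on huge text corpora "
        "and learn grammar, facts, and style directly from the data.")
print(t5(text)[0]["generated_text"])
```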
5. Other Large Language Models
The models and updates mentioned are just a small selection from the constantly growing LLM market. This market is very dynamic, and new providers with powerful models appear regularly. Here is a look at a few selected models:
Grok AI, the language model from xAI available on X, formerly Twitter, has made a name for itself not only through its performance but also because a significant portion of its training data apparently comes from X content. Earlier versions of the model struggled with hallucinations, such as publishing a false story about basketball player Klay Thompson. Current versions have significantly improved accuracy and factual reliability. In terms of performance, however, Grok AI lags behind other current LLMs in various benchmarks and also offers a smaller feature set than its competitors.
Claude 3.5 by Anthropic, a company founded in 2021 by several former developers from OpenAI, the company behind ChatGPT, is the latest model in the Claude series and remains a major competitor. It performs on par with, or in some cases better than, GPT-4 and GPT-4o in benchmarks. At the same time, Claude 4 is already being tested in beta versions and features larger context windows, multimodality, and improved agent capabilities. The model variants Haiku, Sonnet, and Opus offer different sizes and performance classes.
Several new startups and research institutes have also released notable models, including Mistral Pixtral, Mixtral 8x22B, Cohere Command R, and regional developments such as AI4Bharat and SEA-LION, which are specifically tailored to linguistic diversity or special tasks.
Key trends for 2025 include end-to-end multimodality (text, image, audio, video), ultra-long context windows, the efficient use of specialized hardware such as Groq LPUs and NVIDIA H100/H200 GPUs, and the growing importance of open-source models.
What developments are there outside Europe and the USA?
In the public perception, much of the AI development in the field of large language models is concentrated in Europe and the US, as large, established companies and extensive data sets are available there. As a result, Western models often have only limited responsiveness to languages and cultures in other regions. Training data from Llama-2, for example, contains only about 0.5% content from Southeast Asian countries, even though over 1,200 dialects and languages are spoken in this region (Carnegie, 2025).
The SEA-LION model was therefore introduced in 2024 as the first large language model specifically trained for the ASEAN region. Although it is only a fraction of the size of GPT-4, for example, it can be more helpful in specific applications, such as customer support, as it can respond more specifically to the cultural differences of individual countries.
China and other Asian markets are also developing their own models. DeepSeek, a Chinese AI startup based in Hangzhou, has taken a significant step forward in AI development with its DeepSeek-Coder-V2 model. This open-source model also uses the mixture-of-experts (MoE) approach and outperforms GPT-4 Turbo in specialized tasks such as programming and mathematical logic. It supports 338 programming languages and offers an extended context length of 128,000 tokens, enabling deeper and more coherent processing of complex code bases. Baidu, a leading Chinese technology company, has also further expanded its ERNIE models, with the current version outperforming GPT-3.5 in general tasks and GPT-4 in Chinese language tasks, according to the manufacturer. Benchmark analyses such as OpenCompass confirm that these models achieve high performance, especially in regional contexts.
2025 is also seeing a trend toward regionally specialized LLMs worldwide, a development that makes large language models more accessible and usable for regions that have been underrepresented in the global AI market to date.
Concluding Remarks
Despite the impressive capabilities of large LLMs such as ChatGPT, practical experience shows that they are not always ideal for direct customer communication. Discover how moinAI can help your company with the smart automation of customer inquiries and why it is the ideal partner for professional customer communication.