AIMultiple Research
We follow ethical norms & our process for objectivity.
This research is not funded by any sponsors.
GenAI
Updated on Apr 9, 2025

OpenAI GPT-n models: Advantages & Shortcomings in 2025

Generative Pre-trained Transformers (GPT) are a type of Large Language Model (LLM), also called a foundation model. The technology was popularized by a series of deep-learning-based language models built by the OpenAI team.

These models are known for producing human-like text in numerous situations. However, they have limitations, such as a lack of logical understanding and hallucinations, which limit their commercial usefulness.

To inform managers about this technology, this article examines GPT-4o's working mechanism, importance, use cases, and challenges.

What is GPT-4o?

As of February 2025, the most recent version of OpenAI’s Generative Pre-trained Transformer (GPT) series is GPT-4o. Released in May 2024, GPT-4o is a multimodal model capable of processing and generating text, images, and audio. It offers faster performance and improved capabilities across various modalities compared to its predecessors.

Looking ahead, OpenAI has announced plans to release GPT-4.5, internally referred to as “Orion,” which will be the final model without a chain-of-thought reasoning process. Following this, the company aims to integrate various technologies into a GPT-5 model, streamlining its AI offerings.

Figure 1. GenAI Marketshare


For further details about GPT-4, you can read our article: GPT4: In-depth Guide.

The latest updates

The latest updates to GPT-4 technology in 2025 have focused on improving multimodal capabilities, efficiency, and user accessibility. GPT-4o, the newest iteration, enhances the original GPT-4 by allowing for real-time integration of text, images, and audio. This makes tasks like translating menus from photos or engaging in voice conversations more seamless. Additionally, the model is faster and more cost-efficient than GPT-4, making it accessible to a broader range of users, including those on the free version of ChatGPT with some usage limitations.

Key updates include:

  1. Improved Multimodal Interaction: Users can upload images, engage in voice conversations, and even expect future video interaction features. The model is particularly adept at reasoning tasks and processing non-text inputs like images.
  2. Memory and File Handling: The model has been updated to better handle uploaded files, allowing it to retain key information for future interactions, making conversations more contextually relevant.

The latest OpenAI model: o1

As of November 2024, OpenAI’s most recent model is o1, introduced in September 2024. The o1 model is designed to enhance reasoning capabilities by allocating more time to process complex tasks, resulting in higher accuracy, particularly in science, coding, and mathematics. This approach allows o1 to tackle problems that were challenging for previous models, making it a significant advancement in AI development.

The training of GPT-4

The training process of the model involved two primary stages: pre-training and fine-tuning.

Pre-Training: In the pre-training phase, GPT-4 was exposed to a vast corpus of text data sourced from the internet, including public domain books, research articles, and web pages. This extensive dataset enabled the model to learn patterns, grammar, and context, allowing it to predict subsequent words in a sentence effectively. The pre-training process equipped GPT-4 with a broad understanding of human language and various subjects. 

Fine-Tuning: Following pre-training, the model underwent fine-tuning to enhance its performance and align its outputs with human expectations. This stage involved supervised learning, where human trainers provided example inputs and desired outputs, and reinforcement learning from human feedback (RLHF). In RLHF, human evaluators ranked multiple responses generated by the model, and these rankings were used to adjust the model’s behavior, promoting more accurate and contextually appropriate outputs. 
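The ranking step of RLHF is commonly modeled with a pairwise (Bradley-Terry) preference loss on reward-model scores: the reward model is pushed to score the human-preferred response higher than the rejected one. A toy sketch of that loss (the function name and values are illustrative of the general technique, not OpenAI's disclosed implementation):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).
    Lower when the reward model scores the human-preferred response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The reward model is trained to minimize this loss over ranked response
# pairs; the language model is then optimized against the learned reward
# (e.g. with a policy-gradient method such as PPO).
```

Note how the loss behaves: when the chosen response is scored higher, the loss is small; when the ranking is inverted, it grows, nudging the reward model toward the human ordering.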

Training Infrastructure: Training a model of this scale required substantial computational resources. Reports indicate that the process utilized approximately 25,000 Nvidia A100 GPUs over 90 to 100 days, processing around 13 trillion tokens. This extensive infrastructure facilitated the handling of the model’s complexity and the vast amount of data involved.

Model Parameters: While OpenAI has not publicly disclosed the exact number of parameters in GPT-4, estimates suggest it contains around 1 trillion parameters. This significant increase in parameters compared to previous models contributes to GPT-4’s enhanced capabilities in understanding and generating human-like text. 

The Distinctive Features

Visual input option

Although it cannot generate images as outputs, the model can understand and analyze image inputs. 

Higher word limit

This model can process more than 25,000 words of text, up from roughly 3,000 words in earlier models.
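Even with the larger limit, long documents often need to be split before being sent to the model. A minimal sketch of word-based chunking (the word count is only a rough proxy for the model's actual token-based limit, and `chunk_words` is a hypothetical helper, not part of any OpenAI SDK):

```python
def chunk_words(text: str, max_words: int = 25_000) -> list[str]:
    """Split text into chunks that each stay within a word budget,
    so every chunk fits inside the model's input limit."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Example: a 100-word document split into chunks of at most 30 words
# yields 4 chunks (30 + 30 + 30 + 10).
chunks = chunk_words("word " * 100, max_words=30)
```

In practice, production pipelines usually count tokens rather than words and split on sentence or paragraph boundaries to keep context intact.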

Advanced reasoning capability

The model outperforms earlier versions in natural language understanding (NLU) and problem-solving. The difference may not be apparent in a casual trial, but test and benchmark results show it is superior on more complex tasks.

Advanced creativity

As a result of its higher language capabilities, GPT-4 is advanced in creativity compared to earlier models.

Adjustment for inappropriate requests

Earlier models were criticized for complying with inappropriate requests, such as explaining how to make bombs at home. OpenAI worked on this problem and made adjustments to prevent its language models from producing such content. According to OpenAI, GPT-4 is 82% less likely to respond to requests for disallowed and sensitive content.


Increase in fact-based responses

Another limitation of earlier models was that their responses were factually incorrect in a substantial number of cases. OpenAI announced that GPT-4 is 40% more likely to produce factual responses than GPT-3.5.


Steerability

“Steerability” is a concept in AI that refers to a model’s ability to modify its behavior on demand. GPT-4 incorporates steerability more seamlessly than GPT-3.5, allowing users to modify the default ChatGPT personality (including its verbosity, tone, and style) to better align with their specific requirements.
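In practice, steerability is exercised through the system message of a chat request. A minimal sketch, assuming the OpenAI Chat Completions message format (the `build_request` helper and the tone wording are illustrative, not an official SDK function):

```python
def build_request(user_prompt: str, tone: str = "concise and formal") -> dict:
    """Build a chat request whose system message steers the model's
    verbosity, tone, and style before the user prompt is answered."""
    return {
        "model": "gpt-4",
        "messages": [
            {
                "role": "system",
                "content": f"You are an assistant. Respond in a {tone} style.",
            },
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_request("Summarize our Q3 performance.")
```

Changing the `tone` argument (e.g. to "playful and verbose") alters the model's default personality for the whole conversation without touching the user prompt itself.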

For more detail on GPT-4, you can check our in-depth GPT-4 article.

FAQ

How does GPT-4o work?

It is an advanced multimodal language model, and it works by processing and generating responses using a combination of text, images, and audio. Here’s an overview of how it operates:
1. Multimodal Processing
This model can take in multiple types of input (text, images, and audio) and generate coherent responses that integrate all the provided data. For example, if you upload an image, GPT-4o can analyze it, provide descriptions, and answer questions related to the visual content. Its multimodal capabilities are integrated into a single neural network, allowing for faster and more seamless interactions across different data types.
2. Self-Attention Mechanism
Similar to its predecessors, it is built on the Transformer architecture, which uses a self-attention mechanism. This enables the model to weigh the relevance of different pieces of input data (whether they are words, image elements, or audio signals) and produce the most contextually appropriate output. This is particularly useful for tasks requiring understanding of the relationships between various elements in a complex query.
3. Memory Integration
It has an updated memory system that allows it to remember key parts of conversations or documents across interactions. This improves context retention, making future responses more personalized and contextually relevant.
4. Real-Time Performance
One of the major improvements in GPT-4o is its ability to operate in real-time. Whether it’s handling voice conversations or analyzing images, the model provides faster responses with lower computational costs. Its design also includes efficient multimodal integration, allowing for smoother transitions between tasks such as switching from text-based to image-based queries.
5. Enhanced Reasoning Capabilities
GPT-4o excels in tasks that require reasoning, such as inductive and deductive logic. This makes it capable of handling more complex queries where multi-step thinking is involved. The model also performs well in interpreting and generating responses based on ambiguous or incomplete input data.
6. Language Processing Improvements
The model includes better support for non-English languages, particularly those with non-Latin scripts, due to a more advanced tokenizer. This allows GPT-4o to handle multilingual tasks with greater accuracy and fluency compared to previous models.
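The self-attention step described above can be sketched in a few lines of NumPy. This is a single attention head with the learned query/key/value projections omitted for brevity (an illustration of the general mechanism, not GPT-4o's actual implementation):

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Minimal single-head scaled dot-product self-attention.
    x has shape (seq_len, d_model); here Q = K = V = x for simplicity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # pairwise token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ x                               # weighted mix of values

x = np.random.rand(4, 8)   # 4 tokens, each an 8-dimensional embedding
out = self_attention(x)    # same shape: each token is now a context mix
```

Because the softmax weights in each row sum to 1, every output token is a convex combination of the input tokens, weighted by how relevant the model judges each other token to be.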


Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
