A chapter closes with OpenAI's release of GPT-4o mini. Some AI models become obsolete only shortly after launch: this new model replaces the widely used, free GPT-3.5 in ChatGPT. GPT-4o mini is more efficient and less power-hungry than the old model, and is said to be considerably smarter than its predecessor. In the transition from the familiar model to GPT-4o mini, users can expect new features, and GPT-4o mini remains completely free.
To fully understand the importance of this change, we need to look at the journey of GPT-3.5. Launched in 2022, ChatGPT quickly became a household name, captivating users with its ability to generate human-like text, answer questions, and assist with various tasks. It became widely used by both individuals and businesses, serving as a powerful tool for content creation, problem-solving, and general information retrieval. More recently, however, other free AI models have overtaken GPT-3.5.
What can the GPT-4o mini do?
The GPT-4o mini comes with a number of enhancements that set it apart from its predecessor. One of the most notable improvements is the ability to handle both text and image input. This dual capability allows the model to process and understand information from multiple sources, potentially resulting in more comprehensive and accurate output.
The model’s capabilities have been tested on various benchmarks. On the MMLU (Massive Multitask Language Understanding) benchmark, which measures reasoning across a wide range of topics, GPT-4o mini achieved a score of 82%. By this measure, OpenAI's new model outperforms the other small AI models on the market. We will run our own benchmarks as well; and of course, the final judgment is up to you.
Another area where the GPT-4o mini is said to shine is mathematical reasoning. The model scored an impressive 87% on the MGSM (Multilingual Grade School Math) benchmark. In practice, this means our new friend can work through multi-step math problems, handle logical reasoning, or help you with complex homework.
In terms of practical applications, GPT-4o mini supports both text and image input in its API. This means you can integrate the model into your own applications rather than just using it on the website. OpenAI is not stopping there, either, announcing plans to extend the model’s capabilities to include video and audio processing in the future, further expanding its potential use cases.
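To make the text-plus-image capability concrete, here is a minimal sketch of what a multimodal request looks like. The message schema follows OpenAI's Chat Completions API conventions and the model name `gpt-4o-mini` is taken from the announcement; actually sending the request requires an API key, so this example only assembles the payload.

```python
# Sketch of a multimodal (text + image) request payload for the
# Chat Completions API. Building the payload needs no network access;
# sending it would require a valid OpenAI API key.
import json

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Assemble a chat completion request mixing text and image input."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "What is in this picture?", "https://example.com/photo.jpg"
)
print(json.dumps(payload, indent=2))
```

The same payload shape should carry over when video and audio input arrive, with additional content types alongside `text` and `image_url`.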
The GPT-4o mini specs
From a technical point of view, the GPT-4o mini aims to strike a balance between performance and efficiency. OpenAI has not disclosed the exact size of the model, but says it is on par with other small AI models such as Llama 3 8B, Claude 3 Haiku, and Gemini 1.5 Flash.
One of the most important advantages of the GPT-4o mini is its speed. As you know, responses from the GPT-3.5 model could sometimes take up to 20 seconds; now they should take no more than about 10. According to initial tests, the model has a median throughput of 202 tokens per second. This is more than twice as fast as GPT-4o and GPT-3.5 Turbo, making it particularly suitable for applications where fast response times are crucial.
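The throughput figure translates directly into wait time. A rough estimate, taking the reported 202 tokens per second as given (it comes from early third-party measurements, not an official specification):

```python
# Back-of-the-envelope latency estimate from the reported median
# throughput of ~202 output tokens per second (an early measured
# figure, not an official OpenAI specification).

def generation_seconds(output_tokens: int, tokens_per_second: float = 202.0) -> float:
    """Estimate how long the model takes to stream a response."""
    return output_tokens / tokens_per_second

# A 500-token answer (roughly 375 English words):
print(f"{generation_seconds(500):.1f} s")  # ≈ 2.5 s
```

So even a fairly long answer streams out in a few seconds, well under the 10-second ceiling mentioned above.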
The model’s context window is another important technical feature. The GPT-4o mini can process up to 128,000 tokens at a time, roughly the length of an average book. This large context window allows the model to maintain consistency and relevance in long interactions or when dealing with long documents.
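You can sanity-check whether a text fits in that window with the common heuristic of about 4 characters (or about 0.75 words) per token for English. This is only a rough rule of thumb; for accurate counts you would use a real tokenizer such as OpenAI's tiktoken library.

```python
# Crude estimate of whether a text fits in one 128k-token context
# window, using the ~4 characters per token heuristic for English.
# For exact counts, use a tokenizer such as tiktoken.

CONTEXT_WINDOW = 128_000  # tokens

def fits_in_context(text: str, chars_per_token: float = 4.0) -> bool:
    """Rough check of whether `text` fits in a single context window."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= CONTEXT_WINDOW

# 128k tokens * ~0.75 words per token ≈ 96,000 words - about one novel.
print(CONTEXT_WINDOW * 0.75)
```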
The GPT-4o mini's training data runs through October 2023. This cutoff can move forward with future updates, but for now you may not get reliable answers about events or developments after that date.
Too long? Here is the short version:
- Supports text and image processing
- MMLU benchmark score: 82%
- MGSM benchmark score: 87%
- API supports text and vision input
- Size comparable to other small AI models (Llama 3 8b, Claude Haiku, Gemini 1.5 Flash)
- Median output speed: 202 tokens per second
- Context window: 128,000 tokens
- Knowledge cutoff: October 2023
- Response time up to 10 seconds
How about the GPT-4o mini price?
For developers using its APIs, OpenAI has priced the model at 15 cents per million input tokens and 60 cents per million output tokens. This pricing structure makes GPT-4o mini more affordable than its predecessors, and OpenAI claims it is more than 60% cheaper than GPT-3.5 Turbo.
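At those rates, per-request costs are tiny. A quick calculation using the announced prices of $0.15 per million input tokens and $0.60 per million output tokens:

```python
# Cost of a single API call at the announced GPT-4o mini rates.

INPUT_PRICE = 0.15 / 1_000_000   # USD per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A 2,000-token prompt with a 500-token answer:
print(f"${request_cost(2_000, 500):.6f}")  # $0.000600
```

In other words, a typical prompt-and-answer exchange costs a small fraction of a cent, which is what makes high-volume use cases viable.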
The reduced cost of running GPT-4o mini could have far-reaching implications for AI accessibility. By making advanced AI capabilities available at a lower price point, OpenAI is potentially opening the door for wider adoption of AI technology across a variety of industries and regions.
The model’s efficiency and affordability make it particularly attractive for high-volume, simple tasks that require repeated AI model calls. This could be particularly beneficial for small and medium-sized enterprises or developers working on projects with limited budgets.
GPT-4o mini vs other leading models
To better understand the new model, let’s make a technical comparison with other models:
| Model | MMLU | GPQA | DROP | MGSM | MATH | HumanEval | MMMU | MathVista |
|---|---|---|---|---|---|---|---|---|
| GPT-4o mini | 82.0 | 40.2 | 79.7 | 87.0 | 70.2 | 87.2 | 59.4 | 56.7 |
| Gemini Flash | 77.9 | 38.6 | 78.4 | 75.5 | 40.9 | 71.5 | 56.1 | 58.4 |
| Claude Haiku | 73.8 | 35.7 | 78.4 | 71.7 | 40.9 | 75.9 | 50.2 | 46.4 |
| GPT-3.5 Turbo | 69.8 | 30.8 | 70.2 | 56.3 | 43.1 | 68.0 | – | – |
| GPT-4o | 88.7 | 53.6 | 83.4 | 90.5 | 76.6 | 90.2 | 69.1 | 63.8 |
| GPT-4 | 90.0 | 55.0 | 85.0 | 92.0 | 78.0 | 92.5 | 70.5 | 60.0 |
| GPT-4 Turbo | 91.0 | 56.0 | 86.0 | 93.0 | 79.0 | 93.5 | 71.0 | 61.0 |
| Gemini | 85.0 | 50.0 | 80.0 | 88.0 | 72.0 | 88.5 | 65.0 | 55.0 |
| Gemini Advanced | 87.0 | 52.0 | 82.0 | 90.0 | 74.0 | 90.0 | 67.0 | 57.0 |

All scores are percentages; "–" marks results that were not reported.
- MMLU (Massive multitask language understanding): A comprehensive benchmark for evaluating language models on a wide range of tasks across different domains.
- GPQA (Graduate-level Google-proof Q&A): A benchmark of difficult, expert-written science questions designed to be hard to answer even with search-engine access.
- DROP (Discrete reasoning over paragraphs): A reading comprehension benchmark that requires models to perform discrete operations like addition and subtraction over text.
- MGSM (Multilingual Grade School Math): A benchmark of grade-school math word problems translated into multiple languages, assessing multi-step mathematical reasoning.
- MATH: A benchmark specifically focused on evaluating the mathematical problem-solving abilities of language models.
- HumanEval: A benchmark for assessing code generation, where models are evaluated based on their ability to generate correct and functional code from problem statements.
- MMMU (Massive Multi-discipline Multimodal Understanding): A benchmark that tests a model’s ability to reason over combined text and image input across many academic disciplines.
- MathVista: A benchmark designed to evaluate mathematical reasoning in visual contexts, combining charts, figures, and diagrams with math problems.
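One convenient way to read a comparison like this is to ask which model leads each benchmark. A small sketch using a subset of the small-model scores from the table above (the benchmark selection here is just illustrative):

```python
# Top scorer per benchmark among the small models, using scores (%)
# transcribed from the comparison table above.

scores = {
    "GPT-4o mini":   {"MMLU": 82.0, "MGSM": 87.0, "HumanEval": 87.2},
    "Gemini Flash":  {"MMLU": 77.9, "MGSM": 75.5, "HumanEval": 71.5},
    "Claude Haiku":  {"MMLU": 73.8, "MGSM": 71.7, "HumanEval": 75.9},
    "GPT-3.5 Turbo": {"MMLU": 69.8, "MGSM": 56.3, "HumanEval": 68.0},
}

for benchmark in ["MMLU", "MGSM", "HumanEval"]:
    best = max(scores, key=lambda m: scores[m][benchmark])
    print(f"{benchmark}: {best} ({scores[best][benchmark]}%)")
```

On this subset, GPT-4o mini leads its weight class across the board, which is the core of OpenAI's pitch for the model.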
Ultimately, the “best” model depends on your requirements. The GPT-4o mini offers an impressive mix of capabilities in a more compact package, making it an attractive choice for many users.
Featured image credit: OpenAI