A study of OpenAI’s ChatGPT-5 model found that it gives incorrect answers in approximately 25% of cases, according to an article from Tom’s Guide. Although that error rate remains substantial, the model is markedly more accurate than its predecessor, GPT-4.
Specifically, ChatGPT-5 makes about 45% fewer factual errors than GPT-4 and produces roughly one-sixth as many hallucinated, or entirely made-up, answers (a sixfold reduction). Despite this progress, the study reports that the model remains overconfident: it can present incorrect information with apparent certainty, a behavior commonly called hallucination.
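The two comparisons above are relative figures, so they say how much a rate shrinks, not what the rate is. The short Python sketch below illustrates the arithmetic behind a "45% fewer" claim and a sixfold reduction. The baseline rates and the function names are hypothetical, chosen only to show the calculation; they are not figures from the study or from Tom’s Guide.

```python
def apply_relative_reduction(baseline_rate: float, reduction: float) -> float:
    """Return the rate left after cutting baseline_rate by a relative
    fraction, e.g. reduction=0.45 means "45% fewer"."""
    return baseline_rate * (1.0 - reduction)


def apply_fold_reduction(baseline_rate: float, fold: float) -> float:
    """Return the rate left after an N-fold reduction, e.g. fold=6 means
    "one-sixth as many"."""
    return baseline_rate / fold


if __name__ == "__main__":
    # Hypothetical baselines, NOT taken from the study: they exist only
    # to make the arithmetic concrete.
    hypothetical_error_rate = 0.20
    hypothetical_hallucination_rate = 0.12

    new_error_rate = apply_relative_reduction(hypothetical_error_rate, 0.45)
    new_hallucination_rate = apply_fold_reduction(hypothetical_hallucination_rate, 6)

    print(f"Factual errors: {hypothetical_error_rate:.1%} -> {new_error_rate:.1%}")
    print(f"Hallucinations: {hypothetical_hallucination_rate:.1%} -> {new_hallucination_rate:.1%}")
```

The point of the sketch is that a large relative improvement can still leave a noticeable absolute error rate, which is consistent with the article reporting both a sizeable gain over GPT-4 and a remaining error rate of about 25%.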
The model’s accuracy varies by task. For example, it scored 94.6% on the 2025 AIME mathematics test and had a 74.9% success rate on a set of real-world coding tasks. On the more challenging MMLU-Pro benchmark, an academic test covering science, math, and history, ChatGPT-5 achieved an accuracy of about 87%. However, it still makes mistakes on general knowledge and complex reasoning questions.
The study attributes these errors to several underlying factors: the model’s limited ability to fully understand nuanced questions, its reliance on training data that may be outdated or incomplete, and its fundamental design as a probabilistic pattern predictor. That design can occasionally produce responses that seem plausible but are factually inaccurate.
The article advises users to verify any critical information obtained from ChatGPT-5. Despite the model’s documented improvements in reliability, it is not infallible, so this caution matters most for questions about professional, academic, or health matters.








