OpenAI says free ChatGPT health errors fell 71 percent

OpenAI has launched GPT-5.5 Instant as the default model for free ChatGPT users, claiming that, based on internal evaluations, it now matches the performance of its frontier Thinking models on health queries. AI-generated health information has come under heightened scrutiny, particularly after a Guardian investigation cited inaccuracies in Google AI Overviews, prompting Google to withdraw those features for certain health-related questions.

OpenAI stated that the update improves the accuracy of health information. The change lets a large audience get medical answers from ChatGPT without being sent to external sources, which could affect publishers and SEOs in the health sector.

The company highlighted gains on its HealthBench and HealthBench Professional benchmarks, noting that GPT-5.5 Instant outperforms its predecessor, GPT-5.3 Instant. OpenAI also reported a 71% decrease over two months in health responses flagged for potential factuality issues, citing its live traffic monitoring systems.

A separate assessment compared responses generated by GPT-5.5 Instant with those written by physicians across 3,500 representative health conversations. In that evaluation, a panel of doctors rated the model's responses higher than those written by human physicians on accuracy, communication, and completeness.

OpenAI asserted that GPT-5.5 Instant shows fewer failure modes than both older versions and physician-written responses, pointing to fewer missed red flags and fewer failures to ask users for additional context. HealthBench was developed with input from the company’s physician network and uses doctor-written rubrics for assessment.

OpenAI works with over 260 physicians spanning 60 countries, who have collectively reviewed more than 700,000 example responses. Although these figures have been cited consistently since the launch of ChatGPT Health in January, no independent review results have been released.

According to OpenAI, health and wellness inquiries represent a significant portion of ChatGPT interactions, with over 230 million users asking health-related questions each week. OpenAI's policies also prohibit advertisements in conversations about health, mental health, or politics.

Demand for health information through ChatGPT's free tier may increase zero-click pressure on publishers, as AI-generated health responses reportedly see the highest engagement among the categories analyzed in Google’s AI Overviews. OpenAI’s claims about the accuracy of its health responses currently lack third-party validation, which raises questions about the reliability of its evaluations.

The announcement did not clarify how these updates might affect citation protocols, suggesting that the onus for verifying answers and addressing traffic losses could shift to healthcare practitioners.
