ChatGPT continues to struggle with a basic counting task despite advancements in its underlying model. The chatbot incorrectly states that the word “strawberry” contains two “r” letters, when the actual count is three. This issue persists even in the latest version, GPT-5.2, released in December 2025.
Modern AI systems handle complex operations with ease, such as generating marketing images, compiling reports through agentic browsers, or composing chart-topping songs. However, they falter on simple tasks that a seven-year-old could complete effortlessly. Counting the “r”s in “strawberry” exemplifies this gap. The word breaks down as s-t-r-a-w-b-e-r-r-y, yielding three instances of the letter “r.”
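The expected answer is easy to verify mechanically; a plain string count in Python is enough:

```python
# Count occurrences of "r" the way a person would: letter by letter.
word = "strawberry"
print(word.count("r"))  # prints 3
```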
Recent tests confirm the problem remains unresolved. After GPT-5.2’s launch, queries to ChatGPT yielded a direct response of “two.” This occurs despite billions of dollars in investments, elevated hardware demands that have driven up RAM prices, and significant global water usage tied to AI training.
The root cause lies in the tokenized input and output design of large language models like ChatGPT. Instead of processing individual letters, the system divides text into tokens, which can be whole words, syllables, or word fragments. For “strawberry,” the OpenAI Tokenizer reveals three tokens: “st,” “raw,” and “berry.” Only two of these tokens, “raw” and “berry,” contain the letter “r.” The model thus appears to count tokens containing an “r” rather than individual letters.
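A toy sketch makes the token-level failure mode concrete. The split below is hardcoded from what the OpenAI Tokenizer reportedly shows; it is an illustration, not a call into OpenAI’s actual tokenizer:

```python
# Hardcoded token split reported by the OpenAI Tokenizer for "strawberry".
tokens = ["st", "raw", "berry"]

# Letter-level counting, the human approach.
letters_with_r = "".join(tokens).count("r")

# Token-level view: how many tokens contain an "r" at all,
# which lines up with ChatGPT's wrong answer of "two".
tokens_with_r = sum(1 for t in tokens if "r" in t)

print(letters_with_r, tokens_with_r)  # prints: 3 2
```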
This tokenization affects similar words. ChatGPT reports that “raspberry” also has two “r”s, overlooking the third. The system treats “berry” as a single token, compressing its two “r”s into one unit. GPT-5.x employs the newer “o200k_harmony” tokenization method, introduced with OpenAI o1-mini and GPT-4o, but the “strawberry” error endures.
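The same toy view fits “raspberry.” The two-token split below is an assumption for illustration; the article confirms only that “berry” remains a single token:

```python
# Assumed split for illustration; only the single "berry" token is confirmed.
tokens = ["rasp", "berry"]
assert "".join(tokens) == "raspberry"

print("".join(tokens).count("r"))          # letter-level count: prints 3
print(sum(1 for t in tokens if "r" in t))  # tokens containing "r": prints 2
```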
OpenAI has addressed many token-based issues since ChatGPT’s debut in late 2022. Early versions encountered problems with specific phrases that triggered erratic responses or processing failures. Patches to training data and supporting systems resolved cases like spelling out “Mississippi” (m-i-s-s-i-s-s-i-p-p-i) or reversing “lollipop” letter by letter.
However, AI models generally perform poorly on precise counts of small values, even as they excel in math and problem-solving. Tests on classic problematic words showed no failures beyond the known strawberry case. ChatGPT correctly handled “Mississippi” and “lollipop.”
One notable remnant involves the string “solidgoldmagikarp.” In GPT-3, this phrase caused meltdowns, including user insults, unintelligible outputs, and processing errors due to tokenization quirks. GPT-5.2 avoids meltdown but produces a hallucination: it claims “solidgoldmagikarp” is a secret Pokémon joke hidden in GitHub repositories by developers. Activation supposedly transforms avatars, repo icons, and features into Pokémon-themed elements. This claim is entirely false, stemming from the string’s historical issues.
Other AI models answer the “strawberry” question correctly. Perplexity, Claude, Grok, Gemini, Qwen, and Copilot each identify three “r”s. Even those leveraging OpenAI models succeed because they use distinct tokenization systems that better capture individual letters.
ChatGPT operates as a prediction engine, relying on patterns learned during training to predict the next token rather than reasoning over individual letters. Tokenization prioritizes efficiency over literal counting, which explains persistent quirks like the strawberry problem.
Comparisons highlight tokenization’s role. Perplexity employs its own processing scheme; Claude comes from Anthropic, Grok from xAI, Gemini from Google, Qwen from Alibaba, and Copilot from Microsoft. All of them return a count of three, indicating token boundaries that preserve the letter-level granularity absent in OpenAI’s setup.
The OpenAI Tokenizer tool demonstrates the split “st-raw-berry”: “st” contains no “r,” “raw” holds one, and “berry” holds two, yet “berry” registers as a single unit. “Raspberry” follows suit, with its final “r”s compressed inside the “berry” token.
Tests confirm the scope of the fixes. “Mississippi” now lists its 11 letters accurately: four “i”s, four “s”s, two “p”s, and one “m.” “Lollipop” reverses correctly to “p-o-p-i-l-l-o-l.”
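Both fixed cases are trivial to check against ground truth in Python:

```python
from collections import Counter

# Letter tally for "Mississippi" (case-insensitive).
tally = Counter("mississippi")
print(tally["i"], tally["s"], tally["p"], tally["m"])  # prints: 4 4 2 1

# The true reversal of "lollipop".
print("lollipop"[::-1])  # prints: popillol
```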
Despite these fixes, core counting deficits remain: the models tend to approximate rather than enumerate precisely.
This disparity underscores how pivotal tokenization is. OpenAI’s byte-pair encoding favors common subwords, so letter-level detail in words like “strawberry” gets folded into larger units.
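Byte-pair encoding itself is simple to sketch. The minimal version below greedily merges the most frequent adjacent symbol pair; it is a didactic toy, not OpenAI’s production tokenizer, and real vocabularies are learned from large corpora rather than a single word:

```python
from collections import Counter

def bpe_merges(word, num_merges):
    """Greedily merge the most frequent adjacent symbol pair num_merges times."""
    symbols = list(word)
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # max() keeps the first-seen pair on ties (dicts preserve insertion order).
        a, b = max(pairs, key=pairs.get)
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_merges("strawberry", 1))  # first merge fuses "s" + "t" into "st"
```

Once letters are fused into multi-character symbols like “st,” any downstream process that reasons over symbols rather than characters loses easy access to individual letters, which is the crux of the miscount.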
After the late-2022 launch surfaced a flood of token quirks, OpenAI responded with rapid updates, eliminating most overt exploits by 2025. GPT-5.2, current at the time of writing, embodies those cumulative refinements, yet the strawberry miscount remains its emblematic flaw.




