When you ask an AI to count the number of times the letter ‘r’ appears in “strawberry,” you might expect a simple answer. Yet, there’s a surprisingly common error where AI systems like GPT-4o will only report two ‘r’s instead of the correct number, three. But why does this happen, and how has it been addressed in newer models like GPT-o1?
The Tokenization Confusion
One reason behind this mistake is the way language models process words. Instead of seeing each letter in the word “strawberry,” the AI breaks it down into parts called tokens. The word “strawberry” might be seen as two tokens, such as “straw” and “berry,” rather than as ten individual letters. Since the AI is trained to predict the next token based on context, not to count letters, it doesn’t do well at breaking words down letter by letter. As a result, it can miss one of the ‘r’s because it is working with word-like chunks, not individual characters. This is why GPT-4o typically reports two ‘r’s in “strawberry”.
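To make the difference between the two views concrete, here is a minimal sketch. It assumes the “straw” + “berry” split used as an example above; a real tokenizer may split the word differently, and the token IDs below are made up purely for illustration.

```python
# Hypothetical token split taken from the example above; a real tokenizer
# may produce different pieces.
tokens = ["straw", "berry"]

# What the model works with: opaque IDs standing in for whole chunks.
# These ID values are invented for illustration only.
hypothetical_vocab = {"straw": 101, "berry": 102}
token_ids = [hypothetical_vocab[t] for t in tokens]

# What a letter count needs: the characters inside each chunk.
r_per_token = {t: t.count("r") for t in tokens}
total_r = sum(r_per_token.values())

print(token_ids)    # the chunk-level view
print(r_per_token)  # the character-level view, per chunk
print(total_r)      # total number of 'r's across both chunks
```

The ID list carries no visible information about which letters each chunk contains, while the character-level view makes the count trivial.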
We verified this directly using Raiday’s ChatGPT Plus subscription:
How the Newer Model GPT-o1 Fixes This
GPT-o1 is built to handle these kinds of details better. While earlier versions might have been good at understanding the overall meaning of words and sentences, they struggled with tasks requiring precision, like counting. GPT-o1, however, has integrated better methods for breaking down words into their individual letters, allowing it to accurately count all three ‘r’s in “strawberry.” Essentially, this newer model pays more attention to details like individual characters when necessary, which is why it doesn’t fall into the same trap.
Why is this important?
This kind of problem illustrates how AI can sometimes produce very human-like errors. The earlier model didn’t make this mistake because it was badly designed, but because it was built for higher-level tasks, such as understanding meaning, rather than mere character counting. As AI continues to evolve, newer versions such as GPT-o1 are being refined to handle these seemingly simple tasks as well.
How to Use AI for Letter Counting
If you ever need to count letters like ‘r’ in words like “strawberry,” newer AI systems can do this well. Just ask one directly, and it will break the word down properly. For older models, though, it’s often better to ask them to write a small piece of code that counts the letters. AI excels at generating programming code, and because the code does the counting instead of the model, this workaround can give you an accurate answer.
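The kind of code a model might write for this request could look like the sketch below. The function name `count_letter` and the choice to ignore upper and lower case are assumptions made for illustration, not the output of any particular model.

```python
def count_letter(word: str, letter: str) -> int:
    """Count how often a single letter occurs in a word, ignoring case."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    # str.count works on characters, so tokenization plays no role here.
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```

Running the code yourself, or having the AI run it in a code tool, moves the counting out of the model’s token-based guesswork and into ordinary string handling.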
So, let’s break down this letter-counting challenge together.
Starting with the first letter ‘s’, then ‘t’, and then ‘r’—there’s your first ‘r’. Moving on to ‘a’, ‘w’, ‘b’, ‘e’, and then another ‘r’, which makes it two. Right after that, there’s yet another ‘r’, bringing the total to three. The word wraps up with a ‘y’.
So, you’ve got three instances of the letter ‘r’ in ‘strawberry’. It’s a simple exercise, but it’s interesting how easy it can be to overlook repeated letters when you’re not paying close attention.
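The same letter-by-letter walk can be written as a short loop. This is only a sketch of the manual process described above; the helper name `find_letter_positions` is made up for illustration, and positions are counted from 1 to match the way a person would read the word.

```python
def find_letter_positions(word: str, target: str) -> list[int]:
    """Return the 1-based positions at which target appears in word."""
    positions = []
    for position, letter in enumerate(word, start=1):
        if letter == target:
            positions.append(position)
    return positions


r_positions = find_letter_positions("strawberry", "r")
print(r_positions)       # prints [3, 8, 9]
print(len(r_positions))  # prints 3
```

The positions show why the repeated letter is easy to overlook: the second and third ‘r’ sit right next to each other.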
Now, regarding language models, we found that earlier versions like GPT-4o sometimes miscounted and reported only two ‘r’s. This is most likely due to the way they process words as tokens rather than as individual characters. The newer GPT-o1 has improved in this respect, providing accurate counts by analyzing each character individually.