It is a well-known fact that different model families can use different tokenizers. However, there has been limited analysis of how the process of “tokenization” itself varies across these tokenizers. Do all tokenizers result in the same number of tokens for a given input text? If not, how different are the generated tokens? How significant are the differences?
In this article, we explore these questions and examine the practical implications of tokenization variability. We present a comparative analysis of two frontier model families: OpenAI’s ChatGPT and Anthropic’s Claude. Although their advertised “cost per token” figures are highly competitive, experiments reveal that Anthropic models can be 20-30% more expensive than GPT models.
API pricing: Claude 3.5 Sonnet vs GPT-4o
As of June 2024, the pricing structure for these two advanced frontier models is highly competitive. Both Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o have identical costs for output tokens, while Claude 3.5 Sonnet offers a 40% lower cost for input tokens.
Source: Vantage
The hidden “tokenizer inefficiency”
Despite the Anthropic model’s lower input-token rates, we observed that running the same experiments (on a given set of fixed prompts) was much cheaper in total with GPT-4o than with Claude 3.5 Sonnet.
Why?
Anthropic’s tokenizer tends to break the same input into more tokens than OpenAI’s tokenizer does. This means that, for identical prompts, Anthropic models produce considerably more tokens than their OpenAI counterparts. As a result, although Claude 3.5 Sonnet’s per-token cost for input may be lower, the increased tokenization can offset these savings, leading to higher overall costs in practical use cases.
This hidden cost stems from the way Anthropic’s tokenizer encodes information, often using more tokens to represent the same content. Token-count inflation has a significant impact on both costs and context-window utilization.
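To make this concrete, here is a minimal sketch of the arithmetic. The prices are the June 2024 list prices consistent with the figures above (GPT-4o at $5/$15 per million input/output tokens, Claude 3.5 Sonnet at $3/$15), and the 30% token inflation is a hypothetical figure borrowed from the code-domain overhead measured later in this article:

```python
# Illustrative only: how a lower per-token price can still produce a
# higher bill once token-count inflation is factored in.
PRICES = {  # USD per 1M tokens (June 2024 list prices, assumed)
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in USD."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical scenario: the same code-heavy prompt and an equivalent
# response, with Claude's tokenizer producing ~30% more tokens for both.
gpt = request_cost("gpt-4o", input_tokens=1_000, output_tokens=1_000)
claude = request_cost("claude-3.5-sonnet", input_tokens=1_300, output_tokens=1_300)

print(f"GPT-4o:            ${gpt:.4f}")     # $0.0200
print(f"Claude 3.5 Sonnet: ${claude:.4f}")  # $0.0234, ~17% more despite the cheaper input rate
```

Because output tokens are priced identically, any inflation on the output side is paid at full rate, which is what erodes the 40% input discount.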
Domain-dependent tokenization inefficiency
Different types of domain content are tokenized differently by Anthropic’s tokenizer, leading to variably higher token counts compared to OpenAI’s models. The AI research community has noted similar tokenization differences here. We tested our findings on three popular domains: English articles, code (Python), and math.
| Domain | GPT tokens | Claude tokens | % Token overhead |
| --- | --- | --- | --- |
| English articles | 77 | 89 | ~16% |
| Code (Python) | 60 | 78 | ~30% |
| Math | 114 | 138 | ~21% |
% Token overhead of Claude 3.5 Sonnet (relative to GPT-4o). Source: Lavanya Gupta
When comparing Claude 3.5 Sonnet with GPT-4o, the degree of tokenizer inefficiency varies significantly across content domains. For English articles, Claude’s tokenizer produces approximately 16% more tokens than GPT-4o for the same input text. This overhead increases sharply with more structured or technical content: for mathematical equations the overhead is 21%, and for Python code Claude generates 30% more tokens.
This variation arises because some content types, such as technical documents and code, often contain patterns and symbols that Anthropic’s tokenizer fragments into smaller pieces, leading to a higher token count. In contrast, natural-language content tends to exhibit a lower token overhead.
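A comparison like the one in the table above can be reproduced with a short script. Below is a sketch under a few assumptions: the tiktoken and anthropic Python packages are installed, an ANTHROPIC_API_KEY is set, and Anthropic’s token-counting endpoint (whose availability has varied; see the tokenizer discussion below) is reachable. The sample strings are placeholders, not the texts behind the table:

```python
# Sketch: compare token counts for the same input across both tokenizers.
import tiktoken
import anthropic

samples = {
    "English article": "Tokenizers split text into subword units before the model sees it.",
    "Code (Python)":   "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
    "Math":            "x = (-b +/- sqrt(b^2 - 4ac)) / 2a",
}

enc = tiktoken.get_encoding("o200k_base")  # GPT-4o's encoding
client = anthropic.Anthropic()             # reads ANTHROPIC_API_KEY from the environment

for domain, text in samples.items():
    gpt_tokens = len(enc.encode(text))
    # Note: count_tokens counts the whole message, so a few tokens of
    # role/formatting overhead are included on the Claude side.
    claude_tokens = client.messages.count_tokens(
        model="claude-3-5-sonnet-20241022",
        messages=[{"role": "user", "content": text}],
    ).input_tokens
    overhead = (claude_tokens - gpt_tokens) / gpt_tokens * 100
    print(f"{domain:16s} GPT: {gpt_tokens:3d}  Claude: {claude_tokens:3d}  ({overhead:+.0f}%)")
```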
Other practical implications of tokenizer inefficiency
Beyond the direct cost implication, there is also an indirect impact on context-window utilization. While Anthropic models advertise a larger context window of 200K tokens, compared to OpenAI’s 128K tokens, the effective usable token space may be smaller for Anthropic models because of their tokenizer’s verbosity. Hence, there could be a small or large gap between the “advertised” context-window sizes and the “effective” context-window sizes.
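A back-of-the-envelope way to quantify this gap, using this article’s measured domain overheads as assumed inflation factors (not official figures):

```python
# Rough estimate of Claude's "effective" context window, expressed in
# GPT-token equivalents, under the domain overheads measured above.
CLAUDE_ADVERTISED = 200_000
OVERHEAD = {"English articles": 0.16, "Math": 0.21, "Code (Python)": 0.30}

for domain, o in OVERHEAD.items():
    # Each GPT-token equivalent of content costs (1 + o) Claude tokens,
    # so the window holds roughly advertised / (1 + o) of such content.
    effective = CLAUDE_ADVERTISED / (1 + o)
    print(f"{domain:17s} ~{effective:,.0f} GPT-token equivalents")
# English articles  ~172,414 GPT-token equivalents
# Math              ~165,289 GPT-token equivalents
# Code (Python)     ~153,846 GPT-token equivalents
```

Under these assumptions the usable capacity remains large, but it is meaningfully smaller than the advertised 200K, especially for code-heavy workloads.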
Implementation of tokenizers
GPT models use Byte Pair Encoding (BPE), which iteratively merges frequently co-occurring character pairs to form tokens. Specifically, the latest GPT models use the open-source o200k_base tokenizer. The actual encodings used by GPT models (in the tiktoken library) are shown below.
```python
{
    # reasoning
    "o1-xxx": "o200k_base",
    "o3-xxx": "o200k_base",
    # chat
    "chatgpt-4o-": "o200k_base",
    "gpt-4o-xxx": "o200k_base",  # e.g., gpt-4o-2024-05-13
    "gpt-4-xxx": "cl100k_base",  # e.g., gpt-4-0314, etc., plus gpt-4-32k
    "gpt-3.5-turbo-xxx": "cl100k_base",  # e.g., gpt-3.5-turbo-0301, -0401, etc.
}
```
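Since o200k_base is openly available, GPT-side token counts can be checked locally. A minimal example with the tiktoken library:

```python
# Inspect GPT-4o tokenization locally with tiktoken.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # resolves to o200k_base
print(enc.name)  # o200k_base

tokens = enc.encode("Tokenization is not free.")
print(len(tokens))                        # number of billable input tokens
print([enc.decode([t]) for t in tokens])  # surface form of each token
```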
Unfortunately, not much can be said about Anthropic’s tokenizer, since it is not as directly and easily available as GPT’s. Anthropic released their Token Counting API in December 2024. However, it soon disappeared in later 2025 versions.
Latenode reports that “Anthropic uses a unique tokenizer with only 65,000 token variations, compared to OpenAI’s 100,261 token variations for GPT-4.” This Colab notebook contains Python code to analyze the tokenization differences between GPT and Claude models. Another tool that allows interfacing with some common, publicly available tokenizers validates our findings.
The ability to proactively estimate token counts (without invoking the actual model’s API) and budget costs accordingly is crucial for AI enterprises.
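One way to do such estimation entirely offline, sketched below, is to count GPT tokens locally and scale by a domain multiplier. The multipliers here reuse this article’s measured overheads as illustrative assumptions, not official conversion factors:

```python
# Sketch: offline estimation of Claude token counts (and hence costs)
# from local GPT token counts, using assumed domain inflation factors.
import tiktoken

CLAUDE_INFLATION = {"english": 1.16, "math": 1.21, "code": 1.30}  # illustrative
enc = tiktoken.get_encoding("o200k_base")

def estimate_tokens(text: str, domain: str) -> dict:
    gpt = len(enc.encode(text))
    return {"gpt": gpt, "claude_estimate": round(gpt * CLAUDE_INFLATION[domain])}

# Prints the exact local GPT count and the scaled Claude estimate.
print(estimate_tokens("for i in range(10):\n    print(i ** 2)", domain="code"))
```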
Key takeaways
- Anthropic’s competitive pricing comes with hidden costs: While Anthropic’s Claude 3.5 Sonnet offers 40% lower input-token costs compared to OpenAI’s GPT-4o, this apparent cost advantage can be misleading because of differences in how input text is tokenized.
- Hidden “tokenizer inefficiency”: Anthropic models are inherently more verbose. For businesses that process large volumes of text, understanding this discrepancy is crucial when evaluating the true cost of deploying models.
- Domain-dependent tokenizer inefficiency: When choosing between OpenAI and Anthropic models, evaluate the nature of your input text. For natural-language tasks, the cost difference may be minimal, but technical or structured domains can lead to significantly higher costs with Anthropic models.
- Effective context window: Due to the verbosity of Anthropic’s tokenizer, its larger advertised 200K context window may offer less effective usable space than OpenAI’s 128K, leading to a potential gap between the advertised and the real context window.
Anthropic did not respond to VentureBeat’s requests for comment by press time. We will update the story if they respond.