Claude Opus 4.7 Tokenizer Tax: API Bill Up 35%

My API bill jumped in April. Not by a little. By a lot.

We run Anthropic’s API across several WordPress plugin builds at Wbcom Designs. When I pulled our April statement I saw a 31% spike on the Opus line compared to March, on roughly the same workload. I dug in expecting to find a bug in our prompt logic. What I found instead was a tokenizer change.

Claude Opus 4.7 shipped with a retrained tokenizer. Finout’s cost intelligence team flagged it first, and NXCode’s benchmarks backed it up: the same input strings that Opus 4.5 tokenized efficiently now produce 20 to 35% more tokens under the Opus 4.7 tokenizer. You are paying for those extra tokens whether you notice them or not.

This post documents what we found, how much it costs in real numbers for a WordPress agency, and the three mitigation layers that brought our April spend back below our March baseline.

What Changed in the Opus 4.7 Tokenizer

Anthropic redesigned the Opus 4.7 tokenizer to handle multilingual text, code, and structured data more accurately. The tradeoff: English prose, PHP code, and JSON blobs that previously compressed tightly into tokens now get split more granularly.

Here is a concrete example. I took a typical WP plugin audit prompt we use internally at Wbcom Designs, a 1,200-word system prompt describing plugin architecture review rules, and ran it through both models using the Anthropic tokenizer API.

Input	Opus 4.5 tokens	Opus 4.7 tokens	Delta
Plugin audit system prompt (1,200 words)	1,847	2,503	+35.5%
WP REST API response (JSON, 800 lines)	3,210	4,018	+25.2%
PHP function docblock review (600 words)	912	1,129	+23.8%
Client requirements doc (2,000 words)	2,981	3,584	+20.2%

The heaviest hit is structured system prompts with lots of code blocks. Those go up 35%. Plain prose expands less. But the direction is always more tokens, never fewer.

The reason for the expansion: the Opus 4.7 tokenizer uses a larger vocabulary optimized for code precision and multilingual coverage. Longer vocabulary means individual tokens carry more specific meaning, but common English words and short code tokens that used to compress into one token now sometimes split into two. For end users writing plain conversational messages, the difference is minimal. For agencies running structured technical prompts with PHP, JSON, and WordPress API responses, the difference is significant.

Understanding the direction of the change also helps you predict which future updates might shift costs again. When Anthropic releases a new model, check the release notes for any mention of tokenizer updates. If the notes say anything about improved code handling, multilingual support, or vocabulary expansion, expect token counts to shift and re-run your top prompts through the token counting endpoint before migrating any production workflow. The surprise is preventable with 30 minutes of pre-migration testing.

Real Bill Impact for a WordPress Agency

Opus 4.7 is priced at $15 per million input tokens and $75 per million output tokens. At our usage pattern, roughly 60% input, 40% output, here is how the tokenizer expansion translates to real dollars.

Say you ran 5 million input tokens a month on Opus 4.5. With the same prompts on Opus 4.7, you are now running roughly 6.25 to 6.75 million input tokens. That is an $18 to $26 monthly increase on inputs alone, per 5M baseline. For agencies running 50M+ input tokens monthly on deep code review workflows, that becomes an $180 to $260 monthly line item that appeared with no change in workload. If you have been working on cutting your Claude token usage for WordPress, the tokenizer change may have partially erased those savings.

Agency size	Monthly input tokens	Opus 4.5 cost	Opus 4.7 cost (uncached)	Token tax per month
Solo dev	5M	$75	$96-$101	+$21-$26
Small agency (5 devs)	25M	$375	$480-$506	+$105-$131
Mid agency (15+ devs)	100M	$1,500	$1,920-$2,025	+$420-$525

These numbers assume no caching, no batching, no model routing. They represent the worst case, and unfortunately, the default case if you migrated straight from Opus 4.5 to 4.7 without adjusting your infrastructure.

Auditing Your Own Token Expansion

Before you start optimizing, you need to know which of your prompts expanded the most. Anthropic’s API has a token counting endpoint that lets you measure this without making a full inference call. Run a count against your 5 most-used prompts. Any prompt with delta over 20% is a caching candidate. Any prompt with delta over 30% that runs frequently is a routing candidate for Sonnet 4.6.

The token count endpoint returns input_tokens for a given model without consuming any credits. This is the fastest way to build a before/after comparison across your prompt library. Budget 30 minutes to run this audit; the results usually surface 2-3 prompts that account for 70% of your token expansion.

When you run the audit, group prompts by type. System prompts that describe your coding standards, security review criteria, or output format rules tend to expand the most. User message templates that pass structured data: JSON payloads, plugin file lists, database schema dumps, are the next highest. Free-form conversational messages expand the least. Once you know the distribution, you can prioritize the optimizations that matter: caching heavy system prompts first, then batching structured-data workflows, and finally routing conversational tasks to Sonnet.

The 90% Offset: Prompt Caching

Here is the good news. Anthropic extended prompt caching to Opus 4.7, and the discount is aggressive: cached input tokens cost 90% less than uncached. If your system prompt is stable across requests, and for WP plugin builds it usually is, you can cache it and pay $1.50 per million cached input tokens instead of $15.

That changes the math significantly. Our plugin audit workflow uses a 2,500-token system prompt on every call. Without caching, at 10,000 calls per month, that is 25 million system prompt tokens at $15/M = $375. With prompt caching, the first call is full price and every subsequent cache hit costs $37.50 total. A $337 monthly saving on one workflow alone.

To enable prompt caching, add the cache_control parameter to your system prompt block in your API request, along with the anthropic-beta: prompt-caching-2024-07-31 header. The cache TTL is 5 minutes. For interactive workflows, that is fine. For batch jobs, you can chain calls within the TTL window to keep the cache warm and avoid re-creation charges.

One important nuance: the cache is per-prefix. If your system prompt changes between calls, even slightly, the cache misses. For WP plugin workflows where the system prompt is truly static across calls (same review criteria, same output format instructions), caching is a near-perfect fit. For workflows where the system prompt varies per client or per task type, you need to structure the variable portion as user content and keep the cacheable instructions strictly stable.

There is also a cache creation cost worth knowing about: creating a new cache entry costs 25% more than a normal uncached input. That fee is paid once on the first call that creates the cache, then you get 90% off all subsequent calls until the cache expires. The math still strongly favors caching for any system prompt used more than a few times per minute. If you run 100 calls per hour with the same 2,500-token system prompt, the cache creation overhead is negligible compared to the savings across the remaining 99 calls.

The 50% Offset: Batch API

Anthropic’s Batch API cuts all token costs by 50% in exchange for up to 24-hour turnaround. If you have any workflow that does not need a real-time response: nightly code reviews, bulk plugin documentation generation, weekly SEO content audits. Batch processing cuts the tokenizer tax in half before you even start optimizing prompts.

At the Opus 4.7 input rate of $15/M, batch drops you to $7.50/M. Add prompt caching on top and cached batch tokens cost $0.75/M. That is actually cheaper than Opus 4.5 was without any optimization. The tokenizer tax disappears entirely for offline workflows.

At Wbcom Designs we batch our nightly plugin compatibility sweeps, weekly documentation refresh passes, and monthly security audit reports. None of these need real-time responses. Running them through Batch API at off-peak hours costs a fraction of synchronous Opus calls and still delivers Opus-quality output.

Setting up Batch API in WordPress is straightforward with wp_remote_post. You submit a JSONL file of requests, receive a batch ID, and poll for completion. The batch result is available as a JSONL download within 24 hours. A WP cron job handles the polling and result processing. This pattern works for any workflow where you can tolerate overnight delivery on results.

One thing to keep in mind when scoping batch jobs: the 24-hour turnaround is a maximum, not a guarantee. In practice, most batch jobs I submit complete within 6 to 10 hours. For workflows that need results by the start of the next business day, scheduling the batch at 6pm the evening before gives comfortable margin. If a batch job is time-sensitive but still does not need real-time response, you can submit smaller batches more frequently to reduce turnaround variance.

When to Fall Back to Sonnet 4.6

Sonnet 4.6 is priced at $3 per million input tokens and $15 per million output tokens: 5x cheaper on inputs than Opus 4.7. The quality gap matters for some tasks and is negligible for others.

Tasks where I use Sonnet 4.6 instead of Opus at Wbcom Designs: first-pass code generation for standard WP hooks and filters, client-facing copy drafts for plugin documentation, bulk taxonomy cleanup and metadata generation, quick plugin compatibility checks, and support ticket triage.

Tasks where Opus 4.7 earns its rate: complex multi-file refactors where context coherence matters across 10k+ lines, architectural reviews where missing a subtle dependency creates downstream bugs, security audit passes on payment or membership plugin logic, and client discovery sessions with ambiguous requirements. I covered this model-selection problem in depth when looking at the best AI tools for WordPress plugin development. The routing principle is the same; the prices have shifted.

A practical routing signal: if you can describe the task in one clear sentence with no ambiguity, Sonnet usually handles it well. If the task requires the model to hold several competing constraints in mind simultaneously and make judgment calls between them, Opus earns its premium. WP REST API endpoint scaffolding falls in the first category. Refactoring a plugin’s authentication layer to support multiple auth methods while preserving backwards compatibility falls in the second.

The routing decision also matters differently depending on how your team is structured. If you have junior developers doing first-pass code generation and senior developers doing review and architecture, you might default Sonnet to the junior-equivalent tasks and reserve Opus for the senior-equivalent work. The quality distinction roughly maps to that skill gradient. Most first-pass drafts and standard implementations are Sonnet territory; the edge cases and cross-cutting concerns are where Opus shows its value.

The Combined Offset: Stacking All Three

Configuration	Effective input rate (per 1M)	Vs Opus 4.5 baseline
Opus 4.7, no optimizations	~$19.50 (with tokenizer expansion)	+30% more expensive
Opus 4.7 + prompt caching	$1.50 cached / $15 uncached	90% cheaper on system prompt
Opus 4.7 + batch	$7.50	50% cheaper across the board
Opus 4.7 + batch + caching	$0.75 cached portions	Cheaper than Opus 4.5 ever was
Sonnet 4.6 (tactical routing)	$3.00	80% cheaper than Opus 4.7

The right strategy for most agencies is a combination: route tactical tasks to Sonnet, cache stable system prompts on the Opus tasks that stay on Opus, and batch anything that does not need real-time delivery. Done properly, your April 2026 Anthropic bill should be at or below your February 2026 bill, even on Opus 4.7.

A Practical Triage Framework for Your Agency

The triage process works best when it is embedded in your request handling code rather than left as a judgment call for each developer on each task. When routing is a decision, it gets inconsistent. When routing is a rule encoded in software, it is consistent and free. The rule is straightforward: does this task need a real-time response? If not, batch it and save 50%. Does it reuse a stable system prompt? If yes, cache it and save 90% on the repeated portion. Does it genuinely require Opus-level reasoning? If not, route to Sonnet 4.6 and save 80%. Only if all three answers push toward Opus do you pay the full rate.

In practice, after applying this triage to our April workflows, we brought our effective Opus 4.7 spend back below our March Opus 4.5 baseline. The tokenizer tax is real, but it is payable with the right routing.

One pattern worth building early: a decision tree in your internal tooling that routes every AI request through this triage automatically. Even a simple if-else in your AI request dispatcher, checking request_type against a routing table, saves developer time from having to think about routing on each call. Once the routing logic is encoded, it runs without friction. Encoding this once means you get the cost savings on every subsequent call without thinking about it.

How to Monitor Your Token Spend Going Forward

The Anthropic dashboard gives you token usage breakdowns by model and time period. The thing it does not give you natively is per-workflow cost attribution: which of your automated tasks is consuming the most tokens. For agencies running multiple AI workflows, adding cost logging at the application level is worth the hour it takes to set up.

At Wbcom Designs we log input_tokens, output_tokens, cache_read_input_tokens, and cache_creation_input_tokens to a custom WP database table on every API call. The log includes a workflow_type field so we can run weekly queries to see which workflows are cost outliers. This is how we discovered that our client onboarding workflow, which uses a large static system prompt, was our highest-impact caching candidate.

The query to identify your top cost drivers takes about 5 minutes to write. Running it weekly for the first month after any model migration catches surprises before they accumulate into a large bill. After the first month, once patterns stabilize, a monthly review is usually sufficient unless you are adding new AI-powered features.

A second monitoring layer worth setting up: per-model budget alerts in your infrastructure. If Opus spend crosses a daily threshold, you want to know before the month closes. Both Anthropic’s console and most cloud cost monitoring tools support threshold alerts. Setting an alert at 110% of your baseline daily spend gives you early warning if a runaway process or misconfigured workflow is burning tokens unexpectedly. The alert pays for itself the first time it catches a loop that would have cost several hundred dollars to run to completion.

What to Watch Going Forward

Tokenizer changes are rare but not unprecedented. Anthropic made a significant tokenizer update between Claude 2 and Claude 3 as well. The pattern is consistent: larger vocabulary, better code and multilingual handling, slightly higher token counts on English technical content.

The things I am monitoring for the next model generation: whether Anthropic publishes a tokenizer changelog alongside model releases (they currently do not, which is how this caught agencies off guard), whether the prompt caching TTL changes to something longer, whether Batch API turnaround decreases below 24 hours, and whether Google’s Gemini 2.5 Pro pricing pressure forces Anthropic to adjust Opus tier rates.

For now, the immediate action is to audit your top 10 prompts and implement caching on anything with a stable system prompt. That single change recovers most of the April surprise for the majority of WordPress agencies running structured AI workflows.

For WordPress agencies running Anthropic at scale, the three-layer response is: batch async work, cache stable prompts, route tactical tasks to Sonnet 4.6. The model is better than 4.5 on multi-file reasoning and long-context coherence. But you should know you are paying for it in ways that do not show up in the published per-token price.

The deeper lesson from April 2026 is about visibility. The tokenizer change was not announced in a way that cost-management teams could easily catch. The only signal was the invoice. If you are running significant AI spend, building the monitoring layer is not optional. It is the foundation for being able to respond to future changes quickly rather than discovering them a month late on a bill. Token costs are an engineering cost center now. They deserve the same monitoring rigor as database query performance and server resource consumption.

Practical Steps to Take This Week

If you read this post and want to act on it before your next invoice cycle, here is the order of operations that has the biggest impact in the shortest time. First, pull your last 30 days of Anthropic usage data from the console and sort by model. If Opus is your highest cost line, continue. If Sonnet or Haiku is higher, the tokenizer issue is less acute for your usage pattern.

Second, identify your top three workflows by token volume. These are the candidates for optimization. You are looking for workflows with high call frequency and a repeating system prompt. Frequency times system prompt size equals your caching opportunity. A workflow that runs 500 times a day with a 3,000-token system prompt is saving you roughly 1.35M tokens per day at 90% cache discount versus uncached. At $15/M, that is about $20 per day you can recover by enabling prompt caching on that one workflow.

Third, check which of those workflows produce results that you do not act on for hours or overnight. A nightly compatibility sweep, a scheduled documentation refresh, a weekly security audit: these are your batch candidates. Moving them to batch immediately halves their cost without any prompt changes or quality tradeoff.

Fourth, run the token count endpoint on your system prompts to get exact before/after numbers on Opus 4.5 versus 4.7. The endpoint is free to use. It returns token counts without making an inference call. Use it to quantify your specific token expansion percentage. It will likely be different from the 35% headline number because your prompts are different from the benchmark prompts. The number you get from your own prompts on your own workload is the only number that matters for your budget.

Fifth, set up a simple cost log in your WordPress backend if you do not have one. Even a basic log of request type, token counts, and model per API call, stored in a WP options table or a custom table, gives you the data you need to run a cost-per-workflow breakdown at the end of each month. Without that log, you are flying blind on which workflows to optimize next. With it, every optimization decision is backed by data rather than guesswork.

The tokenizer change in Opus 4.7 is not going to be reversed. The expanded token counts are the new baseline. The question is whether you build the infrastructure to manage them or absorb the ongoing increase without visibility. For most WordPress agencies, the monitoring and caching setup takes less than a day to implement. The cost recovery starts on the first billing cycle after it is live.

Claude Opus 4.7’s Hidden Tokenizer Tax: Why Your April 2026 API Bill Is 35% Higher