induwara.lk
Opinion · ai-costs · tokens · finops

The AI token bill is here. What it means for SL builders

Uber blew its 2026 AI coding budget by April and one firm racked up a $500M Claude bill. Here's what the token-cost reckoning means for small teams in Sri Lanka.

Induwara Ashinsana · 4 min read
[Image: illustration of businessmen working inside an hourglass surrounded by dollar symbols. Credit: TechCrunch]

The AI token costs that big companies waved off in 2025 are now landing as real invoices, and the numbers are ugly. TechCrunch's "The token bill comes due" lays out the scramble: Uber exhausted its entire 2026 AI coding budget by April, and one unnamed company let a $500 million Claude bill pile up after forgetting to set usage limits.

I read it as a warning that arrived early for those of us building on small budgets. If a company with Uber's finance team can lose the thread, a two-person team in Colombo running an AI side project absolutely can.


💰 Why the numbers exploded even as prices fell

Here's the part people miss. Per-token prices have been dropping for two years, so everyone assumed the bill would shrink too. It did the opposite, because usage grew faster than price fell.

The article cites figures worth sitting with:

Metric                              | Figure
Per-developer token use (9 months)  | 18.6x increase
Top engineers' productivity         | 2x higher
Top engineers' token use            | 10x higher
Projected usage growth by 2030      | 24x (Goldman Sachs)
One engineer's bill, single month   | $40,000

Key takeaway: Cheaper tokens did not make AI cheaper. Cheaper tokens made it easy to use 18.6x as many of them. Falling unit price is a trap if your consumption is uncapped.

That 2x productivity for 10x the tokens is the whole problem in two rows. The output is real, but the cost curve and the value curve are not the same shape.
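To make that arithmetic concrete, here is a small Python sketch. It uses only the 18.6x, 2x, and 10x figures cited above. The function names are my own, and it assumes the bill is simply price per token times tokens used, with token use standing in for cost and "productivity" standing in for output.

```python
# Assumption: bill = price per token * tokens used (no fixed fees or discounts).

def breakeven_price_ratio(usage_multiplier: float) -> float:
    """Fraction of the old per-token price at which the bill stays flat."""
    return 1.0 / usage_multiplier


def cost_per_unit_output(token_multiplier: float, output_multiplier: float) -> float:
    """How many times more tokens each unit of output takes vs. the baseline."""
    return token_multiplier / output_multiplier


# Figures cited from the article above.
flat_bill_ratio = breakeven_price_ratio(18.6)
relative_cost = cost_per_unit_output(token_multiplier=10, output_multiplier=2)

print(f"Price must fall to {flat_bill_ratio:.1%} of its old level to keep the bill flat")
print(f"Top engineers burn {relative_cost:.0f}x the tokens per unit of output")
```

Under those assumptions, per-token prices would have to drop to roughly 5.4% of their old level just to hold the bill steady, and each unit of a top engineer's output takes about 5x the tokens.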


🔍 "Tokenmaxxing" is over. Guardrails are in.

The piece frames a clean before-and-after. In the "tokenmaxxing" era, leadership demanded the best models and the fastest rollout, and nobody counted tokens. Now the mood has flipped.

J.R. Storment of the FinOps Foundation put it plainly:

"The whole conversation shifted from tokenmaxxing and 'go fast' to 'we need guardrails, how do we control this?'"

Some signals from the article on how fast the correction is moving:

  • Microsoft revoked Claude Code licenses months after handing them out.
  • Priceline saw a contract renewal land at 4–5x the old cost. Its IT finance lead compared adoption to a "crack-cocaine epidemic."
  • The Linux Foundation is launching a Tokenomics Foundation in July 2026 to standardise token metrics and billing, the way FinOps did for cloud.

The pattern is familiar to anyone who lived through the first cloud-bill shock. Adopt fast, get the surprise invoice, then bolt on controls afterward.

What makes tokens harder than cloud is sheer scale. The article describes tracking token cost as a "trillions-of-rows-a-month data problem", against the "hundreds-of-millions" of traditional cloud billing. Alexander Embiricos, who runs enterprise at OpenAI, says the questions buyers ask have changed: it's now about visibility and "what token controls do you have?" That shift, from chasing capability to demanding receipts, is the real story here.


🛠️ The tooling rush, and what's actually free

A whole vendor layer is forming to meter and cap AI spend. The article names cost trackers like Pay-i and Paid, spend management from Ramp, observability from Datadog and New Relic, and model routers like Factory and OpenRouter that pick the cheapest model that can do the job.

Most of that is enterprise pricing. For a small SL team, the useful idea is not the vendor list but the technique. You can copy the cheap parts by hand.

Technique        | What it does                                                         | Cost to you
Model routing    | Send easy tasks to small/cheap models, hard ones to frontier models  | Free (your own if statement)
Hard spend caps  | Set a hard limit in the provider dashboard                           | Free
Prompt caching   | Reuse cached context instead of re-sending it                        | Built into the API
Token logging    | Log tokens per request, review weekly                                | A few lines of code

Bottom line: The expensive observability platforms exist because most teams never logged token usage in the first place. Log it from day one and you've already done the 80% that matters.
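Here's roughly what the "your own if statement" router and the "few lines of code" logger could look like in Python. Treat it as a sketch, not a recipe: the model names, the 4,000-character threshold, the `needs_reasoning` flag, and the log columns are all my own placeholders, and no real provider API is called.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical model names; substitute whatever your provider offers.
CHEAP_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"


def pick_model(prompt: str, needs_reasoning: bool = False) -> str:
    """Route to the frontier model only for hard or very long tasks."""
    # Assumption: prompt length and a caller-supplied flag are good enough signals.
    if needs_reasoning or len(prompt) > 4000:
        return FRONTIER_MODEL
    return CHEAP_MODEL


def log_usage(writer, feature: str, model: str,
              input_tokens: int, output_tokens: int) -> None:
    """Write one row per request so the weekly review has data to look at."""
    writer.writerow([
        datetime.now(timezone.utc).isoformat(),
        feature, model, input_tokens, output_tokens,
    ])


# Usage: an in-memory buffer stands in for a real log file here.
buffer = io.StringIO()
writer = csv.writer(buffer)
model = pick_model("Summarise this support ticket in one line.")
# Token counts would come from the provider's response; these are made up.
log_usage(writer, "ticket-summary", model, input_tokens=120, output_tokens=25)
```

In a real project you'd swap the buffer for a file or a database table and read the token counts from the usage data your provider returns with each response.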


🌐 What this means for you

If you're building from Sri Lanka, on a learning budget or a thin runway, the lesson is not "avoid AI." It's that an uncapped AI bill behaves like an uncapped cloud bill. It will find the leak you didn't know you had.

Concrete moves I'd make before the next API call:

  1. Set a hard billing cap today. Most major providers let you. The $500M and $40k stories both start with "we forgot to set limits."
  2. Default to the smallest model that works. Route the easy 80% of calls to a cheap or free-tier model, and only reach for the frontier model when the task genuinely needs it.
  3. Log tokens per feature, not just per month. A monthly total tells you that you overspent. Per-feature logging tells you where, which is the only thing you can act on.
  4. Estimate before you ship. If your product calls a paid model on every user action, do the multiplication first. Before you wire up text-to-speech or any per-request paid feature, our free AI TTS cost calculator will sanity-check the per-character math so the bill doesn't surprise you later.
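Moves 3 and 4 come down to a few lines of multiplication. The Python sketch below shows one way to do it. The prices are placeholders and NOT any provider's real rates, and the user counts, token counts, and feature names are invented for illustration.

```python
from collections import defaultdict

# Placeholder prices in USD per million tokens; NOT any provider's real rates.
PRICE_PER_MILLION = {"input": 1.00, "output": 4.00}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one call in USD under the placeholder prices."""
    return (input_tokens * PRICE_PER_MILLION["input"]
            + output_tokens * PRICE_PER_MILLION["output"]) / 1_000_000


def monthly_estimate(users: int, actions_per_user_per_day: int,
                     input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Do the multiplication before shipping: users x actions x days x cost."""
    calls = users * actions_per_user_per_day * days
    return calls * request_cost(input_tokens, output_tokens)


def cost_by_feature(log_rows):
    """Sum cost per feature from (feature, input_tokens, output_tokens) rows."""
    totals = defaultdict(float)
    for feature, input_tokens, output_tokens in log_rows:
        totals[feature] += request_cost(input_tokens, output_tokens)
    return dict(totals)


# Hypothetical product: 500 users, 20 AI calls each per day.
estimate = monthly_estimate(500, 20, input_tokens=1_000, output_tokens=250)
print(f"Estimated monthly bill: ${estimate:,.2f}")

# Hypothetical log rows, one per request.
rows = [("chat", 1_000, 250), ("search", 200, 50), ("chat", 3_000, 500)]
print(cost_by_feature(rows))
```

With these made-up numbers, the per-feature totals show "chat" dominating the spend, which is the kind of signal a single monthly total hides.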

The companies in this story had the money to absorb a mistake of that size and keep going. We don't, and that's actually an advantage. It forces the discipline that the giants are only now retrofitting under pressure: know your token cost per feature before it ships, not after the invoice clears.

Key takeaway: The cheapest token is the one you never needed to send. Cap your spend, route to small models by default, and measure per feature, and you'll be running the playbook the rest of the industry is scrambling to adopt right now.

#ai-costs #tokens #finops

Induwara Ashinsana

Information Systems student at UCSC and Executive Director at Ryzera Technologies. Writes about software, AI, and what it means for builders in Sri Lanka.
