Let me start with three real numbers.
A company forgot to set spending limits when giving employees access to Claude. One month later: $500 million.
An open-source project called OpenClaw (three people, 30 days, 603 billion tokens) ran up $1.3 million on OpenAI’s API. The founder joined OpenAI earlier this year, so this was treated as an internal stress test. He later noted that the figure reflected “fast mode” pricing. With fast mode turned off, it comes to about $300,000.
An Australian AI consultant set two safeguards on his Google Cloud account: a $7 budget alert and a $1,400 hard limit. An attacker found an old service he had forgotten to delete and sent 60,000 requests. Neither safeguard worked, because billing data lags behind actual usage. By the time the system caught up, the bill had hit $18,000.
Uber burned through its entire annual Claude Code budget in April. Its CTO publicly admitted that AI costs are getting harder and harder to justify.
What these numbers have in common isn’t the dollar amount. It’s loss of control. They all point to the same fact: AI usage costs are exploding in ways most people haven’t even started to notice.

How Much Have Tokens Actually Grown?
OpenRouter and a16z released a “State of AI” report based on 100 trillion real tokens. The answer is clear. Average input tokens per request grew from about 1.5K to over 6K, a 4x increase. Output tokens grew from about 150 to 400, closer to 3x. Notice the gap. Input grew faster than output.
That’s not random. It reveals a fundamental shift: AI is moving from generation to comprehension. Models are no longer just “writing an essay.” They’re reasoning through massive amounts of code, documentation, and knowledge bases before producing short but high-value insights. AI is turning from a pen into a brain. A 50-token user question might hide 5,000 tokens of codebase reading, 10,000 tokens of documentation, multiple tool calls, and dozens of reasoning steps. Your bill has very little to do with how much you typed.
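To make that last point concrete, here is a minimal Python sketch of the arithmetic. It uses only the illustrative figures from the paragraph above (a 50-token question, 5,000 tokens of codebase reading, 10,000 tokens of documentation). The function name typed_share is made up for this example, and tool calls and reasoning steps are left out because the text gives no numbers for them.

```python
def typed_share(question_tokens: int, hidden_tokens: dict[str, int]) -> float:
    """Fraction of a request's input tokens that the user actually typed."""
    total = question_tokens + sum(hidden_tokens.values())
    return question_tokens / total


# Illustrative figures from the text; tool calls and reasoning are not counted.
hidden = {"codebase_reading": 5_000, "documentation": 10_000}
share = typed_share(50, hidden)

print(f"Typed share of input tokens: {share:.2%}")  # prints 0.33%
```

Even before counting tool calls, the part you typed is about a third of one percent of what gets billed as input.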

Why Bills Are Spinning Out of Control
For two years, we got used to the mental accounting of flat monthly fees. Pay $20, use as much as you want. That model barely held up in the “chat and autocomplete” era, because the cost gap between light and heavy users was small. Then AI agents arrived. A single agent task can run for hours, call tools dozens of times, read entire codebases, plan and revise repeatedly. The cost gap between light and heavy users can now be several orders of magnitude. Flat monthly fees collapse under that kind of disparity.
GitHub explained its shift to usage-based pricing with an honest statement: GitHub has been subsidizing heavy users, but that model is no longer sustainable. Translation: Light users used to subsidize heavy users. The platform subsidized everyone. The subsidy is now over. The bill is going to the people who actually owe it.
On Reddit, one user posted a screenshot showing his monthly bill jumping from $28 to $746. He canceled. Another went from $50 to $3,000. But others pushed back. Those extreme bills mostly come from “vibe coders,” people who let AI iterate wildly without any cost awareness. One long-time user wrote: “I use it all day and barely go over my limit.” Both sides are saying the same thing. Token bills aren’t exploding because AI got expensive. They’re exploding because usage patterns changed. The people treating AI as unlimited free labor are the first to feel the pain.

A Curious Pattern in the Data
The State of AI report mapped usage into four quadrants. Coding and role-playing landed in the same quadrant: high frequency, low cost per token. These two categories consume the most tokens by far. That’s interesting. One is “work.” The other is “play.” One is about productivity. The other is about experience. But they share two things: both require massive context, and both require multi-turn interaction.
A programmer asks AI to read an entire codebase and then change it. A user asks AI to read an entire conversation history and then play a character. Both are asking AI to “inhabit” a specific situation and then work continuously within it. What does this mean? AI is being used as a substitute. Not a tool. Not an assistant. A substitute that can independently take on tasks. Programmers use AI to write code for them. Users use AI to socialize for them. This kind of usage doesn’t consume tokens once. It consumes them continuously. The context builds up. The token burn accelerates. And unlike coding, where you can measure output, it’s very hard to quantify what a “role-playing task” is worth.

The People Who Figured Out the Token Math Are Already Winning
Facing token inflation, smart players are already moving. Glean does exactly one thing: unify knowledge scattered across a company so AI can access context directly without hunting for it. Less hunting means fewer tokens burned. This strategy tripled Glean’s annual revenue in 15 months, pushing it past $300 million. Factory AI takes a different approach. It automatically sends simple tasks to cheap models and complex tasks to premium ones. Get routing right, and costs fall by a factor of 10. Both paths lead to the same place: make AI work without wasting tokens.
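The routing idea can be sketched in a few lines of Python. This is not Factory AI’s implementation, which the source does not describe. The model names, the prices (set 10x apart to mirror the figure above), the keyword heuristic, and the sample tasks are all hypothetical, chosen only to show the shape of the technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float


# Hypothetical models and prices, 10x apart for illustration only.
CHEAP = Model("cheap-model", 0.50)
PREMIUM = Model("premium-model", 5.00)

# Hypothetical hints that a task needs the stronger model.
HARD_HINTS = ("refactor", "architecture", "debug", "migrate")


def pick_model(task: str) -> Model:
    """Route long or hard-looking tasks to PREMIUM, everything else to CHEAP."""
    text = task.lower()
    is_hard = len(text.split()) > 200 or any(hint in text for hint in HARD_HINTS)
    return PREMIUM if is_hard else CHEAP


def cost_usd(model: Model, tokens: int) -> float:
    """Cost of running `tokens` tokens through `model`."""
    return model.usd_per_million_tokens * tokens / 1_000_000


# (task description, estimated tokens) pairs, invented for the example.
tasks = [
    ("Rename this variable", 2_000),
    ("Fix typo in README", 1_000),
    ("Debug the flaky payment test", 50_000),
]

routed = sum(cost_usd(pick_model(task), tokens) for task, tokens in tasks)
all_premium = sum(cost_usd(PREMIUM, tokens) for _, tokens in tasks)
print(f"routed: ${routed:.4f}  all premium: ${all_premium:.4f}")
```

With these made-up prices, each simple task costs a tenth of what it would on the premium model. The overall saving depends on how much of the workload is actually simple, and a real router would need a far better complexity signal than keywords.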
Glean’s CEO told CNBC something striking. “This is the first time I remember where technology costs are starting to match labor costs.” NVIDIA’s VP of applied deep learning confirmed the observation. For his team, compute costs already far exceed employee salaries. Historically, technology was a small part of corporate costs. Now AI costs are rivaling payroll. Many companies burn through their annual AI budget in one or two months.
That raises a deeper question. When token costs start matching labor costs, how should companies think about the math? For the past year, AI usage was a celebrated metric. High usage meant you were progressive. Burning tokens meant embracing the future. Now many companies are asking a different question: what did those burned tokens actually buy?
Amazon had an internal leaderboard called KiroRank that tracked employee AI usage. Some employees started burning tokens on tasks that solved no real problem, just to climb the ranking. When this came out, Amazon’s senior VP sent a company-wide message: “Don’t use AI for the sake of using AI. Use it to solve customer problems.” That’s the core issue. Tokens themselves have no value. What they buy does.

How to Survive the Age of Token Inflation
An arXiv paper systematically broke down agentic coding costs. It found several counterintuitive things. Agentic tasks can consume 1,000x more tokens than ordinary code reasoning. Running the same task multiple times can produce 30x differences in token consumption. And higher token consumption does not reliably produce higher accuracy. Performance often peaks at moderate cost. Beyond that, you burn more and gain nothing. The paper reveals something important. Token consumption and task quality are not simply positively correlated. Burning more tokens doesn’t necessarily get you better results.
So what does this mean? The real competitive advantage going forward won’t be “who can burn the most tokens.” It will be “who can get the most value out of each token.” Companies that are already incorporating token costs into their core KPIs are quietly building an edge. They don’t chase “more AI.” They chase “better AI.” They don’t care about “price per million tokens.” They care about “cost per completed task.”
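One way to read “cost per completed task” is total spend divided by the number of tasks that actually finished, so failed runs still count toward the bill. The Python sketch below works under that assumption. The Run record, the price of $3 per million tokens, and the three sample runs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    tokens: int       # tokens consumed by this agent run
    completed: bool   # whether the task was actually finished


def cost_per_completed_task(
    runs: list[Run], usd_per_million_tokens: float
) -> Optional[float]:
    """Total spend across all runs divided by the number of completed tasks.

    Failed runs add to the cost but not to the count. Returns None when
    nothing was completed, since the ratio is undefined.
    """
    total_cost = sum(r.tokens for r in runs) * usd_per_million_tokens / 1_000_000
    completed = sum(1 for r in runs if r.completed)
    return total_cost / completed if completed else None


# Hypothetical runs: two successes and one expensive failure.
runs = [Run(200_000, True), Run(1_500_000, False), Run(300_000, True)]
result = cost_per_completed_task(runs, usd_per_million_tokens=3.0)
print(f"Cost per completed task: ${result:.2f}")  # prints $3.00
```

The two successful runs alone cost $1.50 in this example. The failed run doubles the effective cost per completed task to $3.00, and a per-token price never shows that.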
P.S. $500 million. $1.3 million. $18,000. What these numbers have in common isn’t the amount. It’s the loss of control. They all point to the same thing: when AI goes from “occasional use” to “running every day,” the cost structure changes fundamentally. The flat monthly fee era is ending. Token inflation is here. Corporate budgets are getting incinerated. This isn’t a future problem. It’s a now problem. And the people who started doing the token math first are turning that frugality into a competitive advantage.