The software industry has spent decades measuring the wrong thing.
Not because developers are unproductive — but because measurement tools capture activity, not thinking. Hours worked. Commits pushed. Stories completed. All countable, all visible, and almost all irrelevant to understanding how much and how well a problem was actually solved.
AI didn't just arrive to accelerate development. It came with something unexpected: a new signal. Every working session with a language model leaves a quantifiable trace — token consumption — that for the first time allows a glimpse into the real cognitive effort behind a task.
The question no methodology answers well
If you develop software independently or lead a consulting team, there's a question that appears in every project and almost never has a satisfying answer:
How much should I charge for this?
It's not a business question. It's a technical question disguised as a business question. To answer it well, you need to estimate the real complexity of the problem — and that's where everything gets complicated.
The industry has tried many approaches. Man-hours: reasonable in theory, unfair in practice because billing by time penalizes efficiency. Story points: useful for internal team planning, but hard to translate into a price for a client. Fixed price per feature: comfortable for the client, risky for the developer.
They all share the same underlying problem: they're subjective approximations of something nobody has been able to measure objectively — the real complexity of a software problem.
AI is introducing something that might change that.
Why complexity is so hard to measure
Two tasks can look similar on the surface and be radically different in depth.
"Add a field to the registration form" sounds simple. But if that form connects to three legacy systems, has validations across four layers, and nobody documented the business rules, the real task has nothing to do with what it looks like from the outside.
Traditional metrics don't capture that. Hours measure elapsed time, not actual difficulty. Story points reflect a group estimate that depends on who's in the room and how well they know the system. Experience helps, but it doesn't scale — and it's not systematically transferable from one project to another.
What has always been missing is a signal that emerges from the work itself, not from the prior estimate.
Tokens: what they are and what they represent
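When you work with an AI model, whether to generate code, review logic, plan architecture, or debug errors, each interaction consumes tokens. Simply put, a token is a fragment of text, often a piece of a word. A common rule of thumb for English is about four characters per token, or roughly 1.3 tokens per word, though the exact ratio depends on the model's tokenizer.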
But beyond the technical definition, what matters is what tokens represent in practice:
- The context the problem requires — how much you need to explain to the AI to get it to understand what to solve
- The depth of reasoning — complex problems generate longer responses and require more iterations
- The number of refinements — each correction, adjustment, or backtrack adds tokens
Concrete example: generating the initial schema for a database might consume around 2,000 tokens. Debugging a concurrency error in a distributed system can reach 50,000. That difference isn't arbitrary — it reflects real complexity.
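To make the three sources of consumption tangible, here is a minimal sketch of a per-task log. The `Interaction` and `TaskLog` names, and the split between prompt and completion tokens, are assumptions for illustration; the totals simply reproduce the 2,000 and 50,000 figures from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One request/response round with the model (hypothetical record shape)."""
    prompt_tokens: int      # context sent to the model
    completion_tokens: int  # reasoning and output returned by the model

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.completion_tokens


@dataclass
class TaskLog:
    """Accumulates every interaction spent on a single task."""
    name: str
    interactions: list[Interaction] = field(default_factory=list)

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.interactions.append(Interaction(prompt_tokens, completion_tokens))

    @property
    def total_tokens(self) -> int:
        return sum(i.total for i in self.interactions)

    @property
    def rounds(self) -> int:
        # Each round is one refinement, correction, or backtrack.
        return len(self.interactions)


# Illustrative numbers only: one round for the schema, ten for the bug.
schema_task = TaskLog("initial database schema")
schema_task.record(prompt_tokens=600, completion_tokens=1_400)

debug_task = TaskLog("concurrency bug")
for _ in range(10):
    debug_task.record(prompt_tokens=3_500, completion_tokens=1_500)

print(schema_task.total_tokens, schema_task.rounds)  # 2000 1
print(debug_task.total_tokens, debug_task.rounds)    # 50000 10
```

Most model APIs report token usage per request, so in practice the `record` call would be fed from that usage data rather than typed by hand.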
The emerging pattern: complexity and consumption
With enough working history with AI, a correlation emerges that's not perfect but is significant:
| Level | Task type | Estimated tokens per task | Examples |
|---|---|---|---|
| 1 – Operational | Low uncertainty | 500 – 2,000 | Basic CRUD, simple scripts, style adjustments |
| 2 – Functional | Medium variability | 5,000 – 20,000 | API integrations, modules with business logic |
| 3 – Systemic | High uncertainty | 20,000 – 100,000+ | Architecture, complex debugging, deep refactoring |
This isn't a fixed rule. A poorly structured prompt can waste 30,000 tokens on something operational. But with your own history, these bands adjust and become predictable.
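As a rough sketch, the bands in the table can be turned into a lookup. The table leaves the 2,000 to 5,000 range unassigned, so this version assumes that anything above 2,000 and up to 20,000 counts as Level 2; the function name and the cut points are assumptions you would replace with values from your own history.

```python
# Upper bounds per level, taken from the table. Assumption: the unassigned
# 2,000-5,000 range is treated as Level 2, and anything under 500 as Level 1.
LEVEL_UPPER_BOUNDS = [
    (2_000, "1 - Operational"),
    (20_000, "2 - Functional"),
]
TOP_LEVEL = "3 - Systemic"


def classify(tokens: int) -> str:
    """Return the complexity level whose band contains the token count."""
    if tokens < 0:
        raise ValueError("token count cannot be negative")
    for upper, level in LEVEL_UPPER_BOUNDS:
        if tokens <= upper:
            return level
    return TOP_LEVEL


for spent in (1_200, 3_500, 12_000, 50_000):
    print(spent, "->", classify(spent))
```

A classification made after the fact like this one is only a label for the history; a single wasteful prompt can push an operational task into a higher band.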
The three levels in detail
Level 1 – Operational: Low uncertainty. The problem is well-defined and the solution has a clear path. The AI executes with little additional guidance. In pricing terms, this level is the easiest to quote because variability is small. The risk of underestimating is low.
Level 2 – Functional: Medium variability. There are design decisions involved, and system context is required. The AI proposes alternatives and the developer guides, discards, adjusts. Here token consumption starts to reflect something valuable: decision iterations. Each round in the AI conversation is a micro-decision that was invisible in the hourly model.
Level 3 – Systemic: High uncertainty. The problem is poorly defined, involves multiple systems, or has undocumented dependencies. The developer actively guides exploration, and work extends across multiple sessions. This is the hardest level to quote with traditional methods — and where tokens provide the most value as a signal.
This model doesn't replace experience — it complements it. A senior developer navigates Level 3 with fewer tokens because they know how to frame the problem correctly from the start.
From tokens to price: using your history
The real value of measuring token consumption isn't in a single task — it's in the accumulated history.
Suppose you've been tracking consumption per completed task for three weeks:
- Week 1: 180,000 tokens for 8 tasks → average ~22,500 per task
- Week 2: 210,000 tokens for 11 tasks → average ~19,000 per task
- Week 3: 195,000 tokens for 10 tasks → average ~19,500 per task
You have a measurable productive capacity: ~195,000 tokens per week and ~20,000 per task on average (585,000 tokens across 29 tasks). If a new project arrives with 12 features, you can classify them by level and project the expected total consumption against your history.
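The arithmetic behind those averages, and one possible way to project a new project, can be sketched as follows. The weekly figures come from the three weeks listed above; the per-level token costs and the mix of the 12 features are hypothetical placeholders, not data from any real project.

```python
# Weekly history from the article: (total tokens, completed tasks).
history = [(180_000, 8), (210_000, 11), (195_000, 10)]

total_tokens = sum(tokens for tokens, _ in history)   # 585,000
total_tasks = sum(tasks for _, tasks in history)      # 29

weekly_capacity = total_tokens / len(history)         # 195,000 tokens/week
avg_per_task = total_tokens / total_tasks             # about 20,172 tokens/task

# Hypothetical planning figures: one representative token cost per level
# (roughly mid-band) and an assumed mix for the 12 new features.
TOKENS_PER_LEVEL = {1: 1_500, 2: 12_000, 3: 60_000}
feature_mix = {1: 5, 2: 5, 3: 2}

projected = sum(TOKENS_PER_LEVEL[level] * n for level, n in feature_mix.items())
weeks = projected / weekly_capacity

print(f"capacity: {weekly_capacity:,.0f} tokens/week")
print(f"average:  {avg_per_task:,.0f} tokens/task")
print(f"project:  {projected:,} tokens, about {weeks:.1f} weeks")
```

Projecting per level rather than multiplying 12 by the overall average matters because two Level 3 features can outweigh ten operational ones.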
It's not an exact formula, but it's a quantifiable basis that didn't exist before. More importantly: when a project ends up consuming twice the estimate, you have concrete data to understand why — and to quote better next time.
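One way to act on that, sketched under assumptions: compare estimated and actual consumption per task and flag the ones that reached twice the estimate. The function name, the default threshold, and the task figures are all hypothetical.

```python
def overrun_report(estimates: dict[str, int], actuals: dict[str, int],
                   threshold: float = 2.0) -> list[tuple[str, float]]:
    """List tasks whose actual consumption reached `threshold` x the estimate."""
    flagged = []
    for task, estimated in estimates.items():
        if estimated <= 0:
            raise ValueError(f"estimate for {task!r} must be positive")
        ratio = actuals[task] / estimated
        if ratio >= threshold:
            flagged.append((task, round(ratio, 2)))
    # Largest overruns first, since they explain most of the gap.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical figures for illustration only.
estimates = {"login form": 2_000, "payment API": 12_000, "sync engine": 40_000}
actuals = {"login form": 2_400, "payment API": 30_000, "sync engine": 88_000}

print(overrun_report(estimates, actuals))
```

The flagged tasks are where the post-mortem should start: was the task misclassified, or was the prompting inefficient?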
The changing model: from time to resolved value
All of the above points toward a deeper change in how software development is priced.
Traditional model
Price = hours × rate
This model has a fundamental problem: it penalizes efficiency. Whoever solves something in 2 hours charges less than someone who takes 8, even though the value delivered is identical.
New approach
Price = complexity × resolution capacity
Complexity is determined by the problem. Resolution capacity comes from the developer — including the tools they use and how well they use them.
Under this model, a Level 3 problem has a price that reflects its real difficulty, regardless of whether it's solved in 6 hours with well-used AI or in 6 days without it. The client pays for the problem that gets solved, not the time it takes to solve it.
Tokens don't implement this model on their own, but they provide the evidence that makes the pricing conversation more solid.
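The two formulas can be compared side by side in a small sketch. The article does not define units for complexity or resolution capacity, so the ones used here are assumptions: complexity is a weight per level, and resolution capacity is the amount a given developer charges per unit of complexity. The rate and weights are made-up numbers.

```python
def hourly_price(hours: float, rate: float) -> float:
    """Traditional model: price = hours x rate."""
    return hours * rate


def value_price(complexity: float, resolution_capacity: float) -> float:
    """Proposed model: price = complexity x resolution capacity."""
    return complexity * resolution_capacity


RATE = 100.0  # hypothetical hourly rate

# Same deliverable, two developers: the faster one earns less for it.
print(hourly_price(2, RATE))  # 200.0
print(hourly_price(8, RATE))  # 800.0

# Hypothetical units: a complexity weight per level, and the amount this
# developer charges per unit of complexity.
COMPLEXITY_WEIGHT = {1: 1.0, 2: 4.0, 3: 12.0}
CAPACITY = 400.0

# A Level 3 problem is priced the same whether it took 6 hours or 6 days.
print(value_price(COMPLEXITY_WEIGHT[3], CAPACITY))  # 4800.0
```

Note that hours do not appear anywhere in `value_price`; the token history only informs which complexity weight a problem deserves.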
What tokens don't solve
It would be dishonest to present this as a complete solution. There are real limitations worth naming:
Not all consumption reflects value. A poorly framed prompt generates many tokens with little result. If you're learning to use AI or exploring unknown territory, consumption will be high even if the output doesn't justify it.
Experience remains the most important variable. A senior developer solves Level 3 problems with fewer tokens than a junior, not because the problem is simpler, but because they know how to frame it better from the start. Tokens measure visible work — not the quality of the prior thinking that makes it possible.
AI models evolve rapidly. What requires 30,000 tokens today may be solvable with 8,000 in six months thanks to more efficient models. The history isn't static — it needs updating as tools improve.
It doesn't replace the conversation with the client. Ultimately, pricing is also a negotiation. Token data provides arguments, not automatic answers.
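On the point about evolving models, one simple precaution is to keep baselines separate per model, so that older and newer consumption figures are never averaged together. This is only a sketch: the model labels and token counts are hypothetical, loosely echoing the 30,000 versus 8,000 example above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log entries: (model label, complexity level, tokens spent).
log = [
    ("model-a", 3, 30_000),
    ("model-a", 3, 34_000),
    ("model-b", 3, 8_000),
    ("model-b", 3, 10_000),
]

# Group by (model, level) so each baseline reflects a single tool.
by_model = defaultdict(list)
for model, level, tokens in log:
    by_model[(model, level)].append(tokens)

baselines = {key: mean(values) for key, values in by_model.items()}

for (model, level), avg in baselines.items():
    print(f"{model} level {level}: {avg:,.0f} tokens on average")
```

When you switch tools, the old baseline becomes a historical reference rather than a forecast.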
Open questions for your team
The goal of this article isn't to close the debate — it's to open it with better questions:
- Are you measuring the complexity of your projects before quoting them, or do you quote and then discover the complexity?
- How much of your current pricing is based on your own data versus accumulated intuition?
- What will happen to the hourly model when clients understand that AI can compress days into hours?
- Should a project's price reflect the time invested or the problem solved?
- Are you tracking your token consumption today, or letting data slip by that could inform better decisions tomorrow?
Conclusion
"How much should I charge for this?" will remain a difficult question. Software complexity doesn't disappear just because we now have better tools.
But for the first time there's a quantifiable signal that emerges from the work itself — not from the prior estimate, not from intuition, not from what the client said in the initial meeting. Token consumption leaves a trace of the real cognitive effort involved in solving a problem.
It's not a formula. It's a starting point for building the history you don't have today, one that a year from now could be the difference between quoting with confidence and continuing to guess.