What Should I Charge? AI as a Pricing Factor.

The software industry has spent decades measuring the wrong thing.

Not because developers are unproductive — but because measurement tools capture activity, not thinking. Hours worked. Commits pushed. Stories completed. All countable, all visible, and almost all irrelevant to understanding how much and how well a problem was actually solved.

AI didn't just arrive to accelerate development. It came with something unexpected: a new signal. Every working session with a language model leaves a quantifiable trace — token consumption — that for the first time allows a glimpse into the real cognitive effort behind a task.

The question no methodology answers well

If you develop software independently or lead a consulting team, there's a question that appears in every project and almost never has a satisfying answer:

How much should I charge for this?

It's not a business question. It's a technical question disguised as a business question. To answer it well, you need to estimate the real complexity of the problem — and that's where everything gets complicated.

The industry has tried many approaches. Man-hours penalize the best developers: someone who solves a problem in two hours earns half as much as someone who takes four. Story points work for internal planning but can't be converted into a price for an external client. Fixed price per feature puts all the uncertainty on the developer's side, and developers almost always underestimate.

They all share the same underlying problem: they're subjective approximations of something nobody has been able to measure objectively — the real complexity of a software problem.

AI is introducing something that might change that.

Why complexity is so hard to measure

Two tasks can look similar on the surface and be radically different in depth.

"Add a field to the registration form" sounds simple. But if that form connects to three legacy systems, has validations across four layers, and nobody documented the business rules, the real task has nothing to do with what it looks like from the outside.

Traditional metrics don't capture that. Hours measure elapsed time, not actual difficulty. Story points reflect a group estimate that depends on who's in the room and how well they know the system. Experience helps, but it doesn't scale — and it's not systematically transferable from one project to another.

What has always been missing is a signal that emerges from the work itself, not from the prior estimate.

Tokens: what they are and what they represent

When you work with an AI model, whether to generate code, review logic, plan architecture, or debug errors, each interaction consumes tokens. Simply put, a token is a fragment of text, often a short word or a piece of a longer one: approximately 1–2 tokens per word in English. What matters here is what the total reveals: how much context a problem needs to be understood, how many iterations it requires before reaching quality responses, and how many adjustments and corrections accumulate along the way. It's not activity. It's a quantified trace of real cognitive effort.
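
To get a feel for the scale, here is a minimal sketch that turns a word count into a rough token estimate. The function name `estimate_tokens` and the default ratio of 1.3 tokens per word are assumptions chosen for illustration (the ratio simply sits inside the 1–2 range mentioned above); actual counts depend on the model's tokenizer and should be read from your provider's usage data when available.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate based on a whitespace word count.

    tokens_per_word is an assumed ratio inside the 1-2 range quoted above;
    real tokenizers vary by model and by language.
    """
    words = len(text.split())
    return round(words * tokens_per_word)


prompt = "Add a field to the registration form and validate it on the server"
print(estimate_tokens(prompt))        # estimate with the assumed default ratio
print(estimate_tokens(prompt, 2.0))   # upper end of the 1-2 range
```

An estimate like this is only useful for quick sanity checks; it is not a substitute for the counts reported by the model you actually use.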

Concrete example: generating the initial schema for a database might consume around 2,000 tokens. Debugging a concurrency error in a distributed system can reach 50,000. That difference isn't arbitrary — it reflects real complexity.

The emerging pattern: complexity and consumption

With enough history of working with AI, a correlation emerges. It's not perfect, but it is significant:

Level	Task type	Estimated tokens	Examples
1 – Operational	Low uncertainty	500 – 2,000	Basic CRUD, simple scripts, style adjustments
2 – Functional	Medium variability	5,000 – 20,000	API integrations, modules with business logic
3 – Systemic	High uncertainty	20,000 – 100,000+	Architecture, complex debugging, deep refactoring

This isn't a fixed rule. A poorly structured prompt can waste 30,000 tokens on something operational. But with your own history, these bands adjust and become predictable.
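
As a rough illustration, the bands in the table can be expressed as a small lookup. This is a sketch, not a prescribed method: the names `BANDS` and `classify` are made up here, and two choices are assumptions of the sketch rather than anything the table states: counts that fall outside every band (for example between 2,000 and 5,000) return `None`, and a count of exactly 20,000 is assigned to Level 2.

```python
from typing import Optional

# Bands copied from the table above: (level, name, low, high).
# high=None stands for the open-ended "100,000+" upper bound.
BANDS = [
    (1, "Operational", 500, 2_000),
    (2, "Functional", 5_000, 20_000),
    (3, "Systemic", 20_000, None),
]


def classify(tokens: int) -> Optional[int]:
    """Return the first level whose band contains `tokens`, or None if none does."""
    for level, _name, low, high in BANDS:
        if tokens >= low and (high is None or tokens <= high):
            return level
    return None  # assumption: the table leaves these counts unclassified


print(classify(2_000))    # the database schema example
print(classify(50_000))   # the concurrency debugging example
print(classify(3_000))    # falls in the gap between Level 1 and Level 2
```

Once you have your own history, the numbers in `BANDS` are the part to replace.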

The three levels in detail

Level 1 – Operational: Low uncertainty. The problem is well-defined and the solution has a clear path. The AI executes with little additional guidance. In pricing terms, this level is the easiest to quote because variability is small. The risk of underestimating is low.

Level 2 – Functional: Medium variability. There are design decisions involved and the system context is required. The AI proposes alternatives and the developer guides, discards, adjusts. Here token consumption starts to reflect something valuable: decision iterations. Each round in the AI conversation is a micro-decision that was invisible in the hourly model.

Level 3 – Systemic: High uncertainty. The problem is poorly defined, involves multiple systems, or has undocumented dependencies. The developer actively guides exploration, and work extends across multiple sessions. This is the hardest level to quote with traditional methods — and where tokens provide the most value as a signal.

A senior developer will typically consume fewer tokens on Level 3 problems, because experience gives them more ways to frame the problem from the start. That doesn't invalidate the model: the signal is still valid, but the history has to be yours, not someone else's.

From tokens to price: using your history

The real value of measuring token consumption isn't in a single task — it's in the accumulated history.

Suppose you've been tracking consumption per completed task for three weeks:

Week 1: 180,000 tokens for 8 tasks → average ~22,500 per task
Week 2: 210,000 tokens for 11 tasks → average ~19,000 per task
Week 3: 195,000 tokens for 10 tasks → average ~19,500 per task

You have a measurable productive capacity: roughly 195,000 tokens per week and roughly 20,000 per task on average (585,000 tokens across 29 tasks). If a new project arrives with 12 features, you can classify them by level and project the expected total consumption against your history.
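
The arithmetic behind those figures can be sketched in a few lines. The weekly numbers come from the example above; the `project` helper is hypothetical and makes a deliberately crude assumption, namely that every feature costs the overall per-task average.

```python
# (week, total tokens, completed tasks), taken from the three weeks above.
history = [
    ("Week 1", 180_000, 8),
    ("Week 2", 210_000, 11),
    ("Week 3", 195_000, 10),
]

total_tokens = sum(tokens for _, tokens, _ in history)
total_tasks = sum(tasks for _, _, tasks in history)

weekly_capacity = total_tokens / len(history)   # tokens per week
per_task_average = total_tokens / total_tasks   # tokens per task


def project(feature_count: int) -> tuple[float, float]:
    """Return (expected tokens, expected weeks) assuming average-cost features."""
    expected_tokens = feature_count * per_task_average
    return expected_tokens, expected_tokens / weekly_capacity


for week, tokens, tasks in history:
    print(f"{week}: {tokens / tasks:,.0f} tokens per task")

tokens_needed, weeks_needed = project(12)
print(f"12 features: ~{tokens_needed:,.0f} tokens, ~{weeks_needed:.1f} weeks")
```

A flat average ignores the level of each feature. With enough history you would keep a separate average per level and sum those instead.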

It's not an exact formula, but it's a quantifiable basis that didn't exist before. More importantly: when a project ends up consuming twice the estimate, you have concrete data to understand why — and to quote better next time.
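
One way to make that comparison routine is to store the estimate next to the actual consumption for each task. The sketch below is an assumption about how such a log might look: `TaskRecord`, `flag_overruns`, the 2x threshold, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    name: str
    estimated_tokens: int
    actual_tokens: int

    @property
    def overrun_ratio(self) -> float:
        """Actual consumption divided by the estimate (1.0 means on target)."""
        return self.actual_tokens / self.estimated_tokens


def flag_overruns(records, threshold=2.0):
    """Return records whose actual consumption reached `threshold` times the estimate."""
    return [r for r in records if r.overrun_ratio >= threshold]


# Hypothetical records, for illustration only.
records = [
    TaskRecord("export endpoint", 8_000, 9_500),
    TaskRecord("legacy form field", 2_000, 31_000),
    TaskRecord("payment webhook", 15_000, 30_000),
]

for r in flag_overruns(records):
    print(f"{r.name}: {r.overrun_ratio:.1f}x the estimate")
```

The flagged tasks are the ones worth a short written note on what was missed in the original quote.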

The model that changes: from time to resolved value

If the complexity of solving a problem can be measured more precisely, the traditional pricing model for that work changes completely.

Traditional model

Price = hours × rate

This model has a fundamental problem: it penalizes efficiency. Whoever solves something in 2 hours charges less than someone who takes 8, even though the value delivered is identical.

New approach

Price = complexity × resolution capacity

Complexity is determined by the problem. Resolution capacity comes from the developer — including the tools they use and how well they use them.

Under this model, a Level 3 problem has a price that reflects its real difficulty, regardless of whether it's solved in 6 hours with well-used AI or in 6 days without it. The client pays for the problem that gets solved, not the time it takes to solve it.
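
The contrast between the two formulas can be shown with a small worked example. Every number here is hypothetical, and the mapping is only one possible reading of the formula: complexity is treated as a score per level and resolution capacity as a price per complexity point. None of these values come from the article.

```python
# All numbers below are hypothetical and only illustrate the two formulas.
HOURLY_RATE = 100.0      # assumed rate per hour
HOURS_PER_DAY = 8        # assumed working day

# Assumed complexity score per level (one possible reading of "complexity").
COMPLEXITY_SCORE = {1: 1, 2: 4, 3: 12}
# Assumed price per complexity point (one reading of "resolution capacity").
CAPACITY_RATE = 400.0


def hourly_price(hours: float, rate: float = HOURLY_RATE) -> float:
    """Traditional model: price = hours x rate."""
    return hours * rate


def complexity_price(level: int, capacity_rate: float = CAPACITY_RATE) -> float:
    """New approach: price = complexity x resolution capacity."""
    return COMPLEXITY_SCORE[level] * capacity_rate


fast = hourly_price(6)                   # 6 hours with well-used AI
slow = hourly_price(6 * HOURS_PER_DAY)   # 6 days without it
print(fast, slow)             # the hourly price rewards the slower path
print(complexity_price(3))    # the same price whichever path was taken
```

Under the hourly formula the faster developer earns a fraction of the slower one for the same Level 3 problem; under the complexity formula the price depends only on the level.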

Tokens don't implement this model on their own, but they provide the evidence that makes the pricing conversation more solid.

What tokens don't solve

Not all consumption reflects value. A poorly framed prompt generates many tokens with little result. If you're learning to use AI or exploring unknown territory, consumption will be high even if the output doesn't justify it. The signal has noise — and that noise takes time to calibrate.
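
One common way to dampen that kind of noise, offered here as a suggestion rather than something the approach requires, is to summarize a level's history with the median instead of the mean. The sample values are hypothetical: four ordinary operational tasks and one session where a badly framed prompt burned 30,000 tokens.

```python
from statistics import mean, median

# Hypothetical Level 1 history: four typical tasks plus one noisy session.
level_1_history = [1_200, 1_500, 1_800, 30_000, 1_600]

typical_mean = mean(level_1_history)
typical_median = median(level_1_history)

print(f"mean:   {typical_mean:,.0f} tokens")    # pulled upward by the outlier
print(f"median: {typical_median:,.0f} tokens")  # stays close to the typical task
```

The outlier is still worth keeping in the raw log, since it records a real cost; the median just keeps it from distorting the baseline you quote from.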

Experience remains the most important variable. A senior developer solves Level 3 problems with fewer tokens than a junior, not because the problem is simpler, but because they know how to frame it better from the start. Tokens measure visible work — not the quality of the prior thinking that makes it possible.

AI models evolve rapidly. What requires 30,000 tokens today may be solvable with 8,000 in six months thanks to more efficient models. The history isn't static — it needs updating as tools improve.

Token data doesn't replace the conversation with the client. Ultimately, pricing is also a negotiation. The data provides arguments, not automatic answers.

Open questions for your team

Some of these questions don't have easy answers, and that's fine:

Are you measuring the complexity of your projects before quoting them, or do you quote and then discover the complexity?
How much of your current pricing is based on your own data versus accumulated intuition?
What will happen to the hourly model when clients understand that AI can compress days into hours?
Should a project's price reflect the time invested or the problem solved?
Does it make sense, in your specific context, to start measuring tokens now — or are there other signals already telling you what you need to know?

Conclusion

"How much should I charge for this?" will remain a difficult question. Software complexity doesn't disappear just because we now have better tools.

But for the first time there's a signal that emerges from the work itself, not from the prior estimate or what the client described (or didn't describe) in the initial meeting. Token consumption leaves a trace of the real cognitive effort involved in solving a problem.

It's not a formula. It's the information that starts accumulating from the first task you track — and that in a year turns "how much should I charge?" into a question you can answer with your own data.
