The Real Win Isn't Faster Output. It's Cheaper Context-Switching. (Part 2 of 6)

TL;DR: The studies on AI and productivity look contradictory until you notice they're measuring one task at a time. That's not how I work anymore, and it's not how most knowledge workers will work soon either. The gain isn't that task A goes faster. It's that you handle A, B and C in the time it used to take to do A alone. And the math should land at the company level, not the individual level. Diversity of working styles is a feature, not a problem to optimize away.

This is part two of a six-part series. Part one is here if you missed it.

In part one I said the biggest change AI brought me wasn't speed on a single task. It was the collapse in the cost of switching context, the mental tax of jumping between tasks. Let me make that concrete, because this is where the research gets interesting, and where people start talking past each other.

The studies disagree, and that's the clue

The studies aren't tidy. The clearest data we have right now is from developer work, because developers are easy to measure and the tooling matured there first. But the pattern shows up wherever knowledge work involves many threads.

GitHub found in 2024 that developers with AI tools shipped more functional code. Nice and encouraging.

But when METR put experienced developers to work on real tasks in their own codebases, they came out 19 percent slower, even though the developers themselves believed they were faster.

That second one stings, and people love to wave it around as proof that AI is overhyped. But here's the thing. METR measured one task at a time. That is not how I work anymore, and I don't think it's how anyone using AI seriously works either, whether you're writing code, briefs, slide decks, contracts or analysis.

There's also a third story worth holding in mind, because it sits in the gap between the two studies. A team at SpareBank 1 Utvikling went "Claude-first" for a few months in their pair-programming and TDD work, then deliberately pulled back. Their feedback loops got slower, their critical thinking flattened, and they noticed they were drifting into clicking "apply" on suggestions without thinking. They didn't reject AI. They redesigned the loop, kept AI where it added clear value, and went back to manual work where the tight feedback cycle was the actual product. Asgaut Mjølne Söderbom wrote about this in a really honest LinkedIn post that's worth a read.

That story matters because it shows the productivity question is rarely just about speed on a task. It is about whether the tool fits the process you actually need to run. AI didn't make them slower as such. "AI first as the default driver" made them slower in a context where their existing feedback loops were doing real work. Different setup, different answer.

How I actually work now

I kick an agent off on one task: a draft, a code change, a research scan, a slide outline. I think through the next one. I check the result of the first when I naturally have a free moment. The win is not that task A finishes faster. The win is that I'm handling A, B and C in the time it used to take me to grind through A alone.

Measuring a single task and concluding AI doesn't help is a bit like timing one lap and concluding the relay team got slower. Sure, maybe the lap was slower. But you're watching the wrong race.
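
A toy model makes the relay point concrete. The sketch below is not taken from either study. Every number in it (5 minutes to kick off a task, 30 minutes of unattended agent run time, 10 minutes of review, 38 minutes to do the same task by hand) is an assumption chosen for illustration, and the function name is hypothetical.

```python
def minutes_with_agents(n_tasks, kickoff=5, agent_run=30, review=10):
    """Wall-clock minutes for one person to finish n_tasks with agents.

    The person kicks tasks off back to back, agents run unattended,
    and reviews happen one at a time as results become ready.
    All defaults are illustrative assumptions, not measurements.
    """
    clock = 0
    ready_at = []
    for _ in range(n_tasks):
        clock += kickoff                    # human time to brief the agent
        ready_at.append(clock + agent_run)  # agent works in the background
    for ready in ready_at:
        clock = max(clock, ready) + review  # wait if needed, then review
    return clock


MANUAL_MINUTES = 38  # assumed time to do one task entirely by hand

one_task = minutes_with_agents(1)
three_tasks = minutes_with_agents(3)

print("one task:   ", one_task, "with agent vs", MANUAL_MINUTES, "by hand")
print("three tasks:", three_tasks, "with agents vs", 3 * MANUAL_MINUTES, "by hand")
```

With these made-up numbers the single task comes out slower with an agent (45 minutes against 38), while three tasks finish in 65 minutes instead of 114. Change the assumptions and the picture changes, which is why the single-lap measurement says so little about the race.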

So the question I care about isn't "did this one task go faster?" It's "how many threads can I keep warm at once without my brain melting?" And the answer, for me, went up a lot.

Worth flagging before we go further: that "for me" is doing real work in that sentence. I'll come back to it in part three, because not everyone gets a calmer workday from running three threads in parallel. Some people get faster, and more stressed, at the same time. That gap is worth taking seriously.

Does individual gain become employer gain?

A fair question many people ask: does an individual AI gain actually turn into a gain for the employer? For me the answer is simple. Either fewer people do the same work, or we raise both the volume and the bar. More done, and better, because we spend our time on the substance instead of the boring parts. Both are happening right now.

That said, I don't think the goal is to turn everyone into a "10x" or even a "5x" version of themselves. The multiplier on any one person isn't really what matters. What matters is that the value of AI use across the organization comfortably covers the cost of the tools across the organization. The math should land at the company level, not at the individual level. Some people will get massive leverage from AI. Some will get a steady small lift. A few may get very little for now. That's fine, as long as the net across the team or the company is positive. Diversity of working styles is a feature, not a problem to optimize away.
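
Company-level math can be sketched in a few lines of Python. The team below is hypothetical and every dollar figure is invented for illustration; the class and function names are mine, not part of any real tool. The point is only that one person can be net negative while the total is clearly positive.

```python
from dataclasses import dataclass


@dataclass
class Person:
    label: str
    value_gain_usd: float   # assumed yearly value of extra or better output
    tool_cost_usd: float    # assumed yearly licence plus token spend

    @property
    def net_usd(self) -> float:
        return self.value_gain_usd - self.tool_cost_usd


def company_net_usd(people: list[Person]) -> float:
    """Sum of individual nets: the figure that has to be positive."""
    return sum(p.net_usd for p in people)


# Hypothetical team, numbers invented for illustration only.
team = [
    Person("massive leverage", 120_000, 40_000),
    Person("steady small lift", 15_000, 3_000),
    Person("very little for now", 500, 3_000),
]

for p in team:
    print(f"{p.label}: {p.net_usd:+,.0f}")
print(f"company net: {company_net_usd(team):+,.0f}")
```

In this made-up example the third person costs more than they gain, and the team as a whole still lands well in the black.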

What does my own usage actually cost?

Let me put a number on it, because I can. I work at Microsoft, where token usage is effectively unlimited internally, at least for the time being. Out of curiosity I checked what I'd be paying at consumer list pricing. On a heavy month the answer lands around USD 4,000. That's roughly USD 48,000 a year in model usage for one person.

To be honest, that monthly number has a bit of inflation in it. I know I'm running more expensive models than I strictly need (more on that below). When I look at the actual week-to-week variation, spend sits somewhere between USD 500 and USD 1,500 a week. It moves with what I'm working on and how many days I'm actually at the keyboard.

Now, would my true value justify that spend? I think for the work I do, yes, fairly comfortably. If those tokens let me run three threads where I used to run one, the bar to clear isn't "is 4k a lot of money" (it is), it's "does the multiplier return more than an FTE's worth of extra output." For cross-context work like mine, it usually does.

The trend is what surprised me most. My consumption grew about 2.5x from March to May, almost perfectly linear. That sounds alarming until you ask where it's headed. I suspect I'm closer to a plateau than a runaway curve, because these days I'm more a document writer than a coder. Writing and reasoning over text has a natural ceiling on how many parallel agents are useful. You can only steer so many drafts at once before you're the bottleneck again. Tasks that fan out cleanly (code changes across files, batch analysis, scripted research) scale differently. You can run many in parallel almost indefinitely. Prose pulls back toward one careful human holding the thread.

But here's the honest caveat, and it matters. I spend that much partly because I can. I run high thinking and large context on basically every call, top model every time, because nothing internally tells me not to. If I switched to an auto-model router that picks a cheaper model for the easy 80 percent of tasks, I'd almost certainly slash that bill with very little loss in quality. So a heavy 4k month is closer to a ceiling on what I happen to use than a floor on what my value actually requires.
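
A back-of-the-envelope sketch shows how much a router could plausibly save. The USD 4,000 bill and the 80 percent easy share come from the paragraphs above. The price ratio of the cheaper model (10 percent of the top model) is purely an assumption, as is the simplification that spend is spread evenly across tasks, and the function name is hypothetical.

```python
def routed_bill_usd(monthly_bill_usd, easy_share, cheap_price_ratio):
    """Estimate the bill if easy tasks go to a cheaper model.

    Assumes spend is spread evenly across tasks, so easy_share of the
    bill can be repriced at cheap_price_ratio of the top-model price.
    """
    hard_part = monthly_bill_usd * (1 - easy_share)
    easy_part = monthly_bill_usd * easy_share * cheap_price_ratio
    return hard_part + easy_part


heavy_month = 4_000          # from the text
easy_share = 0.80            # from the text
cheap_price_ratio = 0.10     # assumption: cheaper model at a tenth of the price

estimate = routed_bill_usd(heavy_month, easy_share, cheap_price_ratio)
print(f"routed bill: about USD {estimate:,.0f}")
print(f"saving: about {1 - estimate / heavy_month:.0%}")
```

Under those assumptions the heavy month drops from USD 4,000 to roughly USD 1,120. Real routing would save less if the hard 20 percent of tasks also carry most of the tokens.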

And I'm well aware I'm lucky here. Having no cap on spending is a privilege, not a default, and I don't take it for granted. It also may not last. What's unlimited for me today could become metered company-wide tomorrow, the moment someone does the math across thousands of employees. The day a budget shows up, the "because I can" habits get expensive fast, and the auto-model question stops being a curiosity and becomes the whole game.

Meanwhile, in some places the incentive runs completely the other way. Mostly US teams, from what I hear. People bragging about topping internal token leaderboards. Burning the most credits as if that were the achievement. There's even a name for it now, tokenmaxxing, the same family as looksmaxxing and every other -maxxing trend the internet keeps generating. The maxxing-of-everything mindset caught up with AI, and the result is people treating consumption as if it were the contribution. That's the inverse of efficient. It looks like adoption theater dressed up as productivity.

Get tokenmaxxing t-shirts at https://www.etsy.com/shop/ContextWindowShop

The flip side, in parts of Europe, is the opposite extreme. Works councils block employers from looking at individual token usage at all, on the grounds that it could be repurposed as performance monitoring. I get the intent. I also think both ends are missing the point. What I produce versus what others produce is the measurement, not how many tokens it took to produce it. Different tools for different use cases, different jobs, different people, the same diversity argument as elsewhere in this series. The leaderboard is the wrong leaderboard.

There's a sharper way to frame what the leaderboard gets wrong. In a June 2026 note on what he calls the frontier ecosystem, Satya Nadella (CEO of Microsoft) splits a firm's capability into human capital (the knowledge, judgment and pattern recognition of its people) and token capital (the AI capability it builds and owns). The useful bit, for this argument, is that token capital is something you build and own, not something you burn. Tokenmaxxing optimizes for spend. The thing actually worth compounding is the learning loop on top of the models: the workflows, evals and accumulated judgment that make each token you spend more useful than the last. Measured that way, the heaviest user on the leaderboard might be the one creating the least durable value, because consumption that doesn't feed back into a better system is just exhaust.

There's a less comfortable version of "because I can" too, and I'd be dishonest to skip it. Tokens aren't free in energy or water, even when they're free to me. High thinking and large context on every call has a footprint, and right now I'm optimizing for my own productivity while the wider environmental cost sits on someone else's ledger. Me winning, world maybe not. I don't have a clean answer to that. If I'm honest, I'm partly relying on control and regulation, and on providers getting more efficient, to put limits where my own restraint doesn't. That's not flattering to admit, but it's true. A price signal, a cap, a budget, a carbon cost, would probably make me a more thoughtful user overnight.

I'll also say this, because I think it's honest: I would much rather burn GPU cycles on actual knowledge work than watch them go to crypto mining farms that have feasted on cheap power in places like Norway for years. That's energy straight into the bin, in my opinion. I'm no crypto fan, and there is no point pretending otherwise. AI at least produces something the people paying for the work value. The footprint conversation matters, but it should not pretend all GPU cycles are equal.

The numbers back the framing, even if they don't decide the argument. The IEA's Electricity 2024 report bundles data centres, AI and crypto together at about 460 TWh globally in 2022, projected to more than 1,000 TWh by 2026, roughly Japan's annual electricity use. Cambridge's Bitcoin index puts Bitcoin alone at around 150 TWh per year, roughly a third of that 2022 total, on a network that has been running for more than fifteen years. Bitcoin produces a price chart. AI use at work produces drafts, code, summaries and decisions that the people paying for the tokens choose to pay for because they get something useful back. I'm not pretending blockchain itself is worthless as a technology, but proof-of-work mining at this scale is, for me, a strange way to "create" value. I'd rather see those terawatt-hours go somewhere that produces output a human actually uses.
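
The ratios behind those figures are easy to check. The sketch below uses only the three numbers quoted above; the variable names are mine, and the implied annual growth rate is my own derived figure, not something either source states. The two sources use different methods, so treat the share as a rough comparison.

```python
DATA_CENTRES_2022_TWH = 460   # IEA figure quoted above (data centres, AI, crypto)
PROJECTED_2026_TWH = 1_000    # lower bound of the IEA projection quoted above
BITCOIN_TWH_PER_YEAR = 150    # Cambridge estimate quoted above

# Bitcoin's estimated consumption relative to the 2022 combined total.
bitcoin_share = BITCOIN_TWH_PER_YEAR / DATA_CENTRES_2022_TWH

# Compound annual growth implied by going from 460 to 1,000 TWh in four years.
years = 2026 - 2022
implied_annual_growth = (PROJECTED_2026_TWH / DATA_CENTRES_2022_TWH) ** (1 / years) - 1

print(f"Bitcoin vs 2022 total: {bitcoin_share:.0%}")
print(f"Implied annual growth 2022-2026: {implied_annual_growth:.0%}")
```

That works out to about a third for Bitcoin's share, and a little over 20 percent growth per year for the combined category.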

The trend on the supply side is encouraging though, and worth taking seriously. Look at what's happened with image models recently: quality has gone up sharply while model size has gone down. The same shift is starting to show up across the board. Smaller, more specialized models, automatically routed to the right task, will balance GPU cycles much better than today's "throw the biggest model at everything" default. More value per token spent. Less footprint per unit of useful work. The economics and the environmental math both improve at the same time, which is rare in tech.

Which loops straight back to the employer question. The gain is real, but so is the waste. The interesting work isn't proving the spend is worth it. It's figuring out how much of it was ever necessary.

Why this matters for the series

I'm spending a whole post on this because the "context-switch tax" is the hinge for everything that follows. If AI mostly lowers the cost of juggling many contexts, then the people who benefit most are the ones already juggling many contexts. And the people who feel the squeeze are the ones who relied on stable, single-track work.

That's not a neutral, evenly spread gift. It tilts the field.

Which is exactly where part three goes: who AI actually favors, and who it quietly squeezes.

See you in the next one :)