Human in the Loop: When to Replace, When to Keep, and Why (Part 6 of 6)

June 22, 2026

TL;DR: "Human in the loop" is repeated like a magic spell, but most teams have never asked what the human is actually for. Sometimes it's judgment, trust or legal sign-off. Sometimes it's habit. The interesting work is asking that question honestly for each loop, and shrinking the role to where it actually matters. Three breast cancer screening studies (Denmark, UK, Sweden) show what this looks like when it goes right, and they're a reminder that human-in-the-loop covers all AI, not just chatbots. The wider authorship question matters too: AI training, AI use, and AI imitation are three different debates that keep getting collapsed into one.

Part six of a six-part series. Previously: consultants and what survives.

In the first five posts I argued that AI mostly removes friction and changes who the work flows to. The natural next question is more practical. Where in my own processes is a human still needed, and where is the human just there because no one has revisited the design?

This post is about asking that question on purpose.

The Danish radiology case

In Denmark, every breast cancer screening mammogram used to be read independently by two radiologists. After roughly three years of using an AI system (Transpara) in the Capital Region and the Region of Southern Denmark, the setup looks very different. The AI now triages the screenings. Around 70% of the lowest-risk mammograms get read by a single experienced radiologist instead of two. The remaining high-risk cases still go through the original double-read process.
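To make the triage idea concrete, here is a minimal sketch in Python. It is not how Transpara or the Danish regions implement it. The 0-to-1 risk score, the cutoff value and every name in it are assumptions I made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff on a hypothetical 0.0-1.0 risk score.
# This is NOT a value taken from the Danish studies.
LOW_RISK_CUTOFF = 0.3


@dataclass(frozen=True)
class Screening:
    exam_id: str
    risk_score: float  # assumed to be produced by the AI model


def readers_required(screening: Screening, cutoff: float = LOW_RISK_CUTOFF) -> int:
    """Return how many radiologists read this exam: 1 if low risk, else 2."""
    if not 0.0 <= screening.risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    # Low-risk exams get a single experienced reader.
    # Everything else keeps the original double read.
    return 1 if screening.risk_score < cutoff else 2
```

The point of the sketch is that the human never leaves the function: every path still returns at least one reader. The AI only decides how much human attention each exam gets.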

The published outcomes are worth pausing on. More cancers detected. Fewer false positives. Radiologist workload cut by roughly a third to a half, depending on which study you read (University of Copenhagen 2024, Nature Medicine May 2025, and the earlier Lauritzen et al. work in Radiology 2022).

A fresh UK result backs the same pattern at much larger scale. Kelly et al., published in Nature in March 2026, evaluated a Google AI system on 175,000 women across NHS sites, the largest such study to date. AI detected more invasive cancers, recalled fewer women on their first scan, and crucially had higher specificity, meaning fewer false positives. The earlier Swedish MASAI randomised trial in The Lancet (January 2026) landed in the same place: same specificity as standard double reading, higher sensitivity, fewer interval cancers with bad characteristics, less reading workload. Three independent settings, three studies, the same shape of result.

Specificity matters more than people think. False positives mean recall letters, biopsies, weeks of fear and real downstream cost. If AI reduces them while also catching more real cancer, the gain is much larger than "AI is as good as a radiologist" makes it sound.
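For readers who don't work with these terms every day, sensitivity and specificity are simple ratios. The sketch below uses the standard definitions. The function names and the counts in the comments are mine and purely illustrative; they are not numbers from any of the three studies.

```python
def _rate(hits: int, misses: int) -> float:
    """Share of hits among hits plus misses."""
    if hits < 0 or misses < 0:
        raise ValueError("counts must not be negative")
    total = hits + misses
    if total == 0:
        raise ValueError("need at least one case")
    return hits / total


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Of the women who have cancer, the share the screening catches."""
    return _rate(true_positives, false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Of the women who do not have cancer, the share correctly cleared."""
    return _rate(true_negatives, false_positives)


# Made-up example: 1,000 healthy women screened.
# 50 false alarms -> specificity(950, 50) is 0.95
# 30 false alarms -> specificity(970, 30) is 0.97
```

In that made-up example, two percentage points of specificity is twenty women per thousand who never get the recall letter.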

This is the version of "human in the loop" I find honest. The humans did not disappear. The loop got redesigned around what only humans should do, and AI took the rest.

Not all AI is a chatbot

It's worth saying plainly, because the current hype makes it easy to forget: the cancer screening systems above are AI, but they are not language models. They are predictive models trained to do one narrow thing extremely well, and "human in the loop" applies to them just as much as it applies to the chatbot drafting my emails. The principle spans the whole field, not just the part that talks back to you.

The distinction matters for a practical reason too: cost. Generative AI, the large language and image models, is expensive. It burns a lot of GPU time and electricity for every answer. Predictive AI of the kind used in medical imaging is usually cheap by comparison, small enough that it can often run on a local laptop. Different tools, different footprints, different jobs.

Yet we increasingly reach for the generative model even when the task is deterministic, something a small script or a classic predictive model would handle faster, cheaper and more reliably. We do it because the chatbot is right there, one prompt away, and the deterministic option takes a little setup. Availability and ease of use win, even when they are the wrong tool for the job. That is worth noticing, because the easy default is not always the responsible one.

Why is the human actually there?

When I look at processes around me, the answer to "why is there a human here" usually falls into one of four buckets:

Judgment. Hard tradeoffs that depend on values, context or risk appetite.
Trust. A customer, regulator or colleague needs a named human to vouch.
Legal or compliance. A signature, an audit trail, an accountable owner.
Habit. No one has revisited this loop in years.

The first three are real. The last one is where most of the easy wins live, and where most teams get stuck because nobody is allowed to ask the question out loud.

For each loop in your process, I think it's worth being honest. Why is the human there? If the answer is "habit," that's a candidate for redesign. If the answer is "judgment, trust or legal," then the human should stay, and the interesting question becomes how much of the surrounding work AI can take off their plate so the human can focus on the part that actually needs them.

That's what happened in Denmark. The human radiologist is still the one signing off. AI just removed the read that was happening twice for no diagnostic gain.

Adapting to the process, not forcing it

A small caveat that I keep getting reminded of in real conversations: you can't always shove AI into an existing process and expect it to work. Sometimes the process has to flex. Sometimes AI has to flex.

I was in a conversation recently with a team that tried going all in on AI-first coding for a few months and decided to dial it back. Their reasoning was good. In their world, the existing tight feedback loops of pair programming and TDD were doing real work, and treating AI as the default driver actually weakened the engineering culture. So they backed off to "AI where it makes sense" rather than "AI first."

I do not read that as a defeat for AI. I read it as a team that did the actual work of figuring out which loops the AI was making better, and which loops it was breaking. That is exactly the right question to land on, even if very few teams ask it up front. Most discover it the long way around, by trying everything, seeing what sticks, and quietly paying for what doesn't. The shotgun approach has a real cost. You just rarely see it on the same invoice as the AI tooling.

You will find places where AI fits naturally, places where you need to change the process to let it fit, and places where the human-led loop is genuinely the better answer for now. All three are valid outcomes.

Brain cycles vs GPU cycles

Here is a pattern I keep running into, and it is the one that frustrates me most.

A recent example from my own week: we needed an API change to support a particular cohort of users. Writing the one-pager to explain the scenario took minutes with an agent and my own input. Then it sat. Engineering would not look at it until product management produced a proper spec. PM would not write the spec until leadership signed off. The artifact AI helped me produce in minutes was now stuck behind weeks of human coordination.

This is not a story about AI failing. The AI did exactly what I needed it to do. The bottleneck is org design, and AI cannot route itself around an approval chain that was built for a different speed of work.

The lesson, if you are trying to get company-wide gain from AI tooling, is to split your sign-off junctions into two piles. One pile is the real regulatory boundaries, the genuine ship blockers, the things where the sign-off exists because something will actually break (legally, financially, safety-wise) if it doesn't. Those stay. The other pile is everything else: gates that exist because they existed last year, reviewers who got added when a different VP cared, ceremony that no one remembers approving. Those are the candidates for redesign.

Saving an hour on the artifact does not matter if the artifact then waits three weeks for a signature.

Getting people to spend brain cycles on the right question, instead of having a GPU spend cycles on the wrong one, is inherently the harder problem.

Turning frustration into opportunity

A few moves help. Name the bottleneck out loud, because if AI makes the artifact ten times faster, the slowness of the next step is suddenly visible in a way it wasn't before. Separate decisions from rituals, and treat them differently. Match the artifact to the audience that has to act on it, since AI can produce a one-pager and a full spec for the same price. And make it easier to say yes than to say no by bringing the spec, the impact, the alternatives and a draft of the answer you're hoping for.

None of this fixes the org chart. But it does turn each blocked artifact into evidence about where the org is structurally slow, and a small chance to make the next round less painful.

If AI ends up doing anything important for organizations, I suspect it will be this. Not replacing the humans in the loop, but making it impossible to ignore where the loops themselves are the problem.

The low-risk on-ramp: meeting summaries

If you want a concrete loop where the human role can shrink safely, it's meetings. AI-generated transcripts and summaries are easy to verify against your own memory, the cost of a small error is low, and the time saved (note-taking, recap emails, late joiners catching up) is real and immediate. It's also a good practice case: if your team is nervous about handing anything to AI, this is the safest place to build the muscle of "AI proposes, human verifies, human owns the output."

A small checklist for each loop

When I look at one of my own processes and ask whether AI can take part of it, I find it useful to run through these questions:

Why is the human in this loop? Judgment, trust, legal, or habit?
What is the cost of a small error? Low, recoverable, or catastrophic?
How easy is it to verify the AI output? Can I check it in seconds, or does it require deep review?
What is the ROI of changing this loop? Saved money, saved time, better quality, better employee experience, or just "looks modern" (or worse, "see, I spent AI credits, so I must be effective")?
Can I take one small step instead of redesigning everything at once? What is the smallest piece of this loop I could hand to AI today and review tomorrow?

If a loop fails the first question (the human is mostly there out of habit), and passes the second and third (low cost of error, easy to verify), it is a strong candidate. If it fails the second (catastrophic if wrong) or third (hard to verify), keep the human firmly in place, and instead look at what surrounds the loop.
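The rule in the paragraph above can be sketched as a few lines of Python. This is a minimal illustration, not a tool I actually run. The type names, the result strings and the "needs a closer look" fallback for cases the rule does not cover are all my own assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    JUDGMENT = "judgment"
    TRUST = "trust"
    LEGAL = "legal"
    HABIT = "habit"


class ErrorCost(Enum):
    LOW = "low"
    RECOVERABLE = "recoverable"
    CATASTROPHIC = "catastrophic"


@dataclass(frozen=True)
class Loop:
    name: str
    reason: Reason          # question 1: why is the human here?
    error_cost: ErrorCost   # question 2: cost of a small error
    easy_to_verify: bool    # question 3: can the output be checked quickly?


KEEP_HUMAN = "keep the human, look at what surrounds the loop"
STRONG_CANDIDATE = "strong candidate for handing to AI"
CLOSER_LOOK = "needs a closer look"  # assumed fallback, not from the checklist


def assess(loop: Loop) -> str:
    # Catastrophic if wrong, or hard to verify: the human stays firmly in place.
    if loop.error_cost is ErrorCost.CATASTROPHIC or not loop.easy_to_verify:
        return KEEP_HUMAN
    # Habit, low cost of error, easy to verify: the easy win.
    if loop.reason is Reason.HABIT and loop.error_cost is ErrorCost.LOW:
        return STRONG_CANDIDATE
    # Everything in between is a judgment call the checklist leaves open.
    return CLOSER_LOOK
```

Questions four and five (ROI and the smallest first step) are deliberately left out of the code. They are the part a human still has to answer.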

I am still the author

One last thing, because it ties the whole series together for me.

I have written all six of these posts using AI as part of my flow. Outlines, drafts, sharpening, even catching weak transitions. I have also adapted the output to my own voice, grounded in hundreds of previous blog posts I've written over the years. But every claim, every sentence and every decision about what stays and what goes is mine. I read everything. I cut what I disagree with. I own the result.

The same thing is true for my day job. My name is still on the spec documents I write. My name is still on the code changes I push. Whatever the agent did along the way, the artifact has a human owner, and that owner is me.

There is a flipside to that, and it is one I should be honest about. The models I rely on were trained on a huge amount of writing other people produced. A book I co-wrote, Working With Microsoft FAST Search Server 2010 for SharePoint, sits on the Anthropic copyright settlement works list. I'm not personally a class member there because the rights belong to the publisher, not me, but it's a useful concrete example because that book was clearly used as training material without explicit consent at the time.

I'm not going to pretend the legal questions here are settled, because they aren't, and a lot of authors feel, with reason, that something was taken from them. My own position is more pragmatic than principled. I lean toward "if it's on the open internet, it's fair game for learning," and I'm fine with the trade where work I helped produce contributed to something millions of people now use to think and write better. I'm a sharing-is-caring person. I always have been. That said, training really should move toward licensed or otherwise lawful material wherever it can. The trajectory matters. Meta's Llama models were largely trained on user data under terms people technically agreed to in the EULA when they signed up for Facebook and Instagram, which is at least a clearer legal footing, even if "consent buried in a 12,000-word document" is a thin definition of consent. Llama and DeepSeek are also open weights, which gives something back to the wider community in a way that closed-source providers like Anthropic and OpenAI do not. The conversation should not just be about what was trained on, but also about who benefits from the result.

Where I draw the line is on use, not on training. I use these tools to think, draft, sharpen and ship work that is mine. I do not use them to impersonate another author's voice on demand, to clone a living artist's style for commercial output, or to mass-produce work designed to look like someone else's. The training argument and the imitation argument are not the same argument, and I think conflating them is part of why this debate keeps getting stuck. Training broadens what the tool can help with. Imitation-on-demand is where I think real harm starts.

That is, in the end, what I think "human in the loop" actually means. Not a checkbox. Not a ritual. Someone is responsible for what comes out the other side, and that someone is named, and that someone is human.

If your process has loops where no human ends up actually owning the output, that is the problem worth fixing. Whether AI shows up before or after, that loop was broken already.

Thanks for sticking with me through six posts. The question I started with was who the work flows to now. The question I'm ending with is who the work flows back to for accountability. I think those are the two questions that matter most for the next few years.

Stay curious :)
