UI Is HITL

May 7, 2026

[Image: A close-up of an airplane cockpit control panel with rows of analog instruments, switches, and dials]

I keep watching people fight their AI tools by typing things they could have clicked.

Last week: a teammate spent two minutes describing to an agent which five of fifty product photos to use for a video. "The second one but not the third. Skip the white-background ones. Keep the lifestyle shot — no, not that one, the one with the cup." The same task on a grid of thumbnails with checkboxes would have taken eight seconds.

That gap is the whole post.

The dominant story right now says: as agents get smarter, the interface gets thinner. Eventually everything is chat. UI is the training wheels we'll discard once the model can really understand us.

I think that gets it backwards.

UI isn't legacy scaffolding to be replaced. UI is the highest-bandwidth way to keep a human in the loop. Forms, grids, sliders, canvases — these aren't the past of human–computer interaction. They're the cockpit through which we steer increasingly capable models. The builder's job isn't to remove UI. It's to design the right UI for each step.

A workflow that embarrasses chat

Take a workflow most creative teams now run on autopilot:

  1. An agent visits a product page and scrapes the text plus every image it can find.
  2. The page returns about fifty images: hero shots, lifestyle photos, logos, banners, partner badges, the founder's headshot, a stock illustration that snuck in from somewhere.
  3. A human picks the five that should anchor a 30-second video.
  4. The agent generates the clip from the picks.

Step 3 is the interesting one. It's a selection task: five out of fifty, fast, with taste.
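
To make that concrete, here's a minimal sketch in TypeScript. Every name in it is hypothetical (scrapeProductPage, generateClip, the SelectImages callback); the point is step 3's contract: the full candidate set goes in, indices come out, and nothing in the type says how the human produces them.

```ts
interface ProductImage {
  url: string;
  alt: string;
}

// Steps 1 and 2: agent work. Stubbed here; in practice this calls your scraper.
async function scrapeProductPage(
  pageUrl: string
): Promise<{ text: string; images: ProductImage[] }> {
  return { text: "", images: [] }; // placeholder
}

// Step 3: the HITL step. The contract is "candidates in, indices out".
// A checkbox grid satisfies it in seconds; a chat transcript satisfies it
// in minutes of round trips. Same contract, wildly different bandwidth.
type SelectImages = (candidates: ProductImage[], count: number) => Promise<number[]>;

// Step 4: agent work again. Stubbed.
async function generateClip(text: string, picks: ProductImage[]): Promise<string> {
  return "clip.mp4"; // placeholder: URL of the rendered video
}

async function productVideo(pageUrl: string, select: SelectImages): Promise<string> {
  const { text, images } = await scrapeProductPage(pageUrl); // ~50 candidates
  const indices = await select(images, 5);                   // the human's 8 seconds
  return generateClip(text, indices.map((i) => images[i]));
}
```

Chat and a checkbox grid both satisfy SelectImages. They differ only in how long the human takes and how many errors slip through.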

In chat, you live in noisy reference. "The second one but not the third. Skip the white-background ones. Keep the lifestyle shot but not the one with the watermark. Actually wait, I changed my mind about the third." Every clarification is another round trip. Every reference — "the lifestyle shot" — has to be disambiguated against everything that could plausibly count. Errors compound. The model can't see what you mean by "lifestyle" the way you do.

On a grid with checkboxes, you click five times. Eight seconds. Zero ambiguity.

This isn't a quirk of one workflow. It's the pattern: for picking, pointing beats typing by an order of magnitude.

HITL is a bandwidth question, not a chat question

Human-in-the-loop is a control architecture. The human keeps a steering wheel — direction, taste, accountability — while the model does the heavy lifting in between.

We've quietly conflated "human in the loop" with "human types in a chat box," and they aren't the same thing. The steering wheel can be a button, a slider, a marquee selection, a checkbox grid, a draggable handle. The right one depends on the cognitive task.

A useful frame:

  • Typing beats pointing when you need to express intent — describing a goal, setting context, naming something new.
  • Pointing beats typing when you need to pick, rank, tune, or locate — choosing among options, ordering them, dialing a value, selecting a region.
  • Drawing beats either when you need to express spatial structure — a box, a path, a layout.

Real workflows interleave all three. A pure-chat agent forces every step through the typing channel, including the steps where typing is the worst possible option.
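
One way to operationalize the frame is to treat the surface as a per-step routing decision. A sketch under that assumption, with illustrative names rather than any real API:

```ts
// Which surface a step deserves, given the kind of cognitive work it asks for.
// The task and surface names are illustrative, not a standard taxonomy.

type CognitiveTask =
  | "express-intent" // describe a goal, set context, name something new
  | "pick"           // choose among options
  | "rank"           // order them
  | "tune"           // dial a value
  | "locate"         // select a region or item
  | "spatial";       // a box, a path, a layout

type Surface = "chat" | "pointer-widget" | "canvas";

function surfaceFor(task: CognitiveTask): Surface {
  switch (task) {
    case "express-intent":
      return "chat"; // typing wins: open-ended and symbolic
    case "pick":
    case "rank":
    case "tune":
    case "locate":
      return "pointer-widget"; // pointing wins: checkbox grid, drag handle, slider
    case "spatial":
      return "canvas"; // drawing wins: marquee, path, layout
  }
}
```

A pure-chat agent is the degenerate version of this function: it returns "chat" for every input.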

This is why "chat with my agent" feels great for the first thirty seconds and increasingly painful around minute three. Chat is a sequential, narrow, modality-poor pipe. It's a great pipe — for some steps. For others, it taxes you on every turn.

What chat alone can't carry

A few specific costs you pay when you flatten everything to text:

  • Round-trip latency. Each clarification is a turn. Three back-and-forths to nail "the third image, no, the fourth" is three turns of waiting on tokens.
  • Reference noise. "The red one with the kid" is ambiguous in any catalog with two red items and a kid. You either disambiguate (slow) or accept errors (worse).
  • No spatial primitives. You can't draw a box around a region in a chat box. You describe it, badly.
  • Out-of-distribution tasks. Visual selection, numeric tuning, ordering, region marking — these are not text. Forcing them through text is like using a phone keyboard to play a piano.

None of this is solved by a smarter model. The model isn't the bottleneck for these steps. The channel is.

[Image: A wall of polaroid photographs arranged in a loose grid, suspended from string]

The HITL spectrum

It helps to see HITL surfaces as a spectrum, not a chat-vs-UI binary. Each layer trades some flexibility for bandwidth:

  • Pure chat. Maximum flexibility, lowest bandwidth per interaction. Good for novel goals and open-ended exploration.
  • Chat plus tool calls. The agent can act, but the human is still typing every input.
  • Chat with rich responses. Cards, image grids, buttons inside the assistant's reply. A first step toward higher bandwidth on the output side.
  • In-context UI. Artifacts, canvases, generated forms — temporary structured surfaces the model produces for a specific task.
  • Domain-specific app with embedded agent. A bespoke cockpit for a recurring workflow, with the model doing the heavy lifting between explicit user actions.

Each step up the ladder gives up some generality and buys back a lot of efficiency for that workflow. If you do a task once, chat is fine. If you do it a thousand times, you want the cockpit.
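
To make the middle rungs concrete: instead of always replying in prose, the agent can reply with a small UI spec the client knows how to render. A minimal sketch, assuming a hypothetical response schema rather than any particular framework's:

```ts
// Three rungs of the ladder as one discriminated union. The client renders
// whichever the agent emits; picks come back as structured data, not prose.

type AgentResponse =
  | { kind: "text"; content: string }                      // pure chat
  | { kind: "buttons"; prompt: string; options: string[] } // rich response
  | { kind: "image-grid"; prompt: string; urls: string[]; maxPicks: number }; // in-context UI

function render(response: AgentResponse): void {
  switch (response.kind) {
    case "text":
      console.log(response.content); // stand-in for a chat bubble
      break;
    case "buttons":
      console.log(response.prompt, response.options); // stand-in for click targets
      break;
    case "image-grid":
      // stand-in for a checkbox grid; the user's picks return as indices,
      // not as another paragraph of noisy references
      console.log(response.prompt, `pick up to ${response.maxPicks} of ${response.urls.length}`);
      break;
  }
}
```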

The mistake builders keep making is treating this as a binary. The right answer for almost every real product isn't one layer — it's a blend, picked per step.

Distribution moves up the stack

There's a strategic implication that follows, and it's the one I keep coming back to.

Frontier model capability is going to keep getting better, cheaper, and more uniformly available. That layer of the stack is on a steep depreciation curve for differentiation. Everyone gets the next tier within weeks of release.

The differentiation moves up the stack — to how the capability gets distributed. To the cockpit, not the engine.

The winning AI products of the next few years won't be the ones with a marginally better model. They'll be the ones that built the right cockpit for a specific workflow and put it into the hands of people who would never have touched a generic agent on their own. A purchasing manager isn't going to learn prompt engineering. A line cook isn't either. A regional sales VP isn't either. But they'll happily click checkboxes, drag sliders, and tap approve — if someone designed the right surface for the work they actually do.

This is what people mean when they say the model is the engine and the skill is the cockpit. Every domain-specific app, every custom skill, every embedded surface is a way to wrap raw capability into a shape a human can wield inside their working day.

Builders waiting for chat to swallow everything are betting against decades of HCI research that says it won't.

[Image: A close-up of a sound mixing console with rows of knobs, buttons, and faders]

AGI is a process; UI is how we cash it in

The other reason I'm bullish on UI as a long-term primitive: AGI doesn't arrive — it accrues.

Capabilities unlock one threshold at a time. Each model release moves the line of "what an agent can reliably do" forward by some amount. The builder's job is to keep finding the right interface for the capability we have today.

Each interface is a frozen layer of the current frontier, made productive. As capability rises, the interface evolves — more gets automated, less requires human input — but the human stays in the loop on the highest-stakes decisions for that workflow. The grid for image selection becomes a single "looks good" button once the model is reliable enough to pre-pick. The draft-and-review loop becomes review-only once the model crosses a quality bar. The cockpit shrinks as the autopilot gets better. It doesn't disappear.
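
In code, the shrinking can be as small as a gate on the model's confidence in its own pre-picks. A sketch, with a threshold that's an illustrative assumption rather than a measured value:

```ts
// As the model's pre-picks get reliable, collapse the full selection grid
// into a single approval button. Note there's no "no surface" branch: the
// human stays in the loop on the highest-stakes decision either way.

type ReviewSurface = "full-grid" | "approve-button";

function surfaceForSelection(prePickConfidence: number): ReviewSurface {
  // 0.95 is illustrative; in practice you'd calibrate against observed error rates.
  return prePickConfidence >= 0.95 ? "approve-button" : "full-grid";
}
```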

HITL isn't a transitional crutch we'll outgrow. It's the long-term architecture for any system where a human is accountable for the outcome. UI is how that architecture stays usable as capability climbs.

What this means for builders

Practical implications, sharp version:

  • Don't reflexively ship "chat for X." For each step, ask: what's the most efficient HITL surface here? It's almost never "all chat."
  • Identify the high-frequency steps. Those are the ones that deserve a custom surface. Selection, approval, ranking, parameter tuning — anywhere a user is going to do the same shape of action a hundred times, build the cockpit.
  • Reserve chat for the steps that earn it. Open-ended intent. Novel goals. Subjective explanation. Don't drag the rest of the workflow through it.
  • Treat the model as the engine, the UI as the cockpit, the workflow as the route. Three different problems. None substitutes for the others.
  • Stop apologizing for forms. A form, in 2026, is not a holdover from a less intelligent era. It's how a human steers an intelligent system at high bandwidth. A good form is a feature, not a debt.

The most leveraged piece of work in AI right now isn't another fine-tune. It's a domain-specific surface that takes a generic model and makes it productive in a specific person's day. That's UI work. HCI work. Product work. The frontier labs aren't doing it, and most of them aren't going to. That's the opening.

Closing thought

UI isn't how we interact with AI. UI is how we keep a human in the loop while AI does the work. As long as humans are accountable for outcomes, the cockpit isn't going anywhere — it just gets better.