Bootstrapping All the Way Up: Build the System That Builds the System

Posted June 8th, 2026 by Liv & filed under AI, Blogroll, Tech.

I wrote about this last week and I feel it deserved a deeper dive. Because the oldest trick in our trade is bootstrapping. You write a thing, and then you use that thing to build a better version of the same thing, and the better version builds a better one still. The C compiler is the canonical example: written in C, compiled by an older C compiler, used to compile the next C compiler. At some point in that lineage a human typed the very first version in assembly, and from then on the language was building itself.

We never stopped doing this. We just stopped noticing, because the loop ran slowly enough to look like ordinary work. AI didn’t introduce a new idea here. It took the idea we’ve been quietly running for fifty years and gave it a clock speed that changes what’s worth building.

A short history of self-reference in our tooling

Look back at the things we were proudest of building, and a pattern jumps out: almost all of them were systems whose output was leverage on the next system.

Self-hosting compilers. The compiler that compiles itself. Once it exists, every improvement to the language is an improvement to the tool that builds the language.
Code generation and scaffolding. From yacc and lex emitting parsers, to IDEs stamping out boilerplate, to ORMs generating data layers, we taught machines to write the code we were bored of writing.
Metaprogramming and macros. Lisp macros, C++ templates, Rust’s macro system: programs that write programs at compile time. Code as data, manipulated by code.
Build systems and CI. Make, Bazel, the entire CD pipeline. A system whose only purpose is to reliably turn source into shipped artefacts, over and over, without a human in the path.

Every one of these is the same move: take a layer of work the human was doing by hand, and hoist it into a system so the machine does it instead. We’ve been climbing this ladder the entire time. The work keeps moving up a level of abstraction, and at each level there’s less typing and more directing.

But there was always a ceiling. Every one of those loops still had a human standing in the inner loop, running at human speed. The compiler bootstrapped itself, but a person wrote the improvement. The scaffolder generated the boilerplate, but a person decided what to scaffold and fixed it when it was wrong. The leverage was enormous and the clock was slow. That combination is exactly what we’d been trained to accept as the shape of the job.

What actually changed

The thing AI closes is the inner loop. Not “writes code faster”, that’s the surface reading. The real change is that the loop of generate ? run ? observe ? correct ? repeat can now run without a human re-entering it on every turn.

That’s a categorical difference, not a quantitative one. Every previous tool in our history transformed input to output deterministically: same input, same output, and a human had to decide what the input should be and judge whether the output was right. The new loop puts judgment inside the loop. The system can look at a failing test, form a hypothesis, change the code, run it again, and decide for itself whether it got closer. That used to be the irreducibly human part. It’s now something you can put on the other side of the loop boundary.

So the unit of work changes. It’s no longer “write this function.” It’s “specify and supervise a loop that writes this function, checks it, and fixes it until it holds.” And once that’s the unit, the obvious next thought is the one this whole post is about: if I can build a loop that builds software, I can build a loop that builds loops.

The shape of the loop, concretely

Strip away the marketing and an agentic build loop is almost embarrassingly simple. I’ll describe the one I actually ship, because the simplicity is the point.

The Smart Scheduler in Calendrz is a manual for loop (capped, so it can’t run away) around a single model call. It sends the conversation plus a set of tool definitions, inspects the stop reason, executes whatever tool calls came back, appends the results, and loops again until the model is done. It fits in one file. I deliberately did not reach for an agent framework, because the framework would have bought me nothing except a debugging surface I didn’t want. When an agent misbehaves — and they all eventually misbehave — I want to read the loop top to bottom and see exactly where the weirdness entered.

The parts that make the loop good rather than just functional are the parts around the model:

The tools. The agent can list calendars, find free slots, create and update events. These are the actions it’s allowed to take in the world. The loop is only as capable as the verbs you give it.
The environment. It can act, observe the result, and react. A coding loop’s environment is the repo, the test runner, the logs. The agent that can run its own tests and read its own failures is the agent that can correct itself.
The feedback signal. Something has to tell the loop whether it’s winning. For the scheduler, it’s grounded calendar data and a confirmation step. For a coding loop, it’s tests passing, types checking, linters clean.

That last one is where everything actually lives, and I’ll come back to it because it’s also where everything breaks.

Building the system that builds the system

Here’s the shift that re-organises how you spend your time once you have one working loop.

You stop tuning outputs and start tuning the loop. When a generated function is wrong, the junior move is to fix the function. The leveraged move is to ask why the loop produced a wrong function and didn’t catch it — and then fix that. Maybe the agent needed a tool it didn’t have. Maybe the test suite didn’t cover the case, so the feedback signal was blind. Maybe the spec you handed it was ambiguous. Every one of those fixes improves not just this output but every future output the loop produces.

That’s the compounding. A fix to a piece of code helps once. A fix to the loop that writes code helps every time the loop runs, forever. So your real codebase stops being the product and becomes the harness: the evals, the tools, the feedback signals, the specs and prompts, the scaffolding that lets a loop run safely and tell good output from bad. The product falls out of a sufficiently good harness almost as a side effect.

This is exactly what the AI labs do, and it’s why their release cadence looks impossible from the outside. They are not hand-crafting each model the way you’d hand-tune a single function. They build the system that produces models — the data pipelines, the training infrastructure, the evaluation harnesses — and then they point their best people at making that system better. Every improvement to the system compounds across every future release. The exponential everyone keeps pointing at isn’t magic. It’s bootstrapping with a fast enough clock that the compounding becomes visible inside a single career, instead of across decades.

Where it breaks, and why that’s the new skill

I’d be writing hype if I stopped there, so here’s the honest part. The loop is only ever as good as its feedback signal. A loop with a weak signal doesn’t produce weak output — it produces confidently wrong output, fast, at scale. That’s worse, because it looks like progress.

If your tests are thin, the loop will write code that passes thin tests and fails in production. If your eval rewards the wrong thing, the loop will get spectacularly good at the wrong thing. The failure mode of this entire paradigm is a beautifully engineered machine sprinting confidently in the wrong direction, and you won’t notice until the bill or the bug report arrives.

So the skill that matters shifts. It moves away from “can you write the code” (the loop can write the code) and towards “can you design the signal that tells the loop whether the code is any good.” Writing a great test suite was always valuable; now it’s load-bearing, because it’s the thing steering an automated builder. Defining a sharp eval, choosing what the loop is allowed to do, knowing what good looks like well enough to encode it: that’s the work that doesn’t delegate. The taste was always the scarce input. Now it’s the only scarce input, and it’s pointed at a machine that will faithfully amplify whatever you give it, including your mistakes.

The mindset shift

For most of our history, the job was to write the system. That was the identity: I’m the person who builds the thing.

That’s a rung too low now. The job is to write the system that writes the system, and then to become the person who makes that system better at making itself better. You’re not climbing down to fix outputs by hand; you’re climbing up to improve the loop, and then up again to improve how the loop improves. It’s bootstrapping all the way up, and there’s no obvious top to the ladder.

The engineers who’ll pull away in the next few years aren’t the ones with the strongest opinions about which model is best. They’re the ones who internalised that their leverage moved up a level, sat down, built a loop, and then started building the loop that builds the loop. The compiler that compiled itself wasn’t a curiosity. It was the whole plan, and we finally have the clock speed to run it for real.

I’m building this way every day on Calendrz, and I’m still figuring out where the ceiling is (if there is one). If you’re running your own loops and finding the edges, I’d love to compare notes; that’s the most interesting conversation in software right now.

Liviu Tudor — Of Man and Internet

I’m a nobody, nobody is perfect, therefore I’m perfect.

A Random Thought

About Me

Technologist. Leader. AdTech. Advisor. Speaker.

Image

Interesting Sites

Me, Myself & I -- My Sites

Sites I Write For