What Extreme Programming taught us about collaboration — and why it maps perfectly onto working with LLMs

The current conversation is missing something

If you spend any time in developer circles right now, the conversation about AI coding tools tends to collapse into one of two camps. The first camp believes we are months away from AI replacing software developers entirely — that the role of the human is already vestigial, a temporary inconvenience on the road to full automation. The second camp pushes back hard, arguing that AI is little more than a sophisticated autocomplete — useful for boilerplate, dangerous if trusted, and nowhere near capable of producing anything a senior engineer couldn’t do faster with a clear head and a good keyboard shortcut.

Both positions are wrong. Not subtly wrong — fundamentally wrong. And the gap between them is where something genuinely interesting is happening.

The developers who are getting the most out of AI coding tools are not the ones treating the AI as an oracle to be prompted, nor the ones dismissing it as a toy. They are the ones who have stumbled onto — or deliberately adopted — a different mental model entirely. One that has a name, a history, and a body of practice behind it. One that the software industry actually worked out decades ago, in a different context, for different reasons.

It’s called pair programming. And it turns out it maps onto working with an LLM almost perfectly.


A brief primer on XP pair programming

Extreme Programming — XP — emerged in the late 1990s as a reaction to the bloated, process-heavy software development methodologies of the time. It was opinionated, practical, and in many ways ahead of its time. Among its practices, pair programming stood out as the one that generated the most scepticism from people who had never tried it, and the most loyalty from people who had.

The premise is simple: two developers, one keyboard, one screen. But the premise is also misleading, because it implies the value is in the typing — that you are getting two people’s worth of keystrokes for the price of one. That is not what pair programming is about. In fact, that framing completely misses the point.

XP described two roles in a pairing session. The driver holds the keyboard and writes the code. The navigator holds the broader picture — watching for mistakes, thinking about what comes next, asking the question the driver is too focused to ask. Neither role is superior. Neither is passive. The navigator is not merely watching. The navigator is thinking, out loud, in dialogue with the driver. The value is in that continuous conversation — the shared context that builds between two people working through a problem together.

What pair programming actually produces is a tighter feedback loop than solo development. Mistakes get caught earlier. Dead ends get identified faster. The solution that emerges has been stress-tested in real time by two different minds approaching the problem from slightly different angles. The code is better not because two people wrote it, but because two people thought about it simultaneously.

This is why studies on pair programming consistently show that while it does take more developer hours per feature, it produces significantly fewer defects and requires less rework. The investment pays for itself. The conversation is the work.


The AI as pair programmer

Here is where the analogy earns its keep.

When most developers describe their workflow with an AI coding tool, they describe something that sounds like issuing instructions. They have a task. They describe it to the AI. The AI produces output. They evaluate the output, accept it or reject it, and move on. The human is a reviewer. The AI is a very fast junior developer who never gets tired and never takes offence.

That model works. It produces results. But it is leaving an enormous amount of value on the table.

The developers getting disproportionate results from AI tools are doing something subtly but importantly different. They are not issuing instructions — they are having a conversation. They bring the problem, not the solution. They share context before asking for code. They push back when something feels wrong, even if they cannot immediately articulate why. They ask the AI to explain its reasoning, then challenge that reasoning. They treat the session as a dialogue, not a transaction.

In XP terms, they are navigating. The AI is driving.

This is not a metaphor. The navigator role in pair programming is precisely about holding the bigger picture while the driver handles execution. It means knowing where you are trying to go, recognising when the current path is taking you somewhere else, and having the conversation that corrects course before you have written a thousand lines in the wrong direction. That is exactly what a skilled human brings to a session with an AI coding tool.

The AI, for its part, brings things that complement the navigator role almost perfectly. Breadth of knowledge across languages, frameworks, and patterns that no single human could match. Speed of execution that removes the friction between idea and implementation. Tireless willingness to explore alternatives, rewrite sections, and try a different approach without frustration or ego. And crucially — no stake in being right. An AI does not defend its previous output. Ask it to reconsider and it will.

What neither party brings alone is sufficient. The AI without a thoughtful navigator produces technically correct output that solves the wrong problem, or solves the right problem in a way that does not fit the broader system, or makes decisions that are locally sensible and globally incoherent. The human without the AI’s execution speed and breadth spends too much time in the details, loses the thread of the bigger picture, and runs out of energy before the interesting problems get solved.

Together, the feedback loop tightens in exactly the way XP pair programming described. The conversation is still the work. It has just moved to a different medium.


What the human loop actually looks like

It is easy to describe the partnership in the abstract. It is more useful to describe what it actually looks like in practice — the specific moments where the human contribution is irreplaceable, and where the temptation to disengage is strongest.

Defining the problem before touching the keyboard

The single highest-leverage thing a human brings to an AI pairing session is a clear, well-considered problem definition. Not a feature request. Not a task. A genuine understanding of what you are trying to achieve and why, what constraints matter, and what success looks like.

This sounds obvious. It is surprisingly rare. The temptation with fast AI tools is to start immediately — to get something on screen quickly and iterate from there. Sometimes that works. More often, the cost of an underspecified problem shows up three hours later when you have a working implementation of the wrong thing.

The navigator’s first job is to think before the driver starts moving. Spend time on the problem. Write it down. Share it with the AI not as a prompt but as a briefing. The quality of everything that follows is shaped by the quality of this moment.
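As a concrete illustration, a briefing for a hypothetical feature might look something like this — the project, numbers, and constraints here are invented for the example, not a template from any real session:

```
Problem:      Users lose unsaved notes when the app crashes mid-edit.
Goal:         Persist draft state automatically and restore it on next launch.
Constraints:  Must work offline. No schema migration for existing users.
              Saving must not introduce visible latency on low-end hardware.
Out of scope: Multi-device sync, conflict resolution.
Success:      A crash while typing loses at most the last few seconds of work.
```

Compare that with the prompt “add autosave”. The briefing gives the AI the same context a human pair would need before touching the keyboard — and it forces the navigator to do the thinking first.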

Knowing when to push back

AI coding tools are confident. They produce output that looks authoritative, is well-structured, and compiles. This is both their greatest strength and their most significant risk. Technically correct output that solves the wrong problem is harder to catch than broken code, because nothing obviously fails.

The human navigator’s most important skill is the ability to look at plausible output and ask whether it is actually right — not syntactically, but conceptually. Does this approach fit the broader architecture? Does this abstraction hold up under the cases we haven’t discussed yet? Is this solving the problem we defined, or a simpler adjacent problem that is easier to solve?

Pushing back does not require being certain the AI is wrong. It requires being willing to have the conversation. Ask it to explain its reasoning. Ask whether there is an alternative approach. Ask what the tradeoffs are. The AI will engage with these questions genuinely, and more often than not the dialogue surfaces something important that the initial output missed.

Bringing taste the AI doesn’t have

There are decisions in software development that are not technical. They are aesthetic, strategic, or deeply contextual — shaped by knowing your users, your constraints, your history, and your values. These decisions do not have correct answers that can be derived from training data.

What belongs in version one and what gets deferred? Which abstraction is clean enough to be worth the indirection? Does this interaction pattern feel right for the people who will use it? Is this the kind of code a contributor joining the project in six months will be able to understand?

These are navigator questions. The AI can inform them, offer perspectives, and flag tradeoffs — but it cannot answer them. The human is not in the loop for these decisions because some process demands a human sign-off. The human is in the loop because the human is the only one who actually knows.

Recognising when the output is wrong

This is the hardest skill to describe and the most important to develop. It is the ability to read AI-generated output and feel that something is off — before you can articulate why. Before the tests fail. Before the architecture review. Before the bug report.

It is, in essence, experience. The same pattern recognition that a senior engineer develops over years of reading code, debugging systems, and watching abstractions fail in production. AI tools do not compress this. They make it more valuable, because the volume of output has increased while the need for judgment has not decreased.

The human who can generate a working implementation in an afternoon and also recognise which parts of it will hurt them in three months is in a fundamentally different position from the one who can only do the first.

Steering, not accepting

Perhaps the simplest way to describe the human loop is this: your job is not to evaluate what the AI gives you. Your job is to steer toward what you actually need.

Evaluation is passive. Steering is active. It means coming to the session with a direction in mind, holding that direction as the work progresses, and continuously asking whether the current path is still heading the right way. It means being willing to say “this is good, but it’s not quite right” and continuing the conversation until it is. It means treating the first output as the beginning of a dialogue, not the end of one.

The developers who get the most out of AI tools are not the ones who are best at prompting. They are the ones who are best at knowing what they want — and staying in the conversation until they get it.


What this changes for independent open source development

Open source software has always had a talent distribution problem. The ideas are abundant. The developers who want to build are abundant. The time to build is not.

A motivated independent developer working evenings and weekends on a project they care about has historically been constrained not by ambition or skill, but by hours. A genuinely useful, well-architected application — something with multiple backends, proper data persistence, a thoughtful UI, comprehensive documentation, Flatpak packaging, CI/CD, and the hundred other things that separate a hobby project from something people can actually rely on — has traditionally taken years of sustained effort from a solo developer. Many projects never get there. They stall at the interesting-but-incomplete stage, maintained inconsistently, never quite reaching the quality bar that would attract users or contributors.

That constraint is loosening.

The AI pair programming model compresses the distance between idea and implementation in a way that changes the economics of independent open source development fundamentally. Not because the AI does the work — but because the conversation-driven development loop eliminates the specific kinds of friction that cause projects to stall. The boilerplate that nobody wants to write. The documentation that always gets deferred. The architectural decision that requires holding too many things in your head simultaneously. The test scaffolding that feels important but not urgent. These are exactly the tasks where AI assistance is most effective and most reliable.

What remains — and what the human navigator must still bring — is everything that cannot be generated. The decision about which problem is worth solving. The product sense that shapes a feature into something users will actually understand. The judgment that says this abstraction is clean and that one will hurt you in six months. The taste that knows what polished looks like, because you have used enough polished software to have internalised the standard.

This has an important implication. As the execution barrier drops, the bottleneck shifts. The scarce resource in open source software development is no longer time — it is judgment. Developers who bring genuine domain knowledge, strong product instincts, and the ability to recognise quality will produce disproportionate results. Developers who treat AI tools as a shortcut around thinking will produce more output, faster, with the same fundamental limitations they had before.

The other shift is in portfolio size. The “one developer, one project” pattern that has characterised most independent open source work is giving way to something different. A developer with strong judgment and a productive AI partnership can now maintain multiple substantial projects simultaneously — not by spreading themselves thin, but by changing the nature of the work. The parts of software maintenance that consumed the most time — implementing well-understood features, writing documentation, managing boilerplate, scaffolding tests — are no longer the limiting factor. What remains is the interesting work. The work that required a human anyway.

For ecosystems like GNOME, where the quality bar for inclusion is genuinely high and the community of active developers is relatively small, this could be transformative. The gap between “interesting idea” and “production quality app” has historically been where most projects died. That gap is narrowing. The question is whether the developers entering the ecosystem with AI-assisted workflows bring the judgment to match the pace — and whether the ecosystem’s review and mentorship structures can scale to meet the increased volume of serious submissions that will follow.

The opportunity is real. So is the responsibility that comes with it.


The risks worth naming honestly

Any argument for a new way of working that does not acknowledge its failure modes is not a balanced argument — it is advocacy. The AI pair programming model is genuinely powerful. It is also genuinely risky, in specific ways that are worth naming clearly.

The flood of mediocre output

The same forces that allow a thoughtful developer to ship a production-quality application in days also allow a less thoughtful developer to ship something that looks like a production-quality application in days. The difference is not always visible on the surface. The code compiles. The UI renders. The README is comprehensive. The architecture document exists.

What may be missing is the judgment that shaped the decisions underneath. An abstraction that seemed clean to the AI but will not survive contact with real usage. A data model that works for the happy path and fails at the edges. A feature set that was easy to generate but does not reflect what users actually need.

Open source ecosystems that rely on community review as their quality filter — Flathub, GNOME Circle, and others — will face increased volume as the execution barrier drops. The risk is not that reviewers will be fooled by AI-generated mediocrity. Experienced reviewers are good at finding the problems underneath a polished surface. The risk is that the volume of submissions outpaces the community’s capacity to review them thoughtfully, and that the filter becomes less effective simply because it is overwhelmed.

This is not a reason to avoid AI-assisted development. It is a reason for the ecosystem to think ahead about how its quality gates scale.

Understanding what you have built

There is a specific failure mode in AI-assisted development that has no real equivalent in traditional solo development. It is possible to arrive at a working implementation without fully understanding it. The code is correct. The tests pass. The feature works. But the developer who accepted the output without interrogating it cannot explain why certain decisions were made, cannot predict how the system will behave under conditions that were not discussed, and cannot confidently modify it when requirements change.

This is not the AI’s failure. It is the navigator’s failure — a failure to stay in the conversation long enough to genuinely understand what was built and why. The fix is not to distrust AI output. It is to hold yourself to the same standard of understanding you would apply to code you wrote yourself. If you cannot explain a decision, ask until you can. If an abstraction feels opaque, explore it. The AI will not tire of the conversation. Use that.

The expertise illusion

AI tools are fluent. They produce confident, well-structured output across an enormous range of domains. This fluency can create the impression of expertise where expertise does not exist — in the AI’s output and, more dangerously, in the developer’s self-assessment.

A developer who has shipped several AI-assisted projects may have genuine expertise in the problems those projects solved — or they may have accumulated a portfolio of working code without accumulating the underlying understanding that expertise actually represents. The distinction matters when things go wrong. When a system behaves unexpectedly in production. When a security issue emerges in a dependency. When the architecture needs to change in a fundamental way. These are the moments that separate the developer who understands their system from the one who generated it.

The partnership model described in this post is specifically designed to develop genuine understanding alongside working software. The navigator who asks why, pushes back on decisions, and steers the conversation toward clarity is building expertise as they build the system. The developer who accepts output uncritically is not.

The temptation to move on

Fast tools create an appetite for speed. When you can scaffold a feature in an hour that would have taken a day, the temptation is to do ten features instead of one — to keep moving, keep building, keep generating. This is a real risk.

The parts of software development that AI does not accelerate — thinking carefully about the problem, sitting with an architectural decision before committing to it, getting feedback from real users before adding the next feature — are the parts that tend to get skipped when the rest of the loop feels fast. The navigator’s discipline is not just about what to build. It is about when to stop building and think.

Pace is a tool. Used well, it lets you reach a quality threshold faster than was previously possible. Used poorly, it lets you reach the wrong destination faster than was previously possible.


The invitation

There is a version of working with AI coding tools that is transactional. You have a task. You describe it. You evaluate the output. You move on. It works, up to a point. It will continue to work, up to a point.

There is another version that is something closer to a genuine intellectual partnership. You bring the problem, the context, the taste, and the judgment. The AI brings the breadth, the speed, and the tireless willingness to explore. Together you have a conversation — the kind of conversation that pair programming has always held up as the ideal — and the work that emerges from that conversation is better than either party could produce alone.

The shift between these two versions is not about tools. It is not about prompting techniques or context window sizes or which model you are using. It is about how you show up to the session. Whether you come with a direction or just a task. Whether you interrogate the output or accept it. Whether you are willing to stay in the conversation until you genuinely understand what has been built and why.

The XP community figured out decades ago that the most productive unit in software development is not the individual developer working alone — it is two people thinking together. That insight did not age. It just found a new form.

Stop prompting. Start partnering. The results will surprise you.