Anatomy of "Done" · a field note from the git log

Coding got solved.
Web development didn't.

At least, that's the headline going around. Closer to the truth: typing got cheap. Judgment didn't. This portfolio took 209 commits over sixteen days, every one of them written by an AI agent. Roughly half never touched anything you can see. Here is what the git history actually shows about the gap between a build that looks done and one that is, counted commit by commit.

54% of the code is work you can't see

Craft and production engineering: everything past the visible surface.

107 / 209

commits add no visible feature

Testing, infrastructure, security, observability, accessibility, polish.

10 engineering & craft disciplines on top

Each one needs enough prior experience to steer the prompt and catch where it's wrong.

The shape of the work

Less than half the codebase is the part you can look at.

Every commit was sorted into one of three tiers. The first makes the site exist. The second makes it feel polished. The third makes it survive contact with the internet. The part that "got solved" reaches the first tier convincingly, then tends to stop there. The other two are the job that's left.

43,524 lines of change

It renders · 46% · 102 commits
Scaffolding, components, layout, content. The pages exist and look right. This is usually where vibe-coding stops.

It feels right · 25% · 39 commits
Motion engineering, cross-framework parity, accessibility, SEO. The craft that separates "renders" from "polished."

It survives production · 29% · 68 commits
Testing, CI/CD, infrastructure, security, observability. None of it visible; all of it load-bearing.
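As a quick check on the arithmetic, the three tier cards above can be tallied in a few lines. This is only a sketch: the `tiers` array and the `share` helper are illustrative names, the figures are copied from the cards instead of being recomputed from the repository, and reading the percentages as shares of changed lines is an assumption.

```typescript
// Tier figures copied from the cards above; the names are illustrative.
interface Tier {
  label: string;
  lineShare: number; // percent, assumed to be the share of changed lines
  commits: number;
}

const tiers: Tier[] = [
  { label: "It renders", lineShare: 46, commits: 102 },
  { label: "It feels right", lineShare: 25, commits: 39 },
  { label: "It survives production", lineShare: 29, commits: 68 },
];

// Everything after the first tier is work a visitor does not see.
const invisibleTiers = tiers.slice(1);

const totalCommits = tiers.reduce((sum, t) => sum + t.commits, 0);
const invisibleCommits = invisibleTiers.reduce((sum, t) => sum + t.commits, 0);
const invisibleLineShare = invisibleTiers.reduce((sum, t) => sum + t.lineShare, 0);

// Percentage rounded to one decimal place.
function share(part: number, whole: number): number {
  return Math.round((part / whole) * 1000) / 10;
}

console.log(totalCommits);                          // 209
console.log(invisibleCommits);                      // 107
console.log(share(invisibleCommits, totalCommits)); // 51.2
console.log(invisibleLineShare);                    // 54
```

By commit count and by line count alike, the two non-visible tiers come out at a little over half.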

Where the lines went

Thirteen kinds of work. Two of them are what a prompt produces.

Share of all code change by category, grouped by tier. The thing people mean by "building the site" (components, layout, content) is the top band. Notice that motion is its own discipline: in this repo it's scroll-driven reveals, view transitions, a framework-switch disintegrate/reassemble, and layout-shift-safe transforms. Engineered, not prompted.

It was never just the website

The invisible half accrues in parallel, every working day.

Cumulative commits across the sixteen days, stacked by tier. There's no cliff where features stop and engineering starts. From the first week on, the visible site hovers around half the stack and never pulls ahead. The craft and the production work aren't a phase you bolt on at the end. They grow right alongside the pages the whole way.

It renders · It feels right · It survives production

Read it this way: the "coding got solved" timeline would be only the bottom band. Here even that keeps growing, but it never rises above roughly half the stack. The two bands above it are everything a visitor never sees.

A twist on the visible half

And even that 46% is inflated. It's the same site, built three times.

This portfolio renders in React, Vue and Angular to prove the same interface in each. A normal site picks one. So a fifth of the entire codebase is the second and third copies of work the first framework already did, and most of it sits in the visible tier.

React 11.6%

Vue 9.6%

Angular 11%

Shared design system, content & all the engineering · 67.8%

React, the kept one · Vue, a rebuild · Angular, a rebuild

The Vue and Angular implementations alone are 20.6% of all the code in the repo: the same screens, built twice more.

As built · three frameworks: Surface 46% · Engineering & craft 54%

Modelled · one framework: Surface ~40% · Engineering & craft ~60%

Strip it to one framework and the surface drops from 46% to about 40%. The invisible half still grows, just less than a clean subtraction implies, because going single-framework also sheds real craft: cross-framework parity exists only because there are three, and a slice of the visual-regression gate and the per-framework motion goes with it. What it leaves untouched is the genuinely fixed cost, the server, the infrastructure, the security, most of the CI. Fewer frameworks doesn't shrink those; it just makes them a bigger share of what's left.
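The renormalisation behind that "about 40%" can be sketched as below. How the removed 20.6 points split between surface and craft is not stated on this page; the 14.2 used here is back-solved to land near the quoted figure and is an assumption, as are the function and parameter names.

```typescript
// Share of the codebase that is visible surface after removing some code.
// All inputs are percentage points of the original three-framework codebase.
function surfaceShareAfterRemoval(
  surface: number,            // visible tier as built (46)
  removedTotal: number,       // code that disappears (Vue + Angular = 20.6)
  removedFromSurface: number, // how much of the removed code sat in the visible tier
): number {
  const remainingSurface = surface - removedFromSurface;
  const remainingTotal = 100 - removedTotal;
  return (remainingSurface / remainingTotal) * 100;
}

// "Clean subtraction": pretend every removed line was visible surface.
const cleanSubtraction = surfaceShareAfterRemoval(46, 20.6, 20.6);

// Assumed split: about 14.2 of the 20.6 points were surface, the rest craft
// (parity, per-framework motion). Back-solved for illustration, not measured.
const modelled = surfaceShareAfterRemoval(46, 20.6, 14.2);

console.log(cleanSubtraction.toFixed(1)); // roughly 32
console.log(modelled.toFixed(1));         // roughly 40
```

The gap between the two results is the craft that leaves along with the extra frameworks: the more of the removed code that was engineering, not surface, the less the visible share falls.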

What a professional layers on top

Ten disciplines hiding under a page that "just looks like a portfolio."

Each is real and present in this repository. What matters for anyone trying to reproduce this, with a team or with agents, isn't the artifacts themselves but what each one implies: the questions you have to ask, the agent profiles you'd brief, and the test suites and pipelines you'd stand up. That's the knowledge a generated draft can't infer for you.

The long version

For the reader who wants the whole iceberg.

The same disciplines, opened up: concrete artifacts found in the repo and the prior knowledge each one assumes.

Where the craft actually lives

The forks where knowing mattered.

Underneath the code sits a trail of decisions. Each one is a place where the obvious default was the wrong answer, and you only know that if you've been bitten before. A generated draft will pick a path at every fork without pausing, and more than once here the path it first picked had to be overruled. These are a few, straight from the repository's own decision log, where the right call took prior knowledge the model couldn't supply on its own.

One honest caveat

This is only a portfolio, which is the kindest possible case for "coding got solved."

Two things are true at once, and both are worth saying plainly. Some of the invisible work here is showmanship: a brochure site doesn't strictly need shipped logs, real-user monitoring, or the same screen built three times. But a portfolio is also the easiest thing to generate, because it has almost no domain logic. The pages more or less are the product. There is nothing underneath them to get wrong.

This build

Almost all surface

The only real logic is a contact form and some content
No business rules, transactions, or money to get wrong
Nothing to corrupt, reconcile, or migrate
The 46% you can see is genuinely most of what it does

So the visible tier flatters the generated half. Get it looking right and you really are most of the way there.

A real application

Surface is the tip

Domain models, invariants, and business rules
Workflows and state machines whose edge cases must hold
Data integrity, consistency, migrations, integrations
Authorisation, auditability, and mistakes that cost money

So the part you can see shrinks, and the unforgiving part underneath grows. That's exactly where a generated draft degrades fastest.

In other words, this is the generous reading. Hold the surface constant and add real domain complexity, and the visible slice only gets thinner. The disciplines in this list don't go away. They multiply, and a new one joins them at the very top: knowing the domain well enough to be certain the logic is right. Sometimes that's you; a founder in their own field already has it. A model can teach you a documented domain, but it can't make you certain the logic is right: that takes either prior grounding or contact with the real rules, users and money it can't see. The same model that writes a broken auth check, and presents it as finished, will just as cheerfully teach you an invariant that is wrong.

Scaling the argument up

Now make it a product.

This has all been measured on a portfolio site build. A portfolio has almost no data, no users, nothing being billed, hardly any business logic that can break. Compare this build to a typical SaaS, broken down row by row in the table below. Most of the SaaS-side work can be left out and the product will still reach the "works on my machine" stage.

This build skipped → A SaaS owns

No database; Git is the CMS → A database to model, migrate, transact, back up and restore-test

A public site, no accounts → Auth, sessions, roles, and tenant isolation that must never leak

A trivial domain: show content → Business rules, invariants and workflows that have to be correct

One container, replicas a non-goal → Horizontal scale: shared state, distributed limits, cache invalidation

No payments → Billing, metering, proration, dunning and idempotent webhooks

Nothing stored, relayed and gone → PII at rest: encryption, retention, audit logs, deletion on request

One server, provisioned by hand → Zero-downtime migrations, staging, and disaster recovery you have tested

A contact form as the whole attack surface → An API and a tenant boundary that carry the whole company's risk

Every item on the right is invisible engineering wrapped around real domain logic. The visible UI, already under half here, becomes a thinner slice of something much larger. But by now the ratio isn't even the point.

The blast radius of a confident wrong answer.

Here, a bad default is embarrassing: a missing header, a janky transition. There, that same default, wrong but looking right, is one tenant reading another's data, a migration with no way back, or a card charged twice. Same tool. Same speed. The difference is the blast radius, and the only thing standing between the model's first plausible default and a very expensive mistake is an operator who already knows where the mistakes live.
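One common guard against the "card charged twice" failure is to make the payment webhook handler idempotent: record each event ID and ignore repeats. The sketch below assumes the provider sends a unique event ID with every delivery and retries on failure. `ChargeWebhookEvent`, `handleChargeEvent` and the in-memory `Set` and `Map` are hypothetical stand-ins; a real system would keep processed IDs in durable storage and record them in the same transaction as the side effect.

```typescript
// Hypothetical event shape; real providers define their own payloads.
interface ChargeWebhookEvent {
  id: string;          // unique per event, repeated on redelivery
  customerId: string;
  amountCents: number;
}

// In-memory stand-ins for durable storage.
const processedEventIds = new Set<string>();
const chargedTotals = new Map<string, number>();

// Applies the event once; a redelivered event is acknowledged but ignored.
function handleChargeEvent(event: ChargeWebhookEvent): "applied" | "duplicate" {
  if (processedEventIds.has(event.id)) {
    return "duplicate";
  }
  const current = chargedTotals.get(event.customerId) ?? 0;
  chargedTotals.set(event.customerId, current + event.amountCents);
  processedEventIds.add(event.id);
  return "applied";
}

const chargeEvent: ChargeWebhookEvent = {
  id: "evt_1",
  customerId: "cus_1",
  amountCents: 1500,
};

console.log(handleChargeEvent(chargeEvent)); // "applied"
console.log(handleChargeEvent(chargeEvent)); // "duplicate": the retry changes nothing
console.log(chargedTotals.get("cus_1"));     // 1500
```

A handler without the ID check looks identical in a demo, because the provider only redelivers when something has already gone wrong.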

The part that never shows up in the git log

Every commit here was written by an AI agent too.

Here's where "coding got solved" earns its asterisk. By the popular definition, this whole repository is vibe-coded: I drove Claude Code through all 209 commits and typed almost none of it by hand. So the question that matters isn't whether the machine can write code. It clearly can. The question is why this output has a content-security policy, infrastructure as code, shipped telemetry, and the same screen built three ways, when a weekend build from the very same tool has none of that. Same model. Same prompt box.

Same tool. Different operator.

It moved fast because it was pointed at problems I had already solved before, in some shape or form. I knew the disciplines existed. I knew which questions to ask, and in what order. I knew what "done" actually means here, and I could tell when an answer was confidently wrong. The model will just as cheerfully generate a version with no CSP, no rate limit, no observability, no parity, and it looks identical to the good one right up until production finds the difference.

And a good part of that judgment went in before the model wrote a line. Much of this build was spec-driven: the constraints, the taste, and the decisions I had already settled were written down first, as a specification the model then built against. Those requirements were prior experience, written down clearly enough for the model to follow.

What couldn't be settled up front got caught in the act. More than once the model's first answer was the wrong one, and it took knowing better to overrule it: the rate limiter that trusted a forged X-Forwarded-For header, the filter state that reached for a store it hadn't earned, the Angular toolchain default that broke the Bun-only constraint. None of that back-and-forth is in the git history. The log counts the code that survived, not the research, the prompts, or the approaches proposed and thrown away.
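The X-Forwarded-For case shows how a default can look right and be wrong. The header is a comma-separated list that each proxy appends to, so the leftmost entry is whatever the client chose to send. A minimal sketch of the safer reading, assuming a known number of trusted reverse proxies in front of the app: count that many hops in from the right and use that address as the rate-limit key. The function name, the `trustedProxyCount` parameter and the fallback are illustrative; this is not the repository's actual code.

```typescript
// Picks the address to rate-limit on. Assumes exactly `trustedProxyCount`
// reverse proxies sit in front of the app, and that each one appends the
// address it received the connection from to X-Forwarded-For.
function clientIpForRateLimit(
  forwardedFor: string | undefined,
  socketAddress: string,
  trustedProxyCount: number,
): string {
  // With no trusted proxy, the header is entirely client-controlled.
  if (trustedProxyCount <= 0 || !forwardedFor) {
    return socketAddress;
  }
  const hops = forwardedFor
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);

  // The rightmost entries were appended by our own proxies; anything further
  // left could have been forged. Fall back to the socket address if the
  // header is shorter than the proxy chain implies.
  return hops[hops.length - trustedProxyCount] ?? socketAddress;
}

// A client forges "1.2.3.4"; the single trusted proxy appends the real peer.
const forgedHeader = "1.2.3.4, 203.0.113.7";
console.log(clientIpForRateLimit(forgedHeader, "10.0.0.2", 1)); // "203.0.113.7"

// The naive default (leftmost entry) keys the limiter on the forged value.
const naiveKey = forgedHeader.split(",").map((hop) => hop.trim())[0];
console.log(naiveKey); // "1.2.3.4"
```

With the naive key, an attacker rotates the forged value on every request and never hits the limit.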

The steering leaves no commit behind.

Which makes every split on this page conservative. The corrections clustered on exactly the hard calls (security, state, infrastructure) and never on the visible layout, so the time that actually went into judgment tilts the real ratio even further from the part you can see than any chart here can show.

So what this page really measures is whether the operator already knows where the work is. How much of the code got typed by hand barely matters. That knowledge is what tells you which agents to brief and which tests and pipelines to stand up: the Implies column, a few sections up. Knowing what to ask for is the skill now, and experience is what gives you that.

The skill that never shows up in an audit

Cohesion doesn't come out of the box.

There's a layer of frontend work that no checklist captures and no model reliably reaches: making a thing feel like one thing. A spacing rhythm that holds from the first section to the last, a type hierarchy that never wavers, motion that's purposeful rather than sprinkled on, the hundred small decisions that make an interface feel authored instead of assembled. It's the quality you register in the first half-second and can't quite put a name to.

Take this page. It was put together fast to make a point, and if you have the eye, it shows: the spacing doesn't sit on a strict scale, the rhythm drifts from one section to the next, it's competent but not composed. That's the honest ceiling of vibe-coded design: generic, a little off, fine. Then look at the site this page is about. The cohesion you feel there didn't come from a prompt; it came from someone who sweats the half-pixels until the whole thing reads as deliberate.

That gap, between a page that just renders and one that actually feels composed, is the whole job of a frontend engineer. It's also the one thing this page doesn't have.

So no, coding didn't get solved. It got compressed.

What compressed is the visible surface, roughly the 46% you can see, and it now arrives fast. But the other half is where the domain knowledge lives: caching and CSP and rate limits and Web Vitals budgets and cross-framework parity and a deploy that can roll itself back. It's the part you can't fake, can't skip, and can't prompt your way through without knowing it exists.

Anyone with a good model gets the visible half fast now. A generated UI renders, but it lands generic and a little off, and closing the gap to something that actually feels composed is frontend work a prompt doesn't do. The invisible half I can move fast on too, because I know it's there: I point the same agents at the caching and the parity and the rollback and brief the tests before the internet finds out I skipped them. So I'm fast across the whole thing, and I know to handle the parts a prompt botches or skips.

One last thing, about you

Vibe coding moved the developer.

The part that compressed was the typing. What's left, and what now decides whether a build survives, is the judgment: which question to ask, which default is a trap, which corner is safe to cut and which one ends in a 3 a.m. incident. The developer is still needed, now as the part you can't generate.

Which makes the honest version of this a little uncomfortable. If you're a founder, a product owner, a sharp person with an idea, the tools will genuinely let you build something that runs, and for a prototype, an internal tool, or a thing you're learning on, that's a gift worth taking. But shipping it to production, to real users and real data, is a different act. You'll be making engineering decisions whether you know it or not, and the ones you get wrong don't announce themselves. They settle in as debt. Most of the time it shows up slowly: changes that used to take an afternoon start taking a week, every new feature has to work around the last shortcut, and the team spends more of its time untangling the codebase than adding to it. Sometimes it shows up all at once, as a breach, an outage, or a rewrite.

A developer takes that debt on with eyes open. This build even writes its shortcuts down as deliberate non-goals. The difference between a deliberate non-goal and a latent disaster is whether someone knew they were choosing. The code looks the same either way.

None of this is gatekeeping. Use the tools: prototype, explore, understand your own product better than you could before. Just don't mistake a believable surface for a finished system, or speed for safety.

The person best placed to vibe-code their way to production is the one who could have built it without the vibes. Everyone else needs that person in the room.

Coding got solved. Web development didn't.