The AI Sandwich

AI is good at writing code. It is bad at writing the same code twice.

For the past two years I have been watching teams adopt AI coding tools and ship more services per week than they ever did. The same teams have been quietly drifting. Every new service has slightly different auth wiring, slightly different log formatting, slightly different audit trails. Each version is reasonable on its own. No two of them match.

This is fine if you ship one service. It is a real problem if you ship fifty.

I built FixedCode because I think the fix is structural. The architecture I landed on is something I call the AI sandwich, and it is the subject of the rest of this post.

Two kinds of code

Every backend service contains two kinds of code, and they want different things from a generator.

The first is structural code: authentication, logging, persistence, audit trails, event publishing, migrations, tests for all of it. This code should be functionally identical across every service in the org. If your OrderService and your InvoiceService handle auth differently, you have a bug. Possibly a security bug.

The second is business logic: validation rules, pricing functions, integrations with whatever third-party API your industry happens to use. This code is, by definition, different in every service. That is the point.

Existing AI coding tools generate both kinds with the same machinery. You ask for “an order management service with validation and pagination” and the model writes you a service. The pricing function looks reasonable. The auth code is also reasonable. It is not the same as the auth code in the other forty-seven services your org has shipped this quarter.

That is the whole problem in one sentence.

What scaffolding got right

Deterministic scaffolding has been around forever: Yeoman, Cookiecutter, Rails generators, Backstage, the internal tools of every reasonably sized company. Same shape every time: write some templates, parameterise them, run a generator, get a project skeleton.

What scaffolding got right is the property AI lacks: same input, same output. Review the template once and you have reviewed every service generated from it.
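Stripped to its core, a scaffolding generator is a pure function from a template and its parameters to text. The sketch below assumes a hypothetical `{{name}}` placeholder syntax and a hypothetical `renderTemplate` function; real tools use their own template engines (Cookiecutter uses Jinja, for example). Treating a missing parameter as an error rather than guessing is what keeps the output reproducible.

```typescript
// Hypothetical minimal template renderer: output depends only on (template, params).
// The {{name}} placeholder syntax is illustrative, not any particular tool's.
type Params = Readonly<Record<string, string>>;

const PLACEHOLDER = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

function renderTemplate(template: string, params: Params): string {
  return template.replace(PLACEHOLDER, (_match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(params, key) ? params[key] : undefined;
    // Refuse to guess: an unknown placeholder is an error, not an empty string.
    if (value === undefined) {
      throw new Error(`Missing template parameter: ${key}`);
    }
    return value;
  });
}

const template = "package com.example.{{module}}\n\nclass {{name}}Service\n";
const first = renderTemplate(template, { module: "orders", name: "Order" });
const second = renderTemplate(template, { module: "orders", name: "Order" });

// Same input, same output: reviewing the template covers every render.
console.log(first === second); // true
```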

What scaffolding got wrong

The interface is hostile. Writing YAML to describe a service that does not exist yet is tedious, and tedium is exactly what AI is good at relieving.

Maintaining the templates is worse. Anyone who has tried to keep an org-wide cookiecutter alive for eighteen months knows it slowly turns into a second job.

And most scaffolding is fire-and-forget. You generate the project skeleton and the generator walks away. When you improve the template six months later, there is no path to ship the improvement to existing services; they have forked. Backstage tried to address this with software templates tied to its catalogue, but the experience is substantively the same: you generate, you customise, and the template's connection to the output is severed.

You already trust some of this

Most engineers already trust deterministic codegen for one thing: API clients. You write an OpenAPI spec, you run openapi-generator, and you get a typed client in TypeScript or Kotlin or Python. Every team gets the same client for the same API. Nobody hand-rolls HTTP code, because the spec-to-client transformation is too obviously a function to do by hand.

OpenAPI works because the contract between the spec and the generated code is unambiguous, the regeneration loop is fast, and the cost of a hand-edit is so high that nobody is tempted. None of that is unique to HTTP. The same property is available for auth, persistence, event publishing, audit, and the rest of the structural layer. You just need a generator that handles them with the same discipline. That is what FixedCode is. The AI sandwich is what you do once you have one.

The sandwich

The architecture is to put deterministic generation between two layers of AI.

Top layer    (AI):       intent          ->  spec
Middle       (engine):   spec            ->  code   (deterministic)
Bottom layer (AI):       business logic  ->  extension points

Top: an AI agent translates plain English (“we need an order management service with line items and payment integration”) into a YAML spec that conforms to the org's schema.

Middle: the FixedCode engine reads the spec and generates a complete service. Same spec, same files, every time. About three seconds for a typical Spring Kotlin service.

Bottom: the engine leaves clearly marked extension points: validation rules, pricing functions, the parts that have to be unique to this service. AI fills those in too.

The model never touches structural code. The engine never tries to be creative.
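One plausible shape for an extension point, assuming the interface-plus-default-implementation pattern the follow-up post mentions: the engine-owned code depends only on an interface, and the engine writes a do-nothing default once. The sketch is in TypeScript for illustration even though the generated service here is Kotlin; apart from `OrderService` and `OrderValidator`, every name in it is hypothetical.

```typescript
// TypeScript stand-in for generated Kotlin. Only OrderService and OrderValidator
// come from the post; every other name here is hypothetical.
interface LineItem {
  sku: string;
  quantity: number;
}

interface Order {
  id: string;
  lineItems: LineItem[];
}

// Engine-owned: the seam the structural code calls.
interface OrderValidator {
  validate(order: Order): string[]; // returns validation errors; empty means valid
}

// Engine-created stub: accepts everything until someone writes real rules.
class DefaultOrderValidator implements OrderValidator {
  validate(_order: Order): string[] {
    return [];
  }
}

// Engine-owned structural code: depends only on the interface, never on the rules.
class OrderService {
  private readonly validator: OrderValidator;

  constructor(validator: OrderValidator) {
    this.validator = validator;
  }

  create(order: Order): { ok: boolean; errors: string[] } {
    const errors = this.validator.validate(order);
    // Auth, audit and persistence would sit here, identical in every service.
    return { ok: errors.length === 0, errors };
  }
}

// Team-owned business logic, written by a developer or the model.
class RequireLineItemsValidator implements OrderValidator {
  validate(order: Order): string[] {
    const errors: string[] = [];
    if (order.lineItems.length === 0) errors.push("order must have at least one line item");
    for (const item of order.lineItems) {
      if (item.quantity <= 0) errors.push(`quantity for ${item.sku} must be positive`);
    }
    return errors;
  }
}

const emptyOrder: Order = { id: "o-1", lineItems: [] };
console.log(new OrderService(new DefaultOrderValidator()).create(emptyOrder).ok); // true
console.log(new OrderService(new RequireLineItemsValidator()).create(emptyOrder).errors);
```

Swapping the validator changes behaviour without touching the engine-owned class, which is what lets that class be regenerated freely.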

Here is what running the middle slice looks like:

$ fixedcode generate order.yaml -o order-service
✓ Schema valid: ddd/1.0
✓ Generated 47 files in 2.3s
✓ Extension points: OrderValidator.kt, OrderScorer.kt

[Figure: Human (requirements) -> AI (spec + templates) -> FixedCode (deterministic) -> AI (business logic) -> Human (review + ship)]
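The middle slice can be pictured as two steps: check whatever the top layer produced against the org's schema, then run a pure function from spec to files. A minimal sketch, assuming a hypothetical `ServiceSpec` shape, hypothetical file paths and hypothetical `validateSpec`/`generate` functions; this is not FixedCode's actual schema or API, and the `ddd/1.0` identifier is simply borrowed from the CLI output above.

```typescript
// Hypothetical spec shape; FixedCode's real ddd/1.0 schema is not shown in this post.
interface ServiceSpec {
  schema: string;
  service: string;
  aggregates: string[];
}

type Ownership = "regenerated" | "extension";

interface GeneratedFile {
  path: string;
  ownership: Ownership;
  content: string;
}

// The top layer's output is untrusted: reject anything off-schema before the engine runs.
function validateSpec(raw: unknown): ServiceSpec {
  if (typeof raw !== "object" || raw === null) throw new Error("spec must be an object");
  const { schema, service, aggregates } = raw as Partial<Record<keyof ServiceSpec, unknown>>;
  if (schema !== "ddd/1.0") throw new Error(`unsupported schema: ${String(schema)}`);
  if (typeof service !== "string" || !/^[a-z][a-z0-9-]*$/.test(service)) {
    throw new Error("service must be a lower-case, hyphenated name");
  }
  if (!Array.isArray(aggregates)) throw new Error("aggregates must be a list");
  const names: string[] = [];
  for (const a of aggregates) {
    if (typeof a !== "string" || !/^[A-Z][A-Za-z0-9]*$/.test(a)) {
      throw new Error("aggregates must be PascalCase names");
    }
    names.push(a);
  }
  return { schema: "ddd/1.0", service, aggregates: names };
}

// The middle layer: a pure function of the spec. No clock, no randomness, no network.
function generate(spec: ServiceSpec): GeneratedFile[] {
  const files: GeneratedFile[] = [];
  // Sorting makes the output independent of the order aggregates were listed in.
  for (const name of [...spec.aggregates].sort()) {
    files.push({
      path: `src/main/kotlin/${name}Service.kt`,
      ownership: "regenerated",
      content: `// Generated for ${spec.service}; do not edit.\nclass ${name}Service(private val validator: ${name}Validator)\n`,
    });
    files.push({
      path: `src/main/kotlin/${name}Validator.kt`,
      ownership: "extension",
      content: `interface ${name}Validator {\n    fun validate(item: ${name}): List<String> = emptyList()\n}\n`,
    });
  }
  return files;
}

const spec = validateSpec({ schema: "ddd/1.0", service: "order-service", aggregates: ["Order"] });

// Running the engine twice on the same spec yields identical output.
console.log(JSON.stringify(generate(spec)) === JSON.stringify(generate(spec))); // true
console.log(generate(spec).filter((f) => f.ownership === "extension").map((f) => f.path));
```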

Why deterministic belongs in the middle

Three properties that you cannot get out of an LLM, no matter how good your prompt is.

Reviewability. Review the template once and you have reviewed every service generated from it. Review LLM output and you have reviewed only that output. At one service this distinction is invisible. At fifty it dominates the math, because no engineering org reviews fifty services per quarter at the depth that would catch the auth, logging and audit issues that matter.

Regeneration. Because the output is a function of the input, you can run the function again. When the template improves, every service inherits the improvement on the next run. Scaffolding has historically failed to deliver this, because once you hand-edit generated code to add business logic, regeneration overwrites your work.

Auditability. The contents of any generated file are derivable from the spec, the template and the engine version. “Generated from spec at hash X by engine version Y” is a much better answer for a compliance reviewer than “an LLM emitted it on the third attempt”.

The sandwich preserves all three because the middle layer is deterministic. The top and bottom layers can be as creative as you like. They produce inputs and extensions, not the structural code that has to be reviewed, regenerated and audited.
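Auditability mostly comes down to stamping each engine-owned file with the inputs that produced it. A minimal sketch, assuming a hypothetical header format and hypothetical version strings; FNV-1a stands in for the cryptographic hash (SHA-256, say) that a real engine would more likely use, purely to keep the example dependency-free.

```typescript
// FNV-1a, 32-bit. A dependency-free stand-in for a real content hash such as SHA-256.
// It hashes UTF-16 code units rather than bytes, which is fine for a sketch.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Hypothetical provenance header for every engine-owned file.
// Deliberately no timestamp: one would make identical runs produce different files.
function provenanceHeader(specText: string, templateVersion: string, engineVersion: string): string {
  return `// Generated from spec ${fnv1a(specText)} with templates ${templateVersion} by engine ${engineVersion}. Do not edit.`;
}

const specText = "service: order-service\naggregates: [Order]\n";
console.log(provenanceHeader(specText, "ddd-templates@1.4.0", "fixedcode@0.9.2"));
```

Given the spec, the template bundle and the engine at those versions, a reviewer can regenerate the file and diff it against what shipped.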

How regeneration works without losing your code

Three categories of files.

Regenerated files are owned by the engine. Overwritten on every run. You do not edit them. The repo's .fixedcode-manifest.json records every one with its hash. If you do edit one, the next run surfaces the drift and erases it.

“Once” files are created on the first run and then never touched again. These are configuration files where the engine has an opinion about the initial contents but the team will evolve them: application.yml, the root README, the Dockerfile.

Extension points are stub files the engine creates if missing and ignores if present. Business logic goes here. The engine creates OrderValidator.kt with a default implementation that does nothing useful; a developer (or the model) fills in real validation rules. From then on, regeneration leaves it alone.

This is the core trick. You can keep regenerating forever without losing custom work, because the engine has explicit, declared ownership of every file it touches.
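The ownership rules above fit in one small function. A minimal sketch, assuming a hypothetical `PlannedFile` type, an in-memory view of the files on disk, and a manifest mapping each engine-owned path to the hash it had when last generated; it is not FixedCode's implementation. At the file level, once files and extension points behave the same way (create if missing, otherwise leave alone); they differ in what they hold, not in how the engine treats them.

```typescript
// Hypothetical sketch of an ownership-aware write plan; not FixedCode's actual code.
type Ownership = "regenerated" | "once" | "extension";

interface PlannedFile {
  path: string;
  ownership: Ownership;
  content: string;
}

type Action = "write" | "skip" | "overwrite-drift";

// Dependency-free stand-in for a real content hash (FNV-1a, 32-bit).
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// What the manifest records after a run: only engine-owned files.
function nextManifest(files: PlannedFile[]): Map<string, string> {
  const manifest = new Map<string, string>();
  for (const file of files) {
    if (file.ownership === "regenerated") manifest.set(file.path, contentHash(file.content));
  }
  return manifest;
}

function planWrites(
  files: PlannedFile[],
  onDisk: ReadonlyMap<string, string>, // path -> current contents
  manifest: ReadonlyMap<string, string>, // path -> hash recorded at the last run
): Map<string, Action> {
  const plan = new Map<string, Action>();
  for (const file of files) {
    const current = onDisk.get(file.path);
    if (file.ownership !== "regenerated") {
      // Once files and extension points: create if missing, never touch again.
      plan.set(file.path, current === undefined ? "write" : "skip");
      continue;
    }
    // Engine-owned: always rewritten, but any existing file whose hash differs from
    // the manifest's record (including one the manifest never saw) is reported as drift.
    const drifted = current !== undefined && contentHash(current) !== manifest.get(file.path);
    plan.set(file.path, drifted ? "overwrite-drift" : "write");
  }
  return plan;
}

const files: PlannedFile[] = [
  { path: "OrderService.kt", ownership: "regenerated", content: "class OrderService\n" },
  { path: "application.yml", ownership: "once", content: "server:\n  port: 8080\n" },
  { path: "OrderValidator.kt", ownership: "extension", content: "// add validation rules here\n" },
];
const manifest = nextManifest(files); // as recorded by the previous run
const onDisk = new Map<string, string>([
  ["OrderService.kt", "class OrderService // hand edit\n"],
  ["application.yml", "server:\n  port: 9090\n"],
  ["OrderValidator.kt", "// real rules written by the team\n"],
]);

// OrderService.kt -> overwrite-drift, application.yml -> skip, OrderValidator.kt -> skip
console.log(planWrites(files, onDisk, manifest));
```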

The org consequence

Once structural code is generated reliably, building a service decomposes differently.

Old shape: someone with domain knowledge writes requirements, hands them to a developer, who hand-wires the service while consulting the platform team about standards. Multiple roles, multiple handoffs, multiple weeks. AI tooling can compress the developer's part of this loop by an order of magnitude. It does not remove the handoffs, and it makes the standards-consistency problem worse.

Sandwich shape: someone with domain knowledge describes the service in plain English, an AI agent translates that into a spec, the engine generates the structural code, and then the same person (or a different person; it does not matter) fills in the business logic with AI assistance.

The boundary that dissolves is between PM, developer and platform engineer. What replaces it is a boundary between domain knowledge (what should this service do, what are the rules, what is the data model) and platform knowledge (how should services be built in this org, what are the patterns, where are the seams). Both are real. Neither has to be embodied in a particular job title.

What it is not

It is not a no-code tool. The bespoke parts still get written by hand. The model can help; somebody still has to know whether the validation rule is right.

It is not free. Building a good template bundle is real engineering work, comparable to writing one good service by hand. The result then applies to every subsequent service, but the leverage is not immediate.

It is not for tiny teams. One service has no consistency problem; the whole apparatus is overhead. The wedge appears around five or ten services and a small platform team that is already overloaded. It widens from there.

It is not anti-AI. The point is to let the model do the parts where its creativity is an asset, without asking it to also be deterministic in the parts where it cannot be.

Why now

Big tech firms have been building variants of this internally for years. Google's protobuf-driven service generation. Meta's codegen chains. Amazon's Smithy. Define the service in a spec, generate the structural code, let humans focus on the bespoke parts. The model layer is recent. The shape is not.

What is different now is that the top and bottom slices can be automated. Spec authoring used to be the bottleneck that kept this approach inside companies that could afford a dedicated DSL team. AI removes that bottleneck. The deterministic engine in the middle is straightforward; most of my engineering effort has gone into making the regeneration contract bulletproof and the templates pleasant to write. It is the AI layers on either side that turn the architecture into something a normal engineering org can actually adopt.

If you have more than a handful of services and you are starting to feel the consistency drift that AI-accelerated development causes, the sandwich is the shape to consider. It does not have to be FixedCode. You can build it yourself; the docs include enough detail that doing so is reasonable. The principle is what matters: do not ask the model to be deterministic. Put a deterministic engine between two layers of model and you can have both.

The repo is at github.com/gibbon/fixedcode, and the package installs with npm install fixedcode. I would love feedback, especially from teams who have tried something in this shape and run into walls I have not seen yet.

Follow-up post on the implementation: How regenerating code stays out of your way (interfaces, default implementations, local overrides, and library publication).