Draft Foundry — multi-tenant content product with a governed multi-agent operating model

Overview

I designed and built Draft Foundry, a multi-tenant content product for B2B hospitality.

The interesting part of this project is not that it generates content. Plenty of tools do. The interesting part is the discipline around it: a locked object-model that survives UI rewording, a verticalization seam that names what already exists instead of abstracting prematurely, hard isolation between brands and properties, explicit boundaries around images and publication, and a governed multi-agent operating system I use to ship work through structured handoffs.

It is a commercial product at pilot stage. A marketing agency is actively building the sales pipeline. The work I am describing here is not a tutorial-grade side project; it is the architecture and the operating model of a real product I am running.

The case study is about how I built it, what I refused to build, and the rules I used to keep a fast-moving codebase from turning into vibes.

The problem

The naive version of this product is easy to describe and dangerous to ship.

It looks like this: a generic AI content tool, vaguely multi-tenant, with a single projects table, a single documents table, a few LLM prompts behind a button, and a UI that calls the same object three different names depending on which screen you are on. It demos well for ten minutes and rots within a quarter.

The real product had different requirements:

It had to be multi-tenant with hard isolation, where brand-level material did not leak into sibling properties.
It had to support a specific vertical (hospitality) without quietly assuming hospitality forever, because real estate was already on the roadmap.
It had to keep UI vocabulary (Draft, Brief, New attempt) and internal vocabulary (Artifact, Run, Revision) distinct without letting either drift.
It had to handle images as a real domain concern with originals, derivatives, suitability, and assignment kept separate, not as a vague "media" blob.
It had to handle publication without overclaiming. A package being prepared for download is not the same thing as having been delivered, and the system had to refuse to call one the other.
It had to be shipped by me, mostly, on a finite budget of attention, while I held a consulting engagement and a job hunt in parallel.

That last constraint is the one that shaped the operating model. I needed a way to move fast on the parts I owned and stay disciplined on the parts that would punish me later. That is where the multi-agent system came in.

Why this was hard / constraints

This is the kind of product where the gap between "demo" and "trustworthy" is enormous, and where the difficulty is almost entirely in the seams.

A few constraints made it harder:

The codebase began life as a B2C article generator on my own domain and had to be pivoted to B2B hospitality without leaving B2C residue behind. Half-removed models are worse than fully removed ones.
The object-model had to survive marketing-driven rewording. UI copy moves more often than schema; if I let the UI rename a thing, the database should not have to follow.
Tenant isolation was not just a row-level check. Brands and properties are first-class entities with their own materials, and single-property brands auto-manage one primary property, which creates a subtle (brandId, propertyId) ownership tuple that is easy to corrupt.
Images are a permanent source of architectural decay if you let them be one blob with metadata bolted on. They needed real boundaries from day one.
Publication is the part of the product where overclaim is most tempting. Buyers want to hear "published." The system has to refuse to say it until it is true.
The product had to be shipped by one engineer with subagent help, which meant the operating model around the code had to be at least as disciplined as the code itself.

The hard part was not "build an AI content tool." It was building a tool whose conceptual layers stayed honest while the surface evolved.

Architecture decisions

The main architectural commitments are the rules I refuse to break, even when a feature would be easier without them.

1. The object-model law

I locked an object-model law and wrote it down as a rule that auto-loads into every coding session.

Artifact is the canonical domain noun.
Draft is the primary UI noun.
Brief is the intake.
Revision is the same-draft evolution.
New attempt is a branched or cloned draft from a chosen revision, not a retry.
Execution and Run are narrow internal machinery, never user-facing.

User-facing semantic renames precede physical renames. The runs and contexts tables keep their physical names even when the UI calls them "Content brief" and "Source material." That discipline kills rename theater, where a team renames the same thing five times across a quarter and ships nothing.
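A minimal sketch of the semantic-before-physical rule in TypeScript, assuming one label map is the only place where UI copy for physical tables lives. The names UI_LABELS, DOMAIN_TO_UI, UiNoun, and uiLabel are hypothetical; only the runs and contexts tables, their UI labels, and the noun pairs come from the law above.

```typescript
// Hypothetical sketch: physical table names never change; UI copy is a lookup.
type PhysicalTable = "runs" | "contexts";

// Nouns the UI may render. "Run" and "Execution" are deliberately absent:
// they are internal machinery and must never reach user-facing copy.
type UiNoun = "Draft" | "Brief" | "Revision" | "New attempt";

// Semantic rename layer: edit these strings, never the table names.
const UI_LABELS: Readonly<Record<PhysicalTable, string>> = {
  runs: "Content brief",
  contexts: "Source material",
};

// Canonical domain noun -> UI noun.
const DOMAIN_TO_UI: Readonly<Record<"Artifact" | "Revision", UiNoun>> = {
  Artifact: "Draft",
  Revision: "Revision",
};

function uiLabel(table: PhysicalTable): string {
  return UI_LABELS[table];
}
```

Under this assumption, a marketing-driven rename is a one-line edit to the map, and the runs table never has to follow it.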

2. Verticalization seam, not vertical abstraction

The product is hospitality-first, but real estate is on the roadmap. The naive move is to abstract early and ship a Vertical interface with three implementations and one real customer. I refused that.

What I shipped instead is lib/verticals/ as a thin manifest registry: four files, no new logic, just naming what already existed in the HOSPITALITY_* constants. The seam is the lens for future vertical-2 work, not a pre-built abstraction. When real estate becomes real, the seam is where the work goes; until then, the seam costs me nothing.

This is the "hospitality wedge first, empire later" rule made structural. The architecture refuses to let a future AI-first CMS vision hijack the present wedge.
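A sketch of what a manifest registry of this kind might look like. The HOSPITALITY_* constants shown are hypothetical stand-ins, since the real ones are not listed here, and VerticalManifest, VERTICALS, and getVertical are illustrative names rather than the actual contents of lib/verticals/.

```typescript
// Hypothetical stand-ins for the existing HOSPITALITY_* constants; the real
// names and values in the codebase are not shown in this case study.
const HOSPITALITY_CONTENT_TYPES = ["room_description", "local_guide"] as const;
const HOSPITALITY_IMAGE_SLOTS = ["hero", "gallery"] as const;

// A manifest only names what already exists. It adds no behavior.
interface VerticalManifest {
  readonly id: string;
  readonly label: string;
  readonly contentTypes: readonly string[];
  readonly imageSlots: readonly string[];
}

const hospitality: VerticalManifest = {
  id: "hospitality",
  label: "Hospitality",
  contentTypes: HOSPITALITY_CONTENT_TYPES,
  imageSlots: HOSPITALITY_IMAGE_SLOTS,
};

// One entry today. A second vertical would be a second manifest,
// not a new interface hierarchy.
const VERTICALS: ReadonlyMap<string, VerticalManifest> = new Map([
  [hospitality.id, hospitality],
]);

function getVertical(id: string): VerticalManifest {
  const manifest = VERTICALS.get(id);
  if (!manifest) throw new Error(`Unknown vertical: ${id}`);
  return manifest;
}
```

The design choice is that the registry is data, not polymorphism: an unknown vertical fails loudly instead of falling back to a half-implemented default.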

3. Brand and Property as first-class domain entities

Brand and Property are not generic projects. They are domain entities with real ownership semantics.

Single-property brands auto-create and manage one primary property, so the brand and property concepts collapse cleanly for small operators.
Brand-level materials persist with propertyId = null; property-level materials are scoped tightly and must not leak across siblings.
The ownership tuple is coherent: a record never mixes a source brandId with a freshly resolved propertyId from a different brand.

This sounds obvious until you watch a generic multi-tenant CMS try to retrofit it.
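The write-time side of the ownership tuple can be sketched as a single resolver that refuses mismatched pairs. Property, MaterialOwnership, and resolveOwnership are hypothetical names; the brandId and propertyId fields and the propertyId = null convention for brand-level material come from the text.

```typescript
// Hypothetical record shapes; field names follow the text.
interface Property {
  readonly id: string;
  readonly brandId: string;
}

interface MaterialOwnership {
  readonly brandId: string;
  readonly propertyId: string | null; // null = brand-level material
}

// Build the ownership tuple for a new material. A property that resolves to a
// different brand is rejected instead of being silently mixed in.
function resolveOwnership(brandId: string, property: Property | null): MaterialOwnership {
  if (property === null) {
    return { brandId, propertyId: null };
  }
  if (property.brandId !== brandId) {
    throw new Error(
      `Incoherent ownership: property ${property.id} belongs to brand ${property.brandId}, not ${brandId}`,
    );
  }
  return { brandId, propertyId: property.id };
}
```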

4. Image workflow boundaries (locked v1)

Images degrade architecture faster than almost anything else, so I locked the v1 boundaries explicitly:

Immutable originals.
Persisted validation.
Preset crop derivatives.
Explicit slot assignment.

Four truths are kept separate: storage, metadata, suitability, and assignment. Each lives in its own concern, and none is allowed to absorb the others.
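One way to keep the four truths apart is to give each its own record type that shares only ids, plus a preset crop that is computed rather than hand-drawn. Everything below is a hypothetical sketch: the type names, the CROP_PRESETS table, and presetCrop are illustrative, and the returned region simply has the left/top/width/height shape that Sharp's extract() accepts.

```typescript
// Four separate records, one per concern. None embeds another; they share only ids.
interface StoredOriginal {      // storage: written once, never overwritten
  readonly id: string;
  readonly objectKey: string;
  readonly width: number;
  readonly height: number;
}
interface ValidationRecord {    // metadata: persisted result of validation
  readonly originalId: string;
  readonly mimeType: string;
  readonly passed: boolean;
}
interface SuitabilityRecord {   // suitability: may this image serve this preset?
  readonly originalId: string;
  readonly presetId: PresetId;
  readonly suitable: boolean;
}
interface SlotAssignment {      // assignment: an explicit, deliberate choice
  readonly slotId: string;
  readonly originalId: string;
  readonly presetId: PresetId;
}

// Hypothetical preset table: known aspect ratios only, no free-form crops.
const CROP_PRESETS = {
  hero_16x9: { w: 16, h: 9 },
  square: { w: 1, h: 1 },
} as const;
type PresetId = keyof typeof CROP_PRESETS;

interface CropRegion {
  readonly left: number;
  readonly top: number;
  readonly width: number;
  readonly height: number;
}

// Largest centered region of the original matching the preset ratio.
// Integer arithmetic avoids floating-point drift in the ratio.
function presetCrop(original: StoredOriginal, preset: PresetId): CropRegion {
  const { w, h } = CROP_PRESETS[preset];
  let width = original.width;
  let height = Math.floor((width * h) / w);
  if (height > original.height) {
    height = original.height;
    width = Math.floor((height * w) / h);
  }
  return {
    left: Math.floor((original.width - width) / 2),
    top: Math.floor((original.height - height) / 2),
    width,
    height,
  };
}
```

For a 4000 by 3000 original, the hero preset yields a 4000 by 2250 region starting 375 pixels from the top, and the square preset yields a centered 3000 by 3000 region. The original itself is never touched; a derivative is always a new object.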

What I refused to build into v1: a full DAM, an approval engine, an image editing suite, default image generation, a cross-run reusable image library, semantic edits. Those are real products, and they are not this product.

5. Publication seams that refuse to overclaim

Publication is where AI-content products usually start lying.

I modeled it as two separate concerns: publication_packages is the package-build truth, and publication_delivery_attempts is a bounded handoff seam. The first delivery surface is download_route only. The first honest status semantics are prepared and failed.

The system refuses to call download preparation "published," "downloaded," or "delivered" unless it can prove the claim. That is the kind of architectural honesty that matters when a buyer asks for an audit trail.
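A sketch of how a narrow status vocabulary can be enforced by the type system, assuming TypeScript unions mirror the status column. PackageStatus, DeliveryAttempt, describeForUser, and the user-facing strings are hypothetical; prepared, failed, and download_route come from the text.

```typescript
// Only the statuses the text names exist. There is no "published" member.
type PackageStatus = "prepared" | "failed";
type DeliverySurface = "download_route";

interface PublicationPackage {  // package-build truth
  readonly id: string;
  readonly status: PackageStatus;
}

interface DeliveryAttempt {     // bounded handoff seam, one row per attempt
  readonly packageId: string;
  readonly surface: DeliverySurface;
  readonly attemptedAt: Date;
}

// User-facing wording derived strictly from what the system can prove.
function describeForUser(pkg: PublicationPackage): string {
  switch (pkg.status) {
    case "prepared":
      return "Ready to download";
    case "failed":
      return "Could not be prepared";
  }
}

// @ts-expect-error "published" is not a status this system can assign.
const overclaim: PackageStatus = "published";
```

The point of the last line is that claiming "published" is a compile error: adding that state means changing the union, which is a reviewable decision rather than a string someone types into a component.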

6. Content-brief UX direction (preview-first)

The default surface shows user progress; advanced surfaces show system mechanics. The information architecture is a triangle: Preview is the default reviewable artifact, Compare is for deliberate change understanding, Advanced is for operator and system mechanics.

Preview is not allowed to become a kitchen-sink control surface. That rule is in the auto-loaded rules; it has stopped at least three good-looking, wrong features from landing on the wrong screen.

What I built

I implemented Draft Foundry across application code, schema, infrastructure, and an operating model around how the work gets done.

Platform and application

Next.js 16 (App Router) with React 19 and TypeScript across the stack.
PostgreSQL 17 with Drizzle ORM and Drizzle Kit for schema and migrations.
Auth.js v5 with magic-link auth via Resend, scoped tenancy via brand and property hierarchy.
S3-compatible storage, with DigitalOcean Spaces in production and a generic S3-compatible store in development.
Sharp for image processing, bounded by the image workflow boundaries above.
Playwright e2e suite, recently repaired from six red to six green and held there by an explicit e2e discipline rule.
Docker Compose for local development, intentionally minimal.
DigitalOcean Droplet + Managed Postgres + Spaces + Caddy in production. Boring infrastructure. The interesting work is the domain model and the operating system around it.

Domain model and seams

The object-model law expressed in schema and enforced through naming discipline across loaders, components, and tests.
The verticalization seam as a manifest registry under lib/verticals/.
Brand-Property hierarchy with isolation rules baked into queries and material scoping.
Image storage, validation, derivative, and assignment as four separate concerns.
Publication packages and delivery attempts as separate tables with honest status semantics.

The pivot, done as engineering

The B2C → B2B pivot was not handwaved. It happened in a numbered batch sequence (P1 through P4) of commit-traceable changes: orphan file deletions, a hand-written schema migration when drizzle-kit broke on a non-trivial drop, component renames matching the object-model law, and a final pass that left the e2e suite green.

That kind of cleanup is the test of whether an engineer believes their own architecture. The credible move is to delete the dead model rather than leave it tagged as legacy and slowly poison the codebase.

Public surface

Bilingual EN/IT public homepage shipped to / (commit 6426b7d).
A pilot-stage commercial product surface behind it.

I am being deliberate about the framing here. The homepage shipped. The pilot is live. It is not at scale. Saying anything else would be overclaim, and overclaim is what makes case studies untrustworthy.

The multi-agent operating system

This is the part that needs the most precision, because the easiest way to misdescribe it is to make it sound like autonomy. It is not.

I built and operate a governed multi-agent operating system, in Claude Code, that I use to ship work on Draft Foundry. The architecture is diagrammed in three Excalidraw artifacts I keep with the project.

Roles

Nemo is the main session, the coordinator. Product thinking, scope pressure, synthesis. Routes work to specialists.
Jinx is the engineering execution specialist. Debug, refactor, migrations, tests, deploy reliability.
Vi is the product and UX systems specialist. Screen critique, flow clarity, information architecture, hierarchy.
Escriba is the editorial specialist. Copywriting, rewrites, editorial judgment. Recruiter-facing prose, hospitality content.

Each specialist has its own working style, its own red lines, and its own rules file. They are not interchangeable. The coordinator does not steal lanes.

Gated batch protocol

The protocol is the same every time, and it is the part that makes the system trustworthy:

Approved batch of work, scoped explicitly.
Implementation by the specialist.
Agreed verification (build, lint, e2e, manual smoke depending on the batch).
Separate commit.
Review packet reported back: checks run, commit hash, files touched, summary, risks.
Stop. Wait for greenlight before starting the next batch.

Destructive operations, data-loss or migration ambiguity, external actions, or product calls the coordinator cannot safely make are escalated to me by name.
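The gate at the end of each batch can be sketched as a pure function over the review packet. The packet fields mirror the list above; ReviewPacket, Check, Escalation, GateDecision, and gate are hypothetical names, and in the real workflow the final greenlight is a human decision, not a function result.

```typescript
interface Check {
  readonly name: string;        // e.g. "build", "lint", "e2e", "smoke"
  readonly passed: boolean;
}

// Hypothetical shape of the packet each batch reports back.
interface ReviewPacket {
  readonly batchId: string;
  readonly checksRun: readonly Check[];
  readonly commitHash: string;
  readonly filesTouched: readonly string[];
  readonly summary: string;
  readonly risks: readonly string[];
}

// Cases that go to the human by name instead of being decided by an agent.
type Escalation =
  | "destructive_operation"
  | "data_loss_or_migration_ambiguity"
  | "external_action"
  | "product_call";

type GateDecision =
  | { kind: "ready_for_review" }
  | { kind: "blocked"; reason: string }
  | { kind: "escalate"; reason: Escalation };

function gate(packet: ReviewPacket, escalations: readonly Escalation[]): GateDecision {
  const [first] = escalations;
  if (first !== undefined) return { kind: "escalate", reason: first };
  // Any red check, including a red e2e run, stops the batch.
  const failed = packet.checksRun.filter((c) => !c.passed);
  if (failed.length > 0) {
    return { kind: "blocked", reason: `failing checks: ${failed.map((c) => c.name).join(", ")}` };
  }
  if (packet.commitHash === "") return { kind: "blocked", reason: "no separate commit" };
  // A clean packet only reaches review; the next batch still waits for a greenlight.
  return { kind: "ready_for_review" };
}
```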

File-based memory and auto-loaded rules

The system has a persistent file-based memory under ~/.claude/projects/-home-tp-coder/memory/ with MEMORY.md as the index. Specialists share read access. Memory persists across sessions. I do not rely on context windows for continuity.

Rules auto-load via glob patterns from .claude/rules/*.md. The object-model law, schema-migration discipline, batch protocol, image workflow, content-brief UX direction, hospitality content style, deploy notes, and verticalization framing are all rules. They load themselves into every relevant session without me having to remember to paste them.

That is the part that makes this an operating system rather than a prompt collection. The discipline is structural, not aspirational.
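To illustrate glob-scoped loading, here is a minimal sketch of selecting rule files by the paths a session touches. It is not Claude Code's actual loader; the rule names, their globs, and the small glob subset handled by globToRegExp are all assumptions.

```typescript
// Convert a small glob subset to a RegExp: "**/" matches zero or more
// directories, "**" matches anything, "*" stays within one path segment,
// "?" matches one non-separator character. Other characters are escaped.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        if (glob.charAt(i + 2) === "/") {
          re += "(?:.*/)?";
          i += 2;
        } else {
          re += ".*";
          i += 1;
        }
      } else {
        re += "[^/]*";
      }
    } else if (ch === "?") {
      re += "[^/]";
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}

interface RuleFile {
  readonly name: string;
  readonly globs: readonly string[];
}

// Hypothetical rule files and globs for illustration only.
const RULES: readonly RuleFile[] = [
  { name: "object-model-law.md", globs: ["**/*.ts", "**/*.tsx"] },
  { name: "schema-migration.md", globs: ["drizzle/**", "lib/db/**"] },
  { name: "image-workflow.md", globs: ["lib/images/**"] },
];

// Rules whose globs match at least one touched path, in declaration order.
function rulesFor(touched: readonly string[]): string[] {
  return RULES.filter((rule) =>
    rule.globs.some((g) => {
      const re = globToRegExp(g);
      return touched.some((path) => re.test(path));
    }),
  ).map((rule) => rule.name);
}
```

The useful property is that selection depends on what the session touches, not on what I remember to paste.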

What it is not

It is not autonomous agents coordinating themselves and shipping production code. It is a coordinator-plus-specialists pattern with explicit human approval at every batch boundary, structured handoffs, file-based memory, and auditable commits.

I built this because it is the way I want to work with LLMs, not because it is a buzzword. The discipline is the product.

Reliability and operational concerns

A multi-tenant content product earns or loses trust on a small number of operational concerns, and I treat them as first-class.

Tenant and property isolation

Brand-level and property-level materials are scoped at query time, with propertyId = null reserved for brand-level scope. Property isolation is verified in tests, not just asserted in comments. The ownership tuple is enforced at write time, not patched at read time.
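The read-time half of the isolation rule reduces to one predicate. This is a hypothetical in-memory sketch (Material and visibleMaterials are illustrative names); the test data covers the sibling-leak case the rule exists to prevent.

```typescript
interface Material {
  readonly id: string;
  readonly brandId: string;
  readonly propertyId: string | null; // null = brand-level scope
}

// Materials visible while working on one property: its own materials plus
// the brand-level ones. Sibling properties and other brands never qualify.
function visibleMaterials(
  all: readonly Material[],
  brandId: string,
  propertyId: string,
): Material[] {
  return all.filter(
    (m) => m.brandId === brandId && (m.propertyId === null || m.propertyId === propertyId),
  );
}
```

In a Drizzle query the same predicate would typically be expressed with and, or, eq, and isNull from drizzle-orm inside .where(...), so the filtering happens in Postgres rather than in application memory.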

Schema migration discipline

Drizzle Kit is the default migration tool, but when it cannot safely express a drop or a non-trivial change, I write the migration by hand. The B2C → B2B cleanup included exactly that case. The rule is that the migration is the contract; I do not let the ORM write contracts it does not understand.

E2e discipline

The Playwright suite is held green, and there is an explicit rule that a red e2e is a stop condition for the batch protocol. The suite went from six red to six green in a single repair pass, and the rule prevents quiet regression.

Image safety

Originals are immutable. Validation is persisted. Derivatives are preset and known. Assignment is explicit. The product does not silently overwrite an original, does not silently regenerate derivatives, and does not silently reassign a slot. Each of those four operations has a name and a code path.

Publication honesty

Status semantics are deliberately narrow. prepared means a package exists and can be downloaded. failed means it could not be prepared. The product does not invent a "published" state until there is a delivery surface that can prove it.

Boring infrastructure

DigitalOcean Droplet, Managed Postgres, Spaces, Caddy. There is no Kubernetes, no service mesh, no message broker, no microservices. That is a design decision, not an oversight. The interesting work is the domain model and the operating system around it; the infrastructure should be the part nobody has to think about on a Tuesday morning.

Why this design was maintainable

The design is maintainable because the rules are written down and the operating system enforces them.

A few things matter here:

The object-model law is auto-loaded into every session. It is not in my head; it is in the codebase as a rule file.
The verticalization seam is shaped to absorb the next vertical without rewriting the first. When real estate becomes real, the work goes through the seam.
The image and publication boundaries are explicit refuse-lists, not just allow-lists. I know what v1 will not become.
The batch protocol forces every change to come with verification, a commit, and a review packet. There is no "I'll clean it up later."
File-based memory means session amnesia does not erase the work. New sessions resume with the relevant rules and the relevant memory.
Specialists do not steal lanes. Engineering, product, and editorial decisions go to the right specialist with the right red lines.

The result is a codebase that I can leave for two weeks and come back to without surprises. That is what maintainability looks like in practice.

Key takeaways

This is the kind of project that makes the most sense when you look at the rules, not just the features.

A few takeaways stand out:

An object-model law is more valuable than any feature it gates.
Verticalization should be named, not built. Naming what already exists is cheap; abstracting for a vertical you have not sold yet is expensive.
Image and publication boundaries are where AI-content products silently rot. Both deserve explicit refuse-lists from day one.
Multi-agent work is only useful when it is governed. The coordinator, the specialists, the batch protocol, the memory, and the rules are the discipline. Without them, multi-agent is just unsupervised LLMs in a trench coat.
Pivots are an engineering test. The credible move is to delete the dead model and keep the suite green, not to leave it tagged as legacy and forget about it.

The most important result is not that Draft Foundry generates content.

It is that Draft Foundry is a system I can defend, line by line, when a senior engineer asks me why I did it that way.

Final note

This was not an AI demo and it is not an autonomous-agent product.

It is a multi-tenant content product for B2B hospitality, built behind a governed multi-agent operating system, with a hard object-model law, a verticalization seam, image and publication boundaries that refuse to overclaim, and a batch protocol that keeps the codebase honest.

That is the kind of engineering work I want to be judged on: not a clever LLM trick, but a real product with a real domain model and a real operating discipline around how it gets shipped.