AI Infrastructure
AI Vibe Coding CI/CD Engine
A fully autonomous CI/CD orchestration system that manages AI coding agents building production software. It coordinates multiple agents working in parallel — handling work decomposition, isolation, monitoring, failure recovery, and deployment.
The Problem
AI agents are powerful but unreliable
Anyone who has used an LLM coding agent for more than a trivial task knows the failure modes. Building production software with AI requires solving all of these simultaneously.
Context drift
As the context window fills, agents lose track of instructions given earlier. Rules stated clearly in the system prompt get deprioritized as the conversation grows.
Spinning
When an approach fails, agents try the same approach again — or oscillate between two failing approaches — consuming their entire context window.
Coordination failure
Two agents editing the same file, starting servers on the same port, running tests that interfere with each other. Without isolation, parallelism is impossible.
Silent failures
Code that passes TypeScript and unit tests can still produce a blank page. Agents can't see the app they're building.
Abandoned state
When agents crash, they leave behind running containers, orphaned worktrees, and half-finished code that blocks future work.
The Insight
Treat AI agents like unreliable distributed workers in a fault-tolerant system — the same way you'd design for unreliable network nodes or crash-prone processes.
Give them atomic units of work small enough to complete within their reliable context window. Monitor them with heartbeat checks. When they fail, clean up and retry with a fresh agent. Never let two agents edit the same file. Gate every phase with runtime verification. The build engine is, in essence, an operating system for AI workers.
How It Works
The Bead Abstraction
The atomic unit of work is a bead — typically 15-30 minutes of coding work. Each bead specifies exactly which files to touch, what acceptance criteria to meet, and what other beads must complete first. Beads are grouped into epics and organized with dependency chains.
An agent's reliability degrades as context fills. Small beads keep agents in their most reliable window. If one fails, the cost is minimal — stall it, clean up, let a fresh agent retry. Fresh context is cheaper than spinning.
PSO-01:   Add packing list data model   ┐
PSO-02:   Build packing list UI         ┤ Phase 1
PSO-03:   Add shopping integration      ┤ (parallel)
PSO-04:   Wire up Firestore persistence ┘
PSO-GT:   Smoke gate — runtime verification
PSO-05:   Add sharing between travelers ┐
PSO-06:   Build suggestions engine      ┤ Phase 2
PSO-07:   Weather-based recommendations ┘
PSO-GT2:  Smoke gate — verify Phase 2
PSO-PUSH: Deploy to dev
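A bead can be pictured as a small record plus a readiness rule. This is a sketch only — the field names and the `readyBeads` helper are illustrative assumptions, not the engine's actual schema:

```typescript
// Illustrative bead shape; field names are assumptions, not the engine's schema.
type BeadStatus = "queued" | "claimed" | "done" | "stalled";

interface Bead {
  id: string;           // e.g. "PSO-02"
  epic: string;         // e.g. "PSO"
  files: string[];      // exactly which files the agent may touch
  acceptance: string[]; // criteria that must pass before completion
  dependsOn: string[];  // bead ids that must complete first
  status: BeadStatus;
  attempts: number;
}

// A bead is ready when every dependency is done and no in-flight bead
// has claimed an overlapping file — this is what makes parallelism safe.
function readyBeads(beads: Bead[]): Bead[] {
  const done = new Set(beads.filter(b => b.status === "done").map(b => b.id));
  const claimedFiles = new Set(
    beads.filter(b => b.status === "claimed").flatMap(b => b.files)
  );
  return beads.filter(
    b =>
      b.status === "queued" &&
      b.dependsOn.every(d => done.has(d)) &&
      b.files.every(f => !claimedFiles.has(f))
  );
}
```

Two queued beads may both look ready here; the file-conflict race between them is resolved at acquire time, inside the transaction.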
Transactional State Management
All state lives in Firestore, accessed exclusively through a REST API. No agent ever writes to Firestore directly. This is the most important architectural decision in the system.
Early iterations let agents access state directly. They corrupted it. They skipped steps. One agent decided incrementing the attempt counter was unnecessary. Another marked its own bead complete without running tests.
The API boundary creates a hard wall. It validates every state transition, enforces invariants, detects races, and logs every action. The critical acquire operation runs as a Firestore transaction that atomically checks engine status, validates dependencies, detects file conflicts, and claims both bead and slot — or rolls back everything.
A tripwire circuit breaker monitors for excessive claim races. Three races in 60 seconds auto-pauses the engine — a signal that too many agents are competing for too few beads.
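The tripwire reduces to a sliding-window counter. The 3-in-60-seconds threshold comes from the text; the class and method names are assumptions:

```typescript
// Tripwire sketch: signal an auto-pause after 3 claim races
// within a 60-second window. Names are illustrative.
class Tripwire {
  private races: number[] = []; // timestamps (ms) of recent claim races

  constructor(
    private readonly limit = 3,
    private readonly windowMs = 60_000
  ) {}

  // Record one race; returns true when the engine should auto-pause.
  recordRace(nowMs: number): boolean {
    this.races = this.races.filter(t => nowMs - t < this.windowMs);
    this.races.push(nowMs);
    return this.races.length >= this.limit;
  }
}
```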
The Heartbeat Control Loop
Every 4 minutes and 45 seconds, each agent sends a health report and receives instructions back. This bidirectional communication keeps agents on track as context drifts.
Agent → Engine (uplink)
Context window remaining, files changed, TypeScript status, commit count, current stage.
Engine → Agent (downlink)
Warnings — informational. Injected rules — behavioral commands the agent must obey. Standing reminders — critical rules repeated on every heartbeat because agents forget.
Each cycle runs a battery of health checks: context exhaustion, scope drift, uncommitted work, spinning detection, file churn, TypeScript regression, stagnation, and build duration limits.
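The uplink and downlink can be sketched as message shapes, with one health check wired in as an example. Field names, thresholds, and the rule text are all assumptions:

```typescript
// Illustrative heartbeat message shapes; field names and thresholds
// are assumptions, not the engine's real protocol.
interface Uplink {
  beadId: string;
  contextRemainingPct: number; // how much context window is left
  filesChanged: number;
  typescriptOk: boolean;
  commits: number;
  stage: string;
}

interface Downlink {
  warnings: string[];          // informational
  injectedRules: string[];     // behavioral commands the agent must obey
  standingReminders: string[]; // repeated on every heartbeat
}

// One check as a sketch: low remaining context forces a wrap-up rule,
// and a TypeScript regression blocks further feature work.
function evaluate(up: Uplink): Downlink {
  const down: Downlink = {
    warnings: [],
    injectedRules: [],
    standingReminders: [
      "Commit before any large refactor",
      "Touch only your bead's files",
    ],
  };
  if (up.contextRemainingPct < 20) {
    down.injectedRules.push("Stop new work: commit, summarize, hand off");
  } else if (up.contextRemainingPct < 40) {
    down.warnings.push("Context filling: prefer small committed steps");
  }
  if (!up.typescriptOk) {
    down.injectedRules.push("Fix the TypeScript regression before continuing");
  }
  return down;
}
```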
Self-Healing Loops
Four interlocking feedback loops handle failures automatically:
Bead Retry
Failed bead → stall → re-enters queue → fresh agent retries. Most failures are context-related — a different agent with clean context often succeeds.
Smoke → Hotfix → Re-Smoke
Runtime verification fails → hotfix beads auto-created at highest priority → smoke gate re-runs. Catches blank pages, broken routes, invisible elements.
P0 Defect Gate
Critical defect logged → all feature work halted → only hotfixes proceed → defect resolved → work resumes. A global circuit breaker.
Stall Detection
No heartbeat for 30 minutes → monitor stalls bead → slot released → worktree cleaned → bead re-enters queue.
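The stall-detection loop can be sketched as a periodic sweep. The 30-minute threshold comes from the text; the record shape and function names are assumptions:

```typescript
// Stall-detection sweep sketch: any claimed bead silent for 30 minutes
// is stalled, its slot released, and the bead re-queued for a fresh agent.
// Names are illustrative.
interface Claim {
  beadId: string;
  slot: number;
  lastHeartbeatMs: number;
}

const STALL_AFTER_MS = 30 * 60 * 1000;

function sweep(
  claims: Claim[],
  nowMs: number
): { stalled: string[]; freedSlots: number[]; live: Claim[] } {
  const stalled: string[] = [];
  const freedSlots: number[] = [];
  const live: Claim[] = [];
  for (const c of claims) {
    if (nowMs - c.lastHeartbeatMs >= STALL_AFTER_MS) {
      stalled.push(c.beadId);  // bead re-enters the queue
      freedSlots.push(c.slot); // slot (and its worktree) gets cleaned up
    } else {
      live.push(c);
    }
  }
  return { stalled, freedSlots, live };
}
```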
Workspace Isolation
Each agent gets an isolated git worktree with its own Docker container running Vite (frontend) and Express (backend) on unique port pairs. The slot pool has 10 positions, each with full process isolation — true parallelism without interference.
Worktrees isolate the filesystem but not processes. Docker gives each slot its own process namespace and network. Each agent can run dev servers, execute tests, and run browser automation in complete isolation.
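Unique port pairs fall out of a simple slot-indexed allocation. The base ports and stride below are assumptions (5173 is just Vite's default dev port), not the engine's real values:

```typescript
// Port-pair sketch: each of the 10 slots gets a unique Vite/Express pair.
// Base ports are assumptions, not the engine's real configuration.
const SLOT_COUNT = 10;
const VITE_BASE = 5173;    // Vite's default dev port, used as an assumed base
const EXPRESS_BASE = 4000; // assumed backend base

function portsForSlot(slot: number): { vite: number; express: number } {
  if (slot < 0 || slot >= SLOT_COUNT) throw new Error(`invalid slot ${slot}`);
  return { vite: VITE_BASE + slot, express: EXPRESS_BASE + slot };
}
```

Because ports are a pure function of the slot index, no registry or lock is needed — claiming the slot claims the ports.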
Slot Pool
Deployment Pipeline
main
Working branch. Bead merges accumulate here.
dev
First deployed environment. Cloud Build triggered.
stage
Pre-production. Tim reviews the full experience.
prod
Live. Requires Tim's explicit approval.
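The pipeline above can be sketched as a promotion map. The structure and flags are illustrative — only the branch order, the Cloud Build trigger on dev, and the prod approval requirement come from the text:

```typescript
// Promotion-map sketch; shape and field names are assumptions.
interface Stage {
  branch: "main" | "dev" | "stage" | "prod";
  deployed: boolean;         // does a merge here trigger Cloud Build?
  requiresApproval: boolean; // does promotion here need explicit sign-off?
}

const pipeline: Stage[] = [
  { branch: "main",  deployed: false, requiresApproval: false }, // bead merges accumulate
  { branch: "dev",   deployed: true,  requiresApproval: false }, // first deployed env
  { branch: "stage", deployed: true,  requiresApproval: false }, // pre-production review
  { branch: "prod",  deployed: true,  requiresApproval: true },  // explicit approval
];

// Where does a branch promote to next, if anywhere?
function nextStage(branch: Stage["branch"]): Stage | undefined {
  const i = pipeline.findIndex(s => s.branch === branch);
  return i >= 0 ? pipeline[i + 1] : undefined;
}
```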
What I Learned
After 40+ epics, several patterns emerged
Fresh context solves most failures
When an agent fails, the instinct is to debug. The better strategy is to stall and let a fresh agent try. Agent failures are rarely deterministic. The retry loop has a remarkably high success rate on second attempts.
Rules must be repeated, not just stated
An instruction in the system prompt works for the first 10 minutes. By minute 30, it's been deprioritized. The heartbeat reminder system exists because "tell them once" doesn't work.
Runtime verification catches what tests miss
TypeScript and unit tests catch about 70% of issues. The remaining 30% — blank pages, broken routing, invisible elements — only appear when you run the app. Smoke gates are not optional.
The API boundary is load-bearing
Letting agents access state directly failed repeatedly. They find shortcuts, skip validations, and "improve" processes. The HTTP API is the only thing preventing agents from helpfully destroying the system.
Atomic work units are the foundation
Every feature — parallelism, fault tolerance, retry logic — depends on work being small and well-scoped. Bad decomposition cascades into every downstream problem.
The system needs to watch itself
AI agents don't know when they're stuck. External observation combined with forced behavioral change is essential. The agent doesn't decide to stop — the engine tells it to.
The Human Role
Tim is not a programmer. He's a technical leader who orchestrates AI agents. The system is designed so that a non-programmer with strong product sense can direct AI agents to build production software.
Every operational task — from running tests to deploying to production — is handled by the agents and the engine. Tim never runs terminal commands, modifies code, commits, or deploys.
Decision maker
Approves deployments, resolves ambiguity, makes business calls
Visual reviewer
Reviews UI in his own browser — he is the smoke gate
Quality gatekeeper
Feedback flows back into specs for future beads
System administrator
Configures cloud services, manages credentials