From workstation buildout to AI in the loop

The second post in this series ended on the last days of 2024. The first publishing contract was over. The platform was owned outright by Wood Fired Games. I had ten months of daily AI-assisted work behind me, an expanded toolchain I had stood up in the last quarter, and a hypothesis about what all of it might enable: that AI, used correctly, might be exactly the thing an independent studio needed to bring the larger vision into a form one person could actually ship.

The hypothesis was right. The studio I am operating today is not the studio I was operating eighteen months ago. The throughput of the studio I am operating today is not the throughput of one person.

This post tells the story of the practice that produced that change, in three acts. AI as advisor: the long quiet period from early 2024 through the summer of 2025, when AI was a daily expert reference inside my IDE and the code was still entirely mine. AI as collaborator: the late-summer 2025 transition, when Claude Code arrived with the ability to plan and carry out multi-step changes across my files, and I had to learn how to delegate without losing control. AI as workforce: the 2026 era of agent orchestration, validation discipline, and observability that I am still inside today. The measurable output changes by an order of magnitude across those three acts. The practice that produced it changed even more.

Act I — AI as advisor

The first paid AI subscription I bought was JetBrains AI Pro in early March 2024. It lived inside Rider, the IDE I had been working in for years. It did not have access to my filesystem. It could not edit my code. It could read whatever I pasted in and answer whatever I asked. That was the entire surface. And it was enough to change my daily practice in ways I did not appreciate at the time.

I had been a senior engineer for a long time before this. What changed was not what I could do. What changed was how cheap it became to ask questions about things I was already capable of figuring out but could not justify the time to work through. Before, I would have spent half a day reading Docker documentation to set up a service the way I wanted it. With the assistant inside Rider, I had an answer specific to my project in minutes, plus a reference link, plus an explanation of why that answer was right. The same compression happened with SQL schema design, with CI configurations, with a hundred other areas where I was competent but slow. The expertise was already in my head. The activation energy to apply it had dropped sharply.

This was also where I learned what to ask AI and what not to. The dominant patterns in my prompts across this whole period were “explain this concept,” “debug this error,” and “review this code.” Generating new code from scratch was a small minority of what I asked for. I was the one writing the code; the AI was the one I went to when I needed an expert sitting next to me. And I corrected the model whenever it hallucinated, which was constantly. No, that isn’t a property of that class. No, that method doesn’t exist. The discipline of verifying the advisor, of treating the AI as a knowledgeable colleague who is sometimes confidently wrong, became reflex.

The contrast with ChatGPT was instructive. I added a ChatGPT Plus subscription in late summer of 2024, and within a few weeks my honest assessment of it was that it still felt like a toy. I was paying for it. I was not really leaning on it. The reason was scope. JetBrains AI Pro was deliberately scoped to programming: it would refuse to answer questions that fell outside that lane, which was annoying at the time but turned out to be evidence of a product opinion I now respect. A scoped, file-attachable, code-only assistant was producing real daily value. A general-purpose chat assistant pointed at the same work was producing curiosities.

The deeper lesson, the one that took me months to recognize as a lesson, was that AI’s job in this era was to compress my learning curve, not to take work off my plate. The work itself was still bottlenecked by my typing speed and my attention. What had changed was that I was reaching competence in new infrastructure tooling maybe an order of magnitude faster than before, and I was using that compression to pick up things I had been circling for years (Docker, modern CI, Blazor, OAuth flows, persistent backend patterns) without having to break the publisher project I was actively shipping.

The asymmetry that defined my expectations heading into the second half of 2025 was about where AI’s reach actually extended. The closer the work was to server, DevOps, and SaaS — Docker configurations, CI pipelines, REST APIs, OAuth integrations, SQL schemas, Blazor — the better the AI was at it. The closer the work got to game development proper — Unity editor tools, simulation logic, ECS scaffolding, the gameplay code that actually defines the player’s experience — the more uneven the AI’s performance became. I did not yet have a framing for why. I have one now, and it is entirely a training-data story. The open-source corpus the models learned from is heavily skewed toward web and backend work. There is an enormous body of public source for the kinds of services I was now finally able to stand up confidently. There is comparatively very little public source for game development proper, because the tooling is often graphical, the engines are mostly proprietary, and the patterns the industry actually uses rarely show up in public repositories. The AI was not failing at game development because it could not reason about games. It was failing because almost no one had ever taught it how the work is actually done.

That shaped what I reached for AI to do across the rest of the advisor era. The DevOps and platform work I had been circling for a decade (Docker, CI, OAuth flows, persistence patterns) I could now pick up at something approaching senior-engineer-with-a-mentor speed. The game-side work stayed mine. I would lean on the assistant for explanations and reviews of the game code, but I would not yet trust it to write that code for me. AI could clearly do some kinds of work autonomously. The work I cared about most was not yet on that list.

In November of 2024 the practice expanded sharply in scope. I stood up a full local-AI workstation — Ollama with a 70B model, CUDA, a code-specific 32B model, two agent frameworks, Microsoft’s TinyTroupe LLM simulation library — all in one week. I installed Cursor, the filesystem-aware code editor, and pointed it at my repository. I added a Claude.ai Pro subscription. And, on the same machine in the same week, something quieter shifted inside Rider: my prompts to JetBrains AI Pro started carrying explicit file attachments. “How would I convert this file to use ASP.NET?” “Add all of the source code in this namespace to your corpus.” The discipline of deciding what context the AI sees, before the AI could see anything on its own, started to become deliberate. That practice would later acquire a name — context engineering — but in November of 2024 I was just doing it.

By the time the publisher project ended at the close of 2024, I had nearly a year of daily practice with AI as an advisor that the surface git history does not begin to capture. The measurable output of the studio in that period looked roughly like one fast senior developer’s output. Inside, the practice was already changing in ways the next year would reveal.

Act II — AI as collaborator

The August 5, 2025, commit message in my project-viking repository reads, in full:

“just making a checkpoint before I go all vibe coder.”

I had picked up Claude Code’s research preview earlier that summer and held off on serious use while I wrapped up a publisher pivot. By early August I was ready. The vibe-coder line was a self-aware joke about what I was about to attempt: pointing an agentic tool at my codebase and letting it write something.

Roughly twenty-eight hours later, the first Co-Authored-By: Claude trailer in any of my projects landed. Two trailers, actually, thirty-nine seconds apart, in project-viking and wood-fired-platform. The work in question was a complete CLI command system for MECSEditor: interfaces, a command registry, the full component, message, entity, and query command set, JSON converters, and a round-trip test project, all of it produced in one focused session by an AI that could read my codebase, plan against it, and edit my files directly. That single session shipped a deliverable I would never have been able to justify even attempting a year earlier.

The thing that had changed in August 2025 was not filesystem access. Cursor had given an AI access to my repository since November of the previous year. The thing that had changed was autonomous multi-step execution. Claude Code could be given a goal, read the codebase to plan against it, edit a dozen files coherently in one pass, run the tests, and present me a finished end state for review. JetBrains AI had helped me write code faster. Claude Code was the first tool I had used that could produce code I had not myself written, at a scope where the code was structural rather than illustrative.

The first 48 hours with that capability did not go smoothly. The new MECSEditor CLI had a systematic bug in how it generated components: most of the components I asked Claude to create came out with their data fields stripped, leaving only the required identity bytes. We worked through it across one long day. I would describe the symptom; Claude would propose a fix; I would try it; we would find a different failure mode; I would write down what I had learned and feed it back into the next prompt. By the end of the day the system was producing correct components, and I had a small library of context documents (what the engine’s component contract required, how the database serialization worked, what the conventions were) that the next session could use as bounded context.

That recovery was the moment the practice I had been doing in primitive form for nine months became deliberate, daily, and intensive. The pattern was simple in description. Treat the AI like a senior engineer who has just joined the team. Give it documentation of the conventions, the constraints, the patterns, the antipatterns. Iterate on the docs when the AI gets something wrong, the way you would iterate on onboarding docs after a confusing first week of a new hire. Every mistake the AI made in those early sessions became a paragraph in a context document that prevented the same mistake from recurring. The docs were the leverage point. The AI was the thing that exposed the gaps in them. And — this is the part the industry took another six months to start saying out loud — the docs were also genuinely useful documentation for human collaborators, because the discipline of writing them honestly forced the conventions into a form anyone could read.

The first major feature I let Claude author end-to-end, with my review at the diff level, was the persistent-entity system for the platform: a hybrid persistence layer that gave the simulation a durable identity model that survived server restarts. I designed the architectural shape. I wrote the context that described the contract. I asked Claude to implement against it. I reviewed every diff. That experiment crystallized the pattern I would lean on through the rest of 2025 and into early 2026. Write the context. Let the AI propose the implementation. Review every diff.

What followed in the last two months of 2025 was the dense first output of a fully matured collaboration. The platform got its persistence layer, an OAuth-backed identity surface across Steam and Google and Apple, a Docker deployment that containerized every service with health checks, and a real CI pipeline catching regressions on every push. Four bodies of work I had been circling for a decade, all shipped in five weeks, all under the context-engineering-and-manual-review discipline that the August disaster had taught me to take seriously.

The measurable shift across this transition is the cleanest evidence I have of what changed. The pre-agent baseline, every month I worked on the wood-fired stack before the first co-author trailer landed in August 2025, averaged about twenty-one thousand source lines of code changed per month. That is what twenty-five years of being a fast senior engineer looks like as data, and it is not a small number. The months following the Claude Code adoption began running consistently north of one hundred thousand source lines per month. Roughly five times faster, sustained, with my own review still in the loop on everything.

What had changed was not me. I was still the engineer. I had not gotten faster at typing or thinking. What had changed was the bottleneck: I was no longer rate-limited by what I could write. I was rate-limited by what I could review. That distinction is what shaped everything that came after.

Act III — AI as workforce

If your bottleneck is review, the design question becomes: how do you make the work easier to review, and how do you scale review when the rate of output keeps climbing? The 2026 answer I converged on has three layers. Redesign the engine so the language itself catches the mistakes that would otherwise show up in review. Delegate work to specialized agents with declared scopes and discipline layers, so what hits human review is already pre-validated. And instrument the entire stack — every AI call across every vendor across every tool — so what gets merged is backed by evidence rather than vibes.

The first layer was born out of frustration with how much friction AI encountered trying to follow my existing workflow. My pre-AI authoring pipeline was built for a human. Start with a gameplay concept, mentally decompose it into the messages, components, systems, and interpreters it would require, open MECSEditor’s GUI to declare each of those types into the asset database, let the tool generate the C# scaffolding the runtime would compile, then jump to the IDE to hand-write the function bodies. It worked because the human driving it carried the gameplay concept across each tool boundary and applied judgement at every stage. When I pointed Claude at the same pipeline, the work fell apart. Claude could not coherently shuttle between the GUI tool and the IDE; the handoffs that were invisible to me — concept to database row, database row to generated code, generated code to function body — were friction points the AI lost information across.

So, as any tools engineer would, I rethought the tool from the perspective of its new primary user. The answer was a fundamental inversion of the authoring direction. In the original pipeline, the asset database was the source of truth and code was generated from it. In the new pipeline, the code became the source of truth and the asset database was generated from it. Components, systems, queries, and interpreters became attribute-annotated C# types in the codebase. A Roslyn source generator read those annotations and emitted the runtime scaffolding at compile time. The asset database now sat downstream of the code rather than upstream of it, which meant Claude — working entirely in C# files — could now drive the whole pipeline from a single surface. Over a hundred hand-coded boilerplate files retired in one pass.

The compile-time diagnostics that shipped alongside the generator — sixteen of them, covering ECS constraint violations the framework can now catch before the assembly is built — were a benefit that came along for the ride. That is the part I would later describe as making the engine AI-legible. The benefit is not that AI sees prettier code. The benefit is that AI cannot quietly produce a malformed system, because the compiler refuses. Review converges on the things that require human judgement instead of catching mechanical mistakes the language could catch on its own.
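The inversion described in the last two paragraphs is easier to see in miniature. The real system uses attribute-annotated C# types and a Roslyn source generator; what follows is only a Python analog under stated assumptions. Hypothetical `component` and `system` decorators play the role of the attributes, a registration step refuses malformed declarations (standing in for the compile-time diagnostics), and the asset rows are derived from the code rather than the other way round. None of these names come from MECS.

```python
from dataclasses import dataclass, fields

# Hypothetical registries: what a source generator would collect from annotated types.
COMPONENTS: dict[str, type] = {}
SYSTEMS: dict[str, dict] = {}


class DeclarationError(Exception):
    """Stands in for a compile-time diagnostic: the declaration is refused."""


def component(cls):
    """Mark a class as an ECS component (a stand-in for a C# attribute)."""
    if cls.__name__ in COMPONENTS:
        raise DeclarationError(f"{cls.__name__}: component declared twice")
    cls = dataclass(cls)
    COMPONENTS[cls.__name__] = cls
    return cls


def system(reads=(), writes=()):
    """Mark a function as a system with declared read and write sets."""
    def wrap(fn):
        # Refuse any system that touches a component nobody declared.
        unknown = sorted((set(reads) | set(writes)) - set(COMPONENTS))
        if unknown:
            raise DeclarationError(f"{fn.__name__}: references undeclared components {unknown}")
        SYSTEMS[fn.__name__] = {"reads": tuple(reads), "writes": tuple(writes)}
        return fn
    return wrap


def asset_rows():
    """Derive the 'asset database' from the code, which is the source of truth."""
    rows = [("component", name, ",".join(f.name for f in fields(cls)))
            for name, cls in COMPONENTS.items()]
    rows += [("system", name,
              f"reads={','.join(s['reads'])};writes={','.join(s['writes'])}")
             for name, s in SYSTEMS.items()]
    return rows
```

In this sketch, declaring a system with `writes=("Health",)` before any `Health` component exists raises `DeclarationError` instead of producing a row. That mirrors the property the paragraph above cares about: a malformed declaration never reaches review.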

The second layer was orchestration on top of the third-party GSD framework (github.com/gsd-build/get-shit-done), which was written by another author, not me. I credit the substrate every time I describe what I have built on top of it. What I added to GSD was a discipline layer: specialized agents with one scope and one job each, a hook layer that enforces validation patterns at agent-boundary handoff, and an orchestration agent that dispatches work to them and presents me a coherent end state rather than a stream of half-finished intermediate outputs. The pattern is the same one I had built into MECS itself. Declare what each component owns. Declare what it depends on. Run them in topological order. Let the framework verify nobody stepped outside their lane.
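The closing lines of that paragraph describe a small, checkable algorithm. What follows is a minimal Python sketch of it, not GSD’s API and not my orchestration layer. Each hypothetical agent declares the path prefixes it owns and the agents it depends on. The standard library’s `graphlib.TopologicalSorter` produces a run order, and a post-run check flags any file an agent touched outside its declared lane. The agent names and paths are illustrative.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class AgentSpec:
    name: str
    owns: tuple[str, ...]              # path prefixes this agent may modify
    depends_on: tuple[str, ...] = ()   # agents whose output it consumes


def run_order(specs: list[AgentSpec]) -> list[str]:
    """Return agent names so every agent runs after its dependencies.

    Raises ValueError for an undeclared dependency; graphlib raises
    CycleError (a ValueError subclass) if the dependencies form a cycle.
    """
    names = {s.name for s in specs}
    for s in specs:
        missing = [d for d in s.depends_on if d not in names]
        if missing:
            raise ValueError(f"{s.name} depends on undeclared agents {missing}")
    graph = {s.name: set(s.depends_on) for s in specs}
    return list(TopologicalSorter(graph).static_order())


def lane_violations(spec: AgentSpec, touched: list[str]) -> list[str]:
    """Return the files an agent modified outside the prefixes it declared."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return [p for p in touched if not p.startswith(spec.owns)]
```

The design choice worth noting is that both checks are structural. An orchestrator built this way does not need to understand what an agent wrote to know that it wrote it in the wrong place, or that it was scheduled before its inputs existed.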

Then March came, and the wheels nearly came off. The orchestration was producing far more code than I could review. Hallucinations slipped through. Subtle regressions accumulated. Architectural drift crept in at the edges. The structural symptom was that I kept having to revisit work the orchestrator should not have shipped in the first place. I had raised the ceiling on what I could attempt. I had not yet built the matching floor on what I let through.

That gap forced me to look honestly at what the senior practitioner’s job had become. For twenty-five years my job had been to read code. I read what other engineers wrote. I read what I had written. I read what I was about to land. Code review was the surface where the human caught what the AI — or the junior engineer, or the late-night version of myself — would otherwise have shipped wrong. With the orchestration running at the volume it was running at, that surface had stopped scaling. The code was no longer the artifact a human could realistically validate.

So I moved my attention up a layer. The artifact I now validate is the validation infrastructure itself — the test harness that proves the code works, the CI configuration that runs the harness on every push, the static-analysis rules and the cross-vendor audits that grade each other, the telemetry stack that captures every AI call and lets me query what every model did against my repository. Comprehensive automated testing went in everywhere. Every CI workflow I have today was written by an agent following a specification I wrote in plain English; I do not actually know how to configure a modern CI pipeline by hand anymore, and it would be dishonest to imply I do. The cross-vendor AI observability stack — a telemetry daemon, an outbound proxy, dashboards — captures every AI call I make across every vendor and every tool, so the question of which AI runs to trust becomes a question of evidence rather than instinct.

There is a recursion in this practice that took me a while to notice. The validation infrastructure that grades the AI-written code is itself AI-written code I never typed by hand. If trust were the right frame, that would be two leaps of faith stacked on each other. The thing that makes it tractable is that the layers grade each other. Tests describe behavior. Behavior is independent fact. The failure of a test is true regardless of who wrote either side. The CI runs the tests. A cross-vendor audit grades the CI. I read the verdicts, not the diffs. Trust is a feeling. Validation engineering is a system. The first scales with attention; the second scales with infrastructure. Over the course of this spring I migrated almost entirely from the first to the second.

The measurable output across this period is the part of the story that still surprises me when I look at it. Across the seventy-five months of source-control history I have, I have shipped roughly 1.96 million source lines of code. Of that, 44.8% carries an AI co-author trailer, and every line of that 44.8% was written in the last ten months. The last ten months shipped more source code than the entire five and a half years before them combined. The peak month, March of 2026, was 309,000 lines changed, and that month was a net deletion in raw bytes because the work was a source-generator migration replacing hand-coded boilerplate with generated equivalents. The volume of work the studio is doing is no longer one person’s volume. It has not been one person’s volume for ten months.
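The 44.8% figure is the kind of number a short script can derive from history. What follows is a minimal sketch, assuming commits have already been extracted (for example from `git log --numstat`) into records holding the message and added and deleted line counts. It counts lines changed in commits whose message carries a `Co-Authored-By: Claude` trailer. The `Commit` type and the trailer pattern are assumptions for illustration, not the script behind the figures above.

```python
import re
from dataclasses import dataclass

# Matches a trailer line such as "Co-Authored-By: Claude <noreply@anthropic.com>".
# Git trailer keys are case-insensitive, so the match is too.
AI_TRAILER = re.compile(r"^Co-Authored-By:\s*Claude\b", re.IGNORECASE | re.MULTILINE)


@dataclass
class Commit:
    message: str
    added: int     # source lines added, as reported by --numstat
    deleted: int   # source lines deleted


def lines_changed(c: Commit) -> int:
    """Lines changed counts both sides of the diff."""
    return c.added + c.deleted


def ai_share(commits: list[Commit]) -> float:
    """Fraction of changed lines that landed in AI-co-authored commits."""
    total = sum(lines_changed(c) for c in commits)
    ai = sum(lines_changed(c) for c in commits if AI_TRAILER.search(c.message))
    return ai / total if total else 0.0
```

Counting added plus deleted lines is also why, under this kind of accounting, a month can report hundreds of thousands of lines changed while the codebase shrinks.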

I want to put that number in its place before I say anything else about it. Lines of code is a vanity metric, and every senior engineer in this industry knows it. The number above is the easiest measurement to make, not the most meaningful one. Watched as change-over-time it is not entirely useless — it tells you whether the studio is moving — but it tells you very little about whether the movement is in the right direction. What actually matters is the shipped value: tickets closed and verified, features that land and stick, bugs that stay fixed, audits that come back clean. The reason I built the cross-vendor observability stack alongside the task-orchestration system I am preparing to release is to measure exactly that. The tasks system tracks the work I want done. The telemetry tracks the AI doing it. Together they let me ask the question LOC cannot answer: of the work the AI produced this week, how much survived validation and shipped, and how much got reverted, rewritten, or quietly broke something downstream? That is the question that matters, and the infrastructure to answer it at scale is what I am still building.
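The question LOC cannot answer, posed above, can be made concrete. What follows is a minimal sketch, assuming each AI-produced change is eventually recorded with an outcome label. The outcome names and the `Change` type are hypothetical, not the schema of wood-fired-tasks or of the telemetry stack. The report counts changes rather than lines, so a large generated diff cannot outweigh a small fix.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SHIPPED = "shipped"                   # passed validation and is still in place
    REVERTED = "reverted"
    REWRITTEN = "rewritten"
    BROKE_DOWNSTREAM = "broke_downstream"


@dataclass
class Change:
    task_id: str      # hypothetical link back to the task that requested the work
    outcome: Outcome


def survival_report(changes: list[Change]) -> dict[str, float]:
    """Share of AI-produced changes, by count, that ended in each outcome."""
    counts = Counter(c.outcome for c in changes)
    total = sum(counts.values())
    # Counter returns 0 for outcomes that never occurred.
    return {o.value: (counts[o] / total if total else 0.0) for o in Outcome}
```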

Even taken at face value, the number above does not mean I have become five times the engineer I was. I am the same engineer. What has changed is where my attention lives in the loop. The code is now produced and validated downstream of me. I write specifications. I review test coverage, CI configurations, audit findings, telemetry trends. I look at what the validators are catching, what they are missing, and whether the gap is widening or closing. The orchestration is doing the typing. The validation infrastructure is doing the checking. I am doing the architectural decisions, the design judgement, and the review of the validators themselves — the parts of the job that, in retrospect, were always what actually mattered.

The artifacts of that practice are starting to leave the studio. The next step is the open-source release of wood-fired-tasks — the externalized version of the orchestration discipline I run on my own work, with the task graph, the verifier and integration-auditor agents, and the validation hooks at agent-boundary handoff all packaged up for general use. The observability stack is the externalized version of the telemetry I depend on internally. Underneath both is the Wood Fired Agent Operations Platform, the productized governance layer this whole practice has been pointing at. The work I do for my own studio is the same work other organizations need. A discipline that an independent studio of one person can run is a discipline a team can adopt. The journey is the resume.

Where this leaves me

A little over two years ago I bought my first AI subscription for the value of having an expert sitting inside my IDE. Ten months ago an agentic tool first produced code in my repository that I had not myself written. The studio I am operating today, with AI in the loop on every line of every project, is producing output that would have taken a team of dozens to produce twenty years ago. The bottleneck has moved from writing, to reviewing, to governing: building the engineering substrate that lets the AI work and the human catch what matters.

The vision the studio has been quietly aiming at since the 2020 thesis, an independent operation shipping the kind of multiplayer technology that historically required a team of dozens, is no longer aspirational. It is the practice I am running today. Two engines had to be running at once. Anthropic kept shipping more capable models. I kept finding harder problems to point them at, and harder questions to ask of how I was directing them. The compound result is something I can now hand to someone else.

This is the third post in a series. The first covered the six years of building, and the twenty-three years of thinking, that produced MECS as an engine. The second covered the platform that grew up around the engine and made it operable as a studio. This third post covered the practice that turned the studio into something I can operate at a throughput one person should not be able to sustain. The series ends here for now. The work continues every day.
