What happens when you point Cowork at a D&D campaign

Three sessions into our second arc, I gave up trying to remember anything. The campaign is fortnightly, the cast has rotated four times, and the world is dense enough that any given evening references some Aurelian magistrate from session six. The DM does the work of remembering. The players are the comic relief in our own story.

So I started recording. Audio, Whisper-transcribed, dropped into a NotebookLM as we went. Asking “Who is the dwarf with the goggles again?” would return a confident two-paragraph summary citing four sources. Useful. Not enough.

The problem with the notebook was that nothing settled. Every question was a fresh interrogation of the same dense pile. The DM had made a world worth treating as canon, and the notebook had no canon, just a corpus. There was nowhere to look something up. There was nowhere to link something to.

I have been calling this kind of moment the gravity gap. A document corpus is a place to ask questions. A wiki is a place to make claims. The question is what it takes to cross the distance between the two, and over the weekend of 9 May 2026, I tried to find out.

The result is Nickipedia, a 242-entity Wikipedia-style fan wiki for a D&D campaign nobody outside the table has any reason to care about. It is, by a comfortable margin, the most over-engineered thing I have built this year, and the one that has taught me the most about what tools like Cowork can actually do when you let them have a long weekend with a real project.

What it is

Nickipedia is a Hugo static site with a custom Wikipedia Vector skin, deployed via Firebase Hosting. It currently spans 242 entities (NPCs, locations, items, sessions, factions, deities, magic items, open mysteries, party members) across 17 session recaps, with a calendar, a search index, an open-threads tracker, and per-entity portraits where the content earns one.

Every page is built from a markdown source file in a folder structure that maps the campaign world: 04_npcs/, 05_locations/, 07_items/, 08_quests/, 09_mysteries/, and so on. Each file has YAML frontmatter (status, aliases, first_seen, last_seen, sources with line-numbered transcript citations), a wiki_summary block that renders as the page lede, and a body in Obsidian-style [[wikilink]] syntax.

A typical entity file (04_npcs/brinemaw-wyrm.md, written for a creature introduced in session 17):

---
type: npc
name: Brinemaw Wyrm
aliases: [Brinemaw, the big serpent dragon thing, the giant worm]
slug: brinemaw-wyrm
portrait: /images/characters/brinemaw-wyrm.jpg
categories:
  - Monsters
  - Sapphire Sea fauna
  - Frost-zone creatures
wiki_status: stub
first_seen: s17
last_seen: s17 (ongoing)
status: alive (combat ongoing)
tags: [monstrosity, wyrm, large, aquatic, frost-breath, homebrew]
sources:
  - session: s17
    line: 1616-1617
    note: Hugo identifies attack target as "the big serpent, dragon, thing"; DM as "the giant worm"
  - session: s17
    line: 1619-1638
    note: Frost-breath line cone, 32 cold full / 16 half on Dex save
  - session: s17
    line: 1697-1711
    note: Mike's 2nd-level Phantasmal Force lava-cube; Wyrm fails Int save (10), takes 2d8 psychic
related:
  - "[[Reef Stalker]]"
  - "[[Rimeclaw Skulker]]"
  - "[[Water wolves]]"
---

Every claim in the body has to map back to a sources: entry, and every sources: entry is a line range in a normalised transcript that the validator can verify exists.
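A minimal sketch of that in-range check, assuming the frontmatter has already been parsed into session/line pairs and that the validator knows each normalised transcript's line count. The names `Source`, `parse_line_range` and `out_of_range` are hypothetical, not the project's actual validator:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One frontmatter `sources:` entry (the note field is omitted here)."""
    session: str  # e.g. "s17"
    line: str     # e.g. "1616-1617", or a single line such as "1937"


def parse_line_range(spec: str) -> tuple[int, int]:
    """Turn "1616-1617" or "1937" into an inclusive (start, end) pair."""
    start_text, _, end_text = str(spec).partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    if start < 1 or end < start:
        raise ValueError(f"bad line range: {spec!r}")
    return start, end


def out_of_range(sources: list[Source], transcript_lengths: dict[str, int]) -> list[Source]:
    """Return every source whose range is malformed, points at an unknown
    session, or runs past the end of its transcript."""
    failures = []
    for src in sources:
        length = transcript_lengths.get(src.session)
        try:
            _, end = parse_line_range(src.line)
        except ValueError:
            failures.append(src)
            continue
        if length is None or end > length:
            failures.append(src)
    return failures
```

A build gate would call `out_of_range` once per entity file and refuse to deploy if the returned list is non-empty.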

The build script walks the markdown, resolves the wikilinks against a canonical name + alias index, renders citation hover-previews from the actual transcript text, harvests every blockquote and validates it against the cited line, and emits a Hugo content tree the static-site generator turns into HTML. The deploy script wraps that with a Firebase token, runs Hugo to a non-Drive cache directory (my Google Drive sync helpfully treats /people/Ruin/ and /people/ruin/ as the same folder, which is its own short essay), and pushes 2,267 files in one shot.
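The wikilink step is the easiest part of that to picture in code. The sketch below assumes entities arrive as dicts with `name`, `aliases` and `slug` keys (mirroring the frontmatter above), that links may use Obsidian's `[[Target|label]]` form, and that pages live under a `/wiki/<slug>/` path. That URL shape and the function names are illustrative assumptions:

```python
import re

# Matches [[Target]] and [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def build_index(entities: list[dict]) -> tuple[dict[str, str], list[str]]:
    """Map every lowercased name and alias to a slug.

    The first entity to claim a label keeps it; later claims by a different
    entity are reported as collisions rather than silently overwriting.
    """
    index: dict[str, str] = {}
    collisions: list[str] = []
    for entity in entities:
        for label in [entity["name"], *entity.get("aliases", [])]:
            key = label.strip().lower()
            if key in index and index[key] != entity["slug"]:
                collisions.append(label)
                continue
            index[key] = entity["slug"]
    return index, collisions


def resolve_links(body: str, index: dict[str, str]) -> tuple[str, list[str]]:
    """Rewrite wikilinks to markdown links and collect unresolved targets."""
    redlinks: list[str] = []

    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        slug = index.get(target.lower())
        if slug is None:
            redlinks.append(target)
            return label  # unresolved targets fall back to plain text
        return f"[{label}](/wiki/{slug}/)"

    return WIKILINK.sub(replace, body), redlinks
```

Returning the redlinks instead of failing outright leaves room for a later stub-filling step to decide what to do with them.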

The progression

It started as a NotebookLM. Drop the transcripts in, ask questions, trust the citations.

By the end of the first afternoon I’d switched contexts. “Build me a wiki for this campaign” turned out to be a one-line prompt that produced about eighty per cent of what I needed in maybe four hours: a Hugo site with templated NPC and location pages, a search index, a deploy script, the look-and-feel of a fan wiki. The compression in that first four hours was extraordinary. The kind of thing a year ago I would have abandoned three days in, because the template work alone would have outlasted my patience.

Then I hit the 80:20 pattern. The remaining twenty per cent was the bit that mattered: citation integrity, canonical-name management, the fortnightly maintenance burden, the part where a real DM reads the result and notices the small lies. Every piece of that took longer than the entire scaffold.

The four phases of that twenty per cent, roughly in order:

Citation discipline. Two validators, three layers. A strict in-range check that fails the build if any citation of the form (s17 L1937) points past the end of its normalised transcript. A blockquote validator that lifts every > quote and confirms the exact text appears at the cited line. And a snippet-vs-claim heuristic that catches inferred quotes pretending to be verbatim ones. The corpus currently runs 162 blockquote bindings with zero failures.
The multi-pass extraction pipeline. When a new session lands, the single-read approach (read the transcript once, write the recap) misses material the way a single read of anything misses material. The replacement dispatches six parallel agents, each scoped to one extraction goal, each grep-verifying its own citations. A seventh sequential pass cross-checks the merged draft. The synthesis pass writes the recap from the union of their outputs, not from the transcript directly.

The specification each agent is briefed against, in the order the orchestrator dispatches them:

## The passes

Seven extraction passes. Passes 1-6 run in parallel (no inter-dependencies),
each producing a structured `_discovery/_extract/session_NN_<pass>.md` file;
pass 7 runs sequentially afterwards.

### Pass 1: NPCs (new + updated)
Every named or notable figure who appears, even briefly. Distinguish
new / returning / unnamed-but-significant. Each entry carries at least
one line-cited supporting quote.

### Pass 2: Locations
Every place named, visited, described, or referenced. Distinguish new /
re-visited (with what changed) / referenced-only. Flag any new directional
claim because directions get misremembered between sessions.

### Pass 3: Items, loot, currency, equipment changes
Every item that changes hands, is identified, attuned, lost, broken, or
used in a story-significant way. Always check return/trade scenes before
logging an item as lost.

### Pass 4: Combats and encounters
Trigger, combatants, notable rolls, damage per participant, status
effects, resolution. Distinguish creature types; do not let pronouns
like "the big one" conflate two enemies.

### Pass 5: Dialogue and quotable moments
Verbatim quotes worth preserving for character continuity, decision
points, threats, reveals, vows, callbacks. One quote per entry, with
the exact line number and one-line context.

### Pass 6: World-state, factions, mysteries, time
Off-PC changes. Faction movements, mysteries opened or closed, time
elapsed in-fiction. Flag OOC speculation vs DM-stated fact.

### Pass 7: Citation cross-check
Run sequentially after passes 1-6 merge into a draft recap. Grep-verify
every (sNN LMMM) citation and every blockquote against the actual
transcript. Flag any name, place, or item that appears with multiple
spellings.

The point of the architecture is that six independent reads of the same text catch things one read does not. Session 17’s run produced 11 new entity stubs, 15 entity updates, 9 derived-page updates, and caught three places where the first read would have published nonsense: a contract that wasn’t actually signed but looked signed, a piece of player table-speculation written up as DM canon, and a rescued NPC family I’d recorded as boarding the boat when in fact they sailed away on their own.
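The fan-out-then-cross-check shape can be sketched in plain Python. This is an analogy, not how Cowork dispatches agents: a thread pool stands in for the six parallel agents, and `run_pass` and `cross_check` are hypothetical callables supplied by the caller. The pass labels are shorthand for the six headings in the spec above:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Shorthand labels for passes 1-6; pass 7 is the sequential cross-check.
PARALLEL_PASSES = ["npcs", "locations", "items", "combats", "dialogue", "world_state"]


def run_extraction(
    session: str,
    transcript: list[str],
    run_pass: Callable[[str, str, list[str]], str],
    cross_check: Callable[[str, list[str]], str],
) -> str:
    """Run the six independent passes concurrently, merge, then cross-check."""
    with ThreadPoolExecutor(max_workers=len(PARALLEL_PASSES)) as pool:
        futures = {
            name: pool.submit(run_pass, name, session, transcript)
            for name in PARALLEL_PASSES
        }
        # result() re-raises any exception from a worker, so one failed
        # pass aborts the run instead of producing a partial draft.
        extracts = {name: future.result() for name, future in futures.items()}

    # Merge in a fixed order so the draft is reproducible between runs.
    draft = "\n\n".join(extracts[name] for name in PARALLEL_PASSES)
    return cross_check(draft, transcript)
```

The passes share nothing but the read-only transcript, which is what lets them run without coordination. Only the cross-check needs to see the merged draft.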

The scheduled task. Cowork’s scheduled tasks let you cron a prompt. Every Friday at 6am, check the transcripts folder for new files. If there’s a new normalised transcript, run the per-session pipeline against it. That task knows the multi-pass model, the citation validators, the canonical-name register, the 14-step derived-page sweep, and the deploy step. It owns the contract; I own the corpus.
The DM corrections register. This was the most instructive bit. I’ll come back to it.
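The scheduled task's first move, spotting a normalised transcript that has not been processed yet, could look something like this. The `*.txt` pattern, the one-filename-per-line processed log and both function names are assumptions for illustration. The real task is a Cowork prompt, not a Python script:

```python
from pathlib import Path


def unprocessed_transcripts(
    transcript_dir: Path, processed_log: Path, pattern: str = "*.txt"
) -> list[Path]:
    """Return transcripts in the folder that the log does not list yet."""
    done: set[str] = set()
    if processed_log.exists():
        done = {
            line.strip()
            for line in processed_log.read_text(encoding="utf-8").splitlines()
            if line.strip()
        }
    return sorted(p for p in transcript_dir.glob(pattern) if p.name not in done)


def mark_processed(transcript: Path, processed_log: Path) -> None:
    """Append a transcript's filename to the log once its run has succeeded."""
    with processed_log.open("a", encoding="utf-8") as handle:
        handle.write(transcript.name + "\n")
```

Marking a transcript as processed only after the pipeline succeeds means a failed Friday run is retried the following week rather than silently skipped.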

The Cowork part

Cowork mode runs Claude with file tools, a sandboxed shell, and a workspace folder it can edit on your computer. It is, mechanically, Claude Code wearing different shoes. What it adds, and what made this project possible at the scale it ended up at, are two things: the ability to dispatch parallel agents from inside a session and merge their outputs, and the ability to schedule a prompt to run later, automatically, against a folder of files I would otherwise have to remember to update.

The parallel-agent dispatch is the load-bearing one. For Nickipedia, almost every non-trivial task is structured as one orchestrator + N parallel workers. The s17 pipeline runs six agents in parallel for extraction, then sweeps the renames in a seventh, then dispatches three more in parallel for the player-safe softenings, identity fixes, and removals. The big DM corrections wave (which I’ll get to) processed 70 distinct rulings across 4 phases with 8 entity renames, ~150 wikilink updates, and a sweeping softening pass on ~30 files, in roughly an hour, by partitioning the work between three agents working at the same time on non-overlapping topics.

The scheduled-task bit is the load-bearing-for-the-future one. The campaign keeps producing transcripts. I am not going to want to run the pipeline manually every fortnight, forever. Cowork doesn’t ask me to. The task is configured once; the wiki maintains itself after each session, with citation validators as the deploy gate, and the only thing I have to do is read the diff and approve the push.

The task’s actual brief, the thing Cowork sees when it wakes up to run, is short by intent. The prompt is a few hundred words. The contract it reads is a few thousand. Here is what the task tells itself to read first:

# Read these first, in order

1. `00_system/HANDOFF.md` - entry point. Lists what's live now and the hard
   guardrails.
2. `00_system/maintenance_contract.md` - what to maintain at what cadence;
   the hard guardrails section is binding.
3. `00_system/wiki_pipeline.md` - the eight-step operational runbook.
4. `00_system/scheduled_task_contract.md` - what an unattended run can and
   cannot do (read this carefully; it scopes you).
5. `00_system/SCRIPTS.md` - every script in the project; consult this
   before writing new tooling.
6. `00_system/transcript_extraction_passes.md` - the multi-pass extraction
   architecture you must follow.
7. `00_system/recap_prompt.md` - recap format + citation-line-sanctity rule.
8. `00_system/changelog.md` - last 3 entries to understand the current state.

Then it gets the canonical-names register (Sevryn not Severin, Eldwythe not Eldwife, Project Veilbreak not Project Whale Break, Vedalken not Veldekin, the long list), then the pipeline from the eight-step runbook, unrolled here into its thirteen stages (detect new transcript → normalise → dispatch six parallel extraction agents → synthesise recap → cross-check citations → update entities → stub-fill redlinks → drift sweep → portraits → build → deploy → verify → changelog), then the acceptance criteria.
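A canonical-names register of this kind reduces to a lookup table plus a scan. The sketch below uses only the four pairs named above. The dict layout and the `find_variants` helper are assumptions about how such a register might be applied, not the project's actual file format:

```python
import re

# Known wrong spelling -> canonical name (pairs taken from the register above).
CANONICAL = {
    "Severin": "Sevryn",
    "Eldwife": "Eldwythe",
    "Project Whale Break": "Project Veilbreak",
    "Veldekin": "Vedalken",
}


def find_variants(
    text: str, register: dict[str, str] = CANONICAL
) -> list[tuple[int, str, str]]:
    """Return (line_number, variant, canonical) for every non-canonical
    spelling found, matching whole words case-insensitively."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for variant, canonical in register.items():
            if re.search(rf"\b{re.escape(variant)}\b", line, flags=re.IGNORECASE):
                hits.append((line_no, variant, canonical))
    return hits
```

Reporting hits rather than rewriting them in place keeps a human, or a later agent pass, in the loop for cases where a "wrong" spelling sits inside a verbatim transcript quote.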

It is the closest I have come to building a system that behaves like an institution. The wiki has a maintenance contract. The contract is documented in markdown the agent reads on every run. New conventions get added to the contract, not asserted in the prompt. The prompt is small; the contract is large; the agent reads the contract.

That is the scaffold problem, dressed up in a specific shape. The thing that makes prompts useful at this scale is not the prompt, it’s the structure around it that survives between sessions. The contract files are the scaffold. The validators are the probe. The two together are what let me trust a one-line “run the pipeline” invocation against a corpus that compounds.

The DM correction wave

Once the wiki was live, I sent it to the DM with a deliberately leading question: what does it get wrong? He replied with a 70-item list.

Some were Whisper artefacts I should have caught: the race I’d canonised as Veldekin across six transcript spellings (Valdekin, Vidalcan, Vildulcan, Vildalcan, gudolkin, Vidolkin) is actually Vedalken, a published 5e race. Dr Phlegm Joe Singe (which I had flagged for what felt like a year as transcript-rendered, DM-unconfirmed) was Dr Flem Josenge. Wayne Quarter in Umbrafall is Wane Quarter. The deity the All-Father is the All Hammer. Hugo’s pre-experiment name is Corvin, not Colvin.

Some were combat attributions the wiki had inferred from context that the VTT had shown differently. The s3 dragon fight kill I’d attributed to the boss was actually delivered by a yuan-ti accomplice during the same encounter. Maelis Dirn (who I’d had killed by a wraith) was killed by the Titan, after a PC freed it. Cholmondeley wasn’t a Paladin of Bahamut, he was Pure Light-aligned, which is a separate religious system the wiki had quietly fused.

Some were table jokes I’d canonised. “Kevin? Beetlejuice?”, which I had recorded as alternate names for the character Vasquez, were the Black Mask deliberately mangling his name out of contempt. Bronze God, which I’d had as Vasquez’s in-fiction title, was OOC table chatter. The Marut, which I had on the story-spine as the campaign’s most dangerous off-screen antagonist, was a joke that never appeared in play. I had to delete him.

And some were player-safe softenings the DM specifically asked for: the wiki had been a little too confident about who the Black Mask is really loyal to, what the Wraith’s actual role is in the project that made Hugo, where the Egg currently is. Those needed reframing in the voice of what the party actually knows, not what I had inferred.

I tracked the full register in a single markdown file, each entry numbered to match the original question and carrying the DM’s ruling and a one-line wiki action. The corrections were applied in four phases, each phase dispatched as a parallel agent: Phase A for renames (sequential within itself, because file moves and wikilink updates conflict), Phase B for identity and combat fixes, Phase C for player-safe softenings, Phase D for removals and new canon. Final validation showed citations all in-range across 242 entities, 162 of 162 quote bindings passing, with only two pre-existing benign alias collisions remaining.
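A quote binding of the kind counted in that final validation can be checked in a few lines. The sketch assumes each blockquote sits on one line and ends with its citation, as in `> "text" (s17 L1619)`, and that transcripts are held as lists of lines. Both are guesses at the layout, and `check_quotes` is a hypothetical name:

```python
import re

# A blockquote line ending in a citation such as (s17 L1619).
QUOTE = re.compile(r"^>\s*(?P<text>.+?)\s*\((?P<session>s\d+) L(?P<line>\d+)\)\s*$")


def normalise(text: str) -> str:
    """Drop surrounding quote marks and collapse runs of whitespace."""
    return " ".join(text.strip().strip('"“”').split())


def check_quotes(page: str, transcripts: dict[str, list[str]]) -> list[str]:
    """Return every blockquote line whose text is not found at its cited line."""
    failures = []
    for raw in page.splitlines():
        match = QUOTE.match(raw)
        if not match:
            continue  # not a cited blockquote
        lines = transcripts.get(match["session"], [])
        number = int(match["line"])
        # Out-of-range citations compare against an empty string and fail.
        cited = lines[number - 1] if 1 <= number <= len(lines) else ""
        if normalise(match["text"]) not in normalise(cited):
            failures.append(raw)
    return failures
```

The comparison is case-sensitive on purpose: a quote that only matches after heavier normalisation is more likely a paraphrase than a verbatim line.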

The whole correction wave took about an hour of wall time and produced a 60-line changelog entry documenting what changed and why, plus five new rules added to the maintenance contract for future passes (the shapeshifter alias trap, the dialogue-extract leakage rule, the recurring-antagonist threshold, the Whisper-variants meta-rule, the renamed-entity blockquote handling rule). Each rule was earned the hard way. Each one is now load-bearing for the next pipeline run.

What this all says about the compression gap

The wiki is a toy. I don’t have a research claim about D&D. There is no cohort to study. No grant funded this. The only audience is six people, two of whom are also at the table.

But what the project is doing, mechanically, is the compression gap in miniature. It is the thing I keep writing about, scoped down to a single weekend’s effort.

Four hours to a working wiki. Then forty hours of the part that doesn’t compress: citation discipline, name canonicalisation, the multi-pass extraction model so future-me doesn’t trust a single-read summary, the scheduled task so the maintenance burden becomes someone else’s problem, the corrections register so DM rulings are durable across sessions rather than re-litigated each fortnight. A Minimum Viable Literacy not for the user, but for the agent maintaining the corpus on my behalf, encoded as markdown files the agent reads on every run.

Cowork made this possible because Cowork shipped the two pieces (parallel agent dispatch, scheduled tasks) that the project needed at exactly the points it needed them. The same project a year ago dies at hour four. Six months ago it might have got to a static site, but the citation validators wouldn’t have existed and the DM correction wave would have taken a working week instead of an hour. The compression is real. The scaffold around the compression is what made it durable.

Honest limits

The wiki is over-engineered for what it is. Building a citation validator into the deploy gate for a fan wiki nobody will lawyer is, on any sensible reading, a misuse of a weekend. The corrections register is more elaborate than the canon it corrects. Every safety net is itself a thing to maintain, and the maintenance has its own quiet failure modes: at one point the calendar silently dropped a whole session because a single character in the frontmatter parsed wrong and the build script kept going. The validators caught the citation regressions; they did not catch the build that succeeded while emitting incomplete data. That is the next class of probe to build.
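That missing probe is, at its simplest, a set difference between what went into the build and what came out. A minimal sketch, assuming session IDs of the form `s17` can be collected from both the source recaps and the emitted calendar data. How those IDs are gathered is left out, and the function names are hypothetical:

```python
from typing import Iterable


def session_number(session_id: str) -> int:
    """Sort key: "s17" -> 17."""
    return int(session_id.lstrip("s"))


def missing_from_output(source_ids: Iterable[str], emitted_ids: Iterable[str]) -> list[str]:
    """Sessions present in the source tree but absent from the build output."""
    return sorted(set(source_ids) - set(emitted_ids), key=session_number)


def assert_complete(source_ids: Iterable[str], emitted_ids: Iterable[str]) -> None:
    """Fail loudly when the build succeeded but dropped a session."""
    missing = missing_from_output(source_ids, emitted_ids)
    if missing:
        raise RuntimeError(
            "build emitted incomplete data; missing: " + ", ".join(missing)
        )
```

Run as a deploy gate alongside the citation validators, a check like this would turn a silently dropped session into a failed build.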

The lesson there is itself a small instance of the gap. Cowork compresses the build. It does not compress the care. The validators that caught the citation regressions also caught my own laziness; the corrections register that lets the DM keep me honest also raises the surface area of things that can go wrong; the multi-pass extraction model that catches what single-read misses also produces 30,000 words of intermediate _discovery/_extract/ that nobody but me will ever read.

I would do it again. I would do it again because the cost of doing it was a weekend and the cost of not doing it is the slow death of campaign memory across a multi-year game, and because every component of the system is transferable. The multi-pass extraction model works for any structured transcript corpus. The citation validators work for any wiki of cited facts. The corrections-register pattern works for any project that survives long enough to need correcting.

The thing nobody asked me to build

This was not a research project. Nobody is going to read the wiki except the table. I shipped it because it solved a small problem and because I wanted to see whether Cowork could do the durable parts of vibe-coding (the scaffolds, the contracts, the scheduled work, the parallel agents, the maintenance burden of a thing that compounds week over week) as well as it does the fast parts.

The answer turned out to be yes, with caveats, and the caveats are interesting. The caveats are most of the post. The caveats are why I think this kind of pet project is worth doing in public: it’s where the compression gap shows itself most clearly to the only person it really matters to, which is the person at the keyboard wondering whether they trust the thing they just shipped.

If you want to look at the wiki, it’s at nickipedia-wiki.web.app. Mike (the new caster, a Vedalken) joined in session 17; he is currently calling Hugo an abomination over a vial of luminous-blue blood. The wiki has 27 open mysteries, 16 entries on the active-quests page, and a calendar that goes back to 14 March 843 in the in-fiction calendar. The Egg is still missing.

The pipeline runs again next Friday.

Built across the weekend of 9-11 May 2026, with maintenance corrections through 17 May. The wiki itself is at nickipedia-wiki.web.app. The DM is Nick; the campaign is in his homebrew world of Eldurae. If you want to read more about the compression gap and the patterns this kind of work surfaces, the PhD-series posts are where I’m working through the broader argument.