Building Bugle: a hunting-law agent and a Monte Carlo draw engine
How I built a cited, multi-state regulations assistant — plus a draw-odds simulator — in about 25 hours, and the engineering decisions that made it hold together.
Bugle started as a simple question: why is it so hard to get a straight answer about hunting regulations? The rules differ by state, species, weapon, and season. They live in statutes, administrative code, and annual proclamations, in different formats, from different agencies. And the draw systems that hand out limited tags are opaque enough that companies charge money for a human to explain them.
I wanted an agent that could read the actual law, cite it, and — eventually — simulate your odds of drawing a tag. I built the first version in about 25 hours of focused work. Here's what made it hold together.
Treat the knowledge base as a versioned artifact
The first decision was the most important one: the corpus is immutable and versioned. Many AI products mutate their knowledge base in place: update the PDFs, re-ingest, hope nothing broke. That's how you end up serving a stale or half-ingested answer to a user without knowing it.
Instead, every ingest targets a candidate version. Production traffic only ever hits the one active version. Promotion is a single atomic flag flip, rollback is trivial, and superseded regulations don't vanish — they're marked with a supersession date so the audit trail survives.
# Build a candidate, validate it, then promote — never touch production mid-ingest
pnpm ingest run utah-r657-5 --version 2026-candidate
pnpm ingest validate --version 2026-candidate
pnpm eval --corpus 2026-candidate # must clear the gates
pnpm ingest promote --version 2026-candidate
The whole pipeline is a CLI: fetch → parse → materialize → embed → validate, each stage idempotent. That CLI became the most valuable thing in the project. It let me bring on a new state in an afternoon instead of a week.
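To make the promote-and-rollback idea concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration (`CorpusRegistry`, `CorpusVersion`, the `supersededAt` field); the post doesn't show Bugle's actual schema, and a real system would presumably do the flip inside a single database transaction.

```typescript
// Illustrative in-memory model; names are hypothetical, not Bugle's schema.
type VersionStatus = "candidate" | "active" | "superseded";

interface CorpusVersion {
  id: string;
  status: VersionStatus;
  supersededAt?: string; // ISO date, kept so the audit trail survives
}

class CorpusRegistry {
  private versions = new Map<string, CorpusVersion>();

  addCandidate(id: string): void {
    if (this.versions.has(id)) throw new Error(`version ${id} already exists`);
    this.versions.set(id, { id, status: "candidate" });
  }

  // Production reads resolve through this single pointer.
  active(): CorpusVersion | undefined {
    return Array.from(this.versions.values()).find((v) => v.status === "active");
  }

  // Promote a validated candidate: the previous active version is marked
  // superseded (never deleted) and the candidate takes over.
  promote(id: string, today: string): void {
    this.activate(id, "candidate", today);
  }

  // Roll back by re-activating a superseded version.
  rollback(id: string, today: string): void {
    this.activate(id, "superseded", today);
  }

  private activate(id: string, expected: VersionStatus, today: string): void {
    const next = this.versions.get(id);
    if (!next || next.status !== expected) {
      throw new Error(`${id} is not a ${expected} version`);
    }
    const prev = this.active();
    if (prev) {
      prev.status = "superseded";
      prev.supersededAt = today;
    }
    next.status = "active";
    delete next.supersededAt;
  }
}
```

The point of the shape is that readers only ever ask for `active()`, so an ingest in progress can't leak into production, and a rollback is the same operation as a promotion pointed at an older version.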
Retrieval that's allowed to say "no"
A wrong answer about the law isn't a typo; for the hunter who acts on it, it can mean a ticket or a fine. So the retrieval pipeline is built to refuse rather than guess. A query gets decomposed, run through hybrid search (vector + keyword + metadata), reranked, answered by Claude with streamed citations, and then verified for faithfulness, with every step traced.
Two guardrails matter most: conversations are bound to a single state so retrieval can't cross jurisdictions, and the decomposition step explicitly refuses legal-advice and out-of-scope questions before any retrieval happens.
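Those two guardrails can be sketched in a few lines. This is a simplified illustration, not the production pipeline: the types, the keyword patterns standing in for the decomposition step, the word-overlap score standing in for hybrid search, and the `no_support` refusal when nothing matches are all assumptions, and the corpus text is placeholder data.

```typescript
// Hypothetical shapes; the real pipeline's types are not shown in this post.
interface Chunk { state: string; citation: string; text: string }
interface Conversation { state: string } // bound once, at conversation start

type Outcome =
  | { kind: "refused"; reason: "legal_advice" | "out_of_scope" | "no_support" }
  | { kind: "retrieved"; chunks: Chunk[] };

// Stand-in for the decomposition step. A real system would likely use a model
// call here; these keyword patterns are for illustration only.
function refusalReason(query: string): "legal_advice" | "out_of_scope" | null {
  const q = query.toLowerCase();
  if (/\b(will i be charged|can i be sued|should i plead)\b/.test(q)) return "legal_advice";
  if (!/\b(hunt|season|tag|draw|elk|deer|license|permit)/.test(q)) return "out_of_scope";
  return null;
}

// Stand-in for hybrid search: counts query words that appear in the chunk.
function keywordScore(query: string, chunk: Chunk): number {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  const text = chunk.text.toLowerCase();
  return words.filter((w) => text.includes(w)).length;
}

function retrieve(conv: Conversation, query: string, corpus: Chunk[], topK = 3): Outcome {
  const reason = refusalReason(query);
  if (reason) return { kind: "refused", reason }; // refuse before any retrieval

  const ranked = corpus
    .filter((c) => c.state === conv.state) // hard jurisdiction filter, applied first
    .map((c) => ({ chunk: c, score: keywordScore(query, c) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((r) => r.chunk);

  // Assumption: with nothing to cite, refusing beats guessing.
  return ranked.length > 0
    ? { kind: "retrieved", chunks: ranked }
    : { kind: "refused", reason: "no_support" };
}
```

Applying the state filter before scoring, rather than as a ranking signal, is what makes cross-jurisdiction leakage impossible instead of merely unlikely.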
The eval harness runs the real pipeline
This is the part I'm most proud of. The eval suite doesn't reimplement retrieval — it imports the exact production code. So a model bump, a prompt tweak, or a chunking change is caught by the same 348 expert-validated cases that gate every release.
The golden set includes hallucination traps (must_not_claim) and must-refuse questions. Faithfulness and citation accuracy are hard gates — a corpus version doesn't ship unless it clears them.
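A rough sketch of how such a gate could be wired follows. The case shape, the `Pipeline` type, and the `minPassRate` threshold are hypothetical, and the substring check is only a crude proxy for real faithfulness verification. The part that mirrors the text is that the harness receives the pipeline function as an argument instead of reimplementing it.

```typescript
// Hypothetical shapes for golden cases and pipeline output.
interface GoldenCase {
  id: string;
  question: string;
  mustRefuse: boolean;
  mustNotClaim: string[];      // hallucination traps
  expectedCitations: string[]; // sources a correct answer must cite
}
interface PipelineResult { refused: boolean; answer: string; citations: string[] }

// The harness takes the production entry point as a parameter. It is
// synchronous here for brevity; a real pipeline would almost certainly be async.
type Pipeline = (question: string) => PipelineResult;

function caseFailures(c: GoldenCase, r: PipelineResult): string[] {
  const failures: string[] = [];
  if (c.mustRefuse && !r.refused) failures.push("answered a must-refuse question");
  if (!c.mustRefuse && r.refused) failures.push("refused an answerable question");
  if (r.refused) return failures; // nothing further to check on a refusal

  const answer = r.answer.toLowerCase();
  for (const claim of c.mustNotClaim) {
    if (answer.indexOf(claim.toLowerCase()) !== -1) failures.push(`made forbidden claim: ${claim}`);
  }
  for (const cite of c.expectedCitations) {
    if (r.citations.indexOf(cite) === -1) failures.push(`missing citation: ${cite}`);
  }
  return failures;
}

interface GateReport { passRate: number; passed: boolean; failures: Record<string, string[]> }

function runGate(cases: GoldenCase[], pipeline: Pipeline, minPassRate: number): GateReport {
  const failures: Record<string, string[]> = {};
  for (const c of cases) {
    const problems = caseFailures(c, pipeline(c.question));
    if (problems.length > 0) failures[c.id] = problems;
  }
  // An empty suite proves nothing, so it fails the gate.
  const passRate = cases.length === 0 ? 0 : 1 - Object.keys(failures).length / cases.length;
  return { passRate, passed: passRate >= minPassRate, failures };
}
```

A promote step would then refuse to run unless `runGate(...)` reports `passed` for the candidate corpus.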
The marquee feature: simulating the draw
The draw engine answers the question hunters actually care about: how many years until I'm likely to draw? It simulates the draw forward year by year — accounting for point creep, pool churn, resident vs. non-resident quotas, and rule changes (Colorado flips from a preference system to a hybrid split in 2028).
The key design move: draw mechanics are validated config data, not engine code.
// A rule change is a row, not a deploy.
{
  state: "CO",
  species: "elk",
  mechanic: "hybrid_split", // 50% preference / 50% random
  season_year_from: 2028,
  config: { splitRatio: 0.5 },
}
Seven mechanic types are implemented as pure functions. Preference draws are solved analytically (no sampling needed); the weighted and hybrid mechanics run a Monte Carlo simulation over the applicant pool. Everything is framed as a research projection, never a guarantee. That framing is non-negotiable, because the honest answer is "here's what the history implies," not "you will draw."
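To illustrate the difference between the analytic and sampled mechanics, here is a simplified single-year sketch. It is a toy model under stated assumptions, not Bugle's engine or any state's actual rules: `preferenceOdds` assumes tags go strictly to the highest point levels with a random split at the cutoff level, and `hybridOdds` assumes `floor(tags * splitRatio)` tags are awarded by preference and the rest by a uniform random draw among everyone left. All function names, the seeded generator, and the pool shapes are hypothetical.

```typescript
interface PointLevel { points: number; applicants: number }
interface HybridConfig { splitRatio: number } // share of tags awarded by preference

// Analytic preference draw: walk point levels from the top, handing out tags.
// No sampling is needed because the outcome depends only on the cutoff level.
function preferenceOdds(pool: PointLevel[], tags: number, myPoints: number): number {
  const sorted = pool.slice().sort((a, b) => b.points - a.points);
  let remaining = tags;
  for (const level of sorted) {
    if (level.applicants <= 0) continue;
    if (level.points === myPoints) return Math.min(1, remaining / level.applicants);
    remaining = Math.max(0, remaining - level.applicants);
  }
  return 0; // my point level is not in the pool
}

// Small seeded PRNG (mulberry32) so simulation runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed;
  return () => {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy of the input.
function shuffle<T>(items: T[], rand: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

// Monte Carlo estimate for the hybrid mechanic. Applicant 0 is "me".
function hybridOdds(
  othersPoints: number[], myPoints: number, tags: number,
  config: HybridConfig, trials: number, seed: number,
): number {
  const rand = mulberry32(seed);
  const prefTags = Math.floor(tags * config.splitRatio);
  const randomTags = tags - prefTags;
  const points = [myPoints, ...othersPoints];
  let wins = 0;
  for (let t = 0; t < trials; t++) {
    // Preference stage: highest points first, ties broken by a random key.
    const keyed = points.map((p, id) => ({ id, p, key: rand() }));
    keyed.sort((a, b) => b.p - a.p || a.key - b.key);
    const prefWinners = keyed.slice(0, prefTags);
    // Random stage: uniform draw among everyone who did not win above.
    const randomWinners = shuffle(keyed.slice(prefTags), rand).slice(0, randomTags);
    if (prefWinners.concat(randomWinners).some((w) => w.id === 0)) wins++;
  }
  return wins / trials;
}
```

Because both functions are pure (the random source is an explicit seed), the same config row and pool always produce the same projection, which keeps results reproducible and testable.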
What I took away
You can move fast and be disciplined — they're not opposites. Corpus versioning, an eval harness over real code, and config-as-data were what let one person build a knowledge-heavy AI product in 25 hours without lying to users about correctness. The constraints were the accelerant.