# v1.21.x — Two-Tier RAG, Tier-A durability, and the context-budget lever set
Versions: 1.21.0 → 1.21.9
The v1.21.x series is the codebase intelligence engine’s retrieval-quality and context-budget pass. It lands two-tier retrieval (file-level summaries layered above chunk vectors), the durability fixes that make long-running summary indexes survive quotas and process kills, and the default-flip set that brings canopy MCP’s per-call context cost below the agent’s shell-tool baseline.
The headline result, measured on a 5-language reference corpus (ardupilot, curl, kotlinx.coroutines, dotnet-aspnetcore-samples, laravel):
Canopy MCP cuts your AI agent’s context budget by ~50% on real codebases vs running shell tools alone — across heuristic and cloud Tier-A backends. Whether the agent picks freely or you restrict it to canopy-only retrieval, the budget always lands below the no-canopy baseline. Day-1 install. 100% retrieval correctness.
No tool semantics changed across the series. Every previous behavior is reachable by passing the relevant parameter explicitly. The defaults moved.
## v1.21.0 — Two-tier retrieval (Tier-A summaries layered over Tier-B chunks)
`canopy index --with-summaries` builds a per-file natural-language summary and indexes it as a separate vector layer (`file_summaries_vec.lance`) above the existing chunk vectors (`chunks_vec.lance`). At query time, `canopy_search` hits the summary tier first; on a high-confidence match it returns file-level results alone, otherwise it escalates to chunks. The agent gets a sharper “what file is this about?” answer in fewer tokens for navigation queries.
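As a rough illustration of that escalation decision, here is a minimal sketch; only the behavior (summary tier first, escalate to chunks below the confidence threshold) comes from these notes, and every name below is invented rather than canopy's actual internals:

```rust
// Sketch of the two-tier query path; all names are illustrative.
struct FileHit { path: String, score: f32 }
struct ChunkHit { path: String, line_start: u32, line_end: u32, score: f32 }

enum SearchResult {
    Files(Vec<FileHit>),   // high-confidence Tier-A match: stop here
    Chunks(Vec<ChunkHit>), // escalated to Tier-B chunk vectors
}

// Stand-ins for the vector lookups against file_summaries_vec.lance and
// chunks_vec.lance.
fn search_summary_tier(_query: &[f32], _top_k: usize) -> Vec<FileHit> { Vec::new() }
fn search_chunk_tier(_query: &[f32]) -> Vec<ChunkHit> { Vec::new() }

fn two_tier_search(query_vec: &[f32], threshold: f32, top_k_files: usize) -> SearchResult {
    let file_hits = search_summary_tier(query_vec, top_k_files);
    // Confident summary-tier match: answer at file level, in fewer tokens.
    let confident = file_hits.first().map_or(false, |top| top.score >= threshold);
    if confident {
        SearchResult::Files(file_hits)
    } else {
        SearchResult::Chunks(search_chunk_tier(query_vec))
    }
}
```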
Four summary backends are supported out of the box:
- `heuristic` — no LLM, no API key, works offline. Extracts doc-comments + signatures + symbol context. The day-1 default.
- `ollama` — local LLM via Ollama (`qwen2.5-coder:1.5b`/`:3b`/`:7b` recommended). Data-sovereign.
- `openai-compat` — any OpenAI-compatible endpoint (OpenAI, Azure, Together, Groq, etc.).
- `anthropic` — Claude API.
Search-time controls: `expand=auto|always|never`, `top_k_files`, `summary_only=true`, configurable `CANOPY_SUMMARY_THRESHOLD`.
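Grouped as a parameter set, those controls might look like the following sketch; the struct is illustrative, and only the parameter names and semantics come from these notes:

```rust
// Illustrative grouping of the documented search-time controls.
enum Expand { Auto, Always, Never } // expand=auto|always|never

struct TierAControls {
    expand: Expand,     // Auto: threshold decides; Always/Never force it
    top_k_files: usize, // how many Tier-A file hits to consider
    summary_only: bool, // true: answer from summaries, never touch Tier-B
    threshold: f32,     // CANOPY_SUMMARY_THRESHOLD overrides the default
}
```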
## v1.21.1–v1.21.5 — Variance check and four-bug fix series
The v1.21.0 candidate had several silent failure modes uncovered during a variance-check sweep on real codebases:
- v1.21.1: Tier-A backend persistence in `summary_config` (operator’s chosen backend now travels with the index)
- v1.21.2: persisted `threshold` from the index TOML beats the env-var default
- v1.21.3: R2 team-cache hit no longer silently drops Tier-A
- v1.21.4: `bulk_replace_chunks` reduces Lance manifest churn from per-batch (~5,400 commits on ardupilot) to two (see the sketch after this list)
- v1.21.5: Tier-A respects the operator’s embedding config; safer default `CANOPY_EMBED_CONCURRENCY=2`
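The v1.21.4 fix has a shape worth sketching: commit count should be independent of batch count. A minimal illustration, assuming a Lance-style table where every delete or append writes a new manifest version; `VectorTable` and both function bodies are invented for this sketch, not canopy's actual code:

```rust
#[derive(Clone)]
struct Chunk {
    file_path: String,
    embedding: Vec<f32>,
}

// Stand-in for the Lance table wrapper; each call = one manifest commit.
trait VectorTable {
    fn delete_all(&mut self) -> Result<(), String>;
    fn append(&mut self, rows: &[Chunk]) -> Result<(), String>;
}

// Pre-v1.21.4 shape: one append per batch, ~5,400 commits on ardupilot.
fn replace_per_batch(table: &mut dyn VectorTable, batches: &[Vec<Chunk>]) -> Result<(), String> {
    table.delete_all()?;
    for batch in batches {
        table.append(batch)?; // every call writes a new Lance manifest version
    }
    Ok(())
}

// v1.21.4 bulk_replace_chunks shape: exactly two commits, delete + one append.
fn bulk_replace_chunks(table: &mut dyn VectorTable, batches: &[Vec<Chunk>]) -> Result<(), String> {
    let all: Vec<Chunk> = batches.iter().flatten().cloned().collect();
    table.delete_all()?; // commit 1
    table.append(&all)?; // commit 2
    Ok(())
}
```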
## v1.21.6 — Skip Tier-B re-embed when chunks haven’t changed
`build_embedding_index` now computes a content-stable fingerprint over (`file_path`, `line_start`, `line_end`, `content_len`, `content`) for every chunk and stores it in `forge_meta.chunks_vec_fingerprint`. On the next run, if the fingerprint matches and `chunks_vec.lance` has rows, the embed loop is skipped entirely.
The fingerprint is intentionally `chunk_id`-independent because `--with-search` rebuilds the SQLite `chunks` table with fresh auto-increment IDs every run; a content + path fingerprint matches whenever AST output is byte-identical.
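A minimal sketch of such a fingerprint, assuming SHA-256 over the five fields in deterministic chunk order; the notes name the fields, not the hash function, and the `sha2`/`hex` crates are this sketch's choice:

```rust
use sha2::{Digest, Sha256}; // external crates: sha2, hex

struct Chunk {
    file_path: String,
    line_start: u32,
    line_end: u32,
    content: String,
}

/// Content-stable fingerprint over all chunks. Deliberately ignores chunk_id,
/// which --with-search regenerates on every run.
fn chunks_vec_fingerprint(chunks: &[Chunk]) -> String {
    let mut hasher = Sha256::new();
    for c in chunks {
        hasher.update(c.file_path.as_bytes());
        hasher.update(c.line_start.to_le_bytes());
        hasher.update(c.line_end.to_le_bytes());
        hasher.update((c.content.len() as u64).to_le_bytes()); // content_len
        hasher.update(c.content.as_bytes());
    }
    hex::encode(hasher.finalize())
}
```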
Re-runs on unchanged source: ~3.4 hours → < 1 second on ardupilot’s 86,595-chunk corpus. ~12,000× speedup on the no-op case.
## v1.21.7 — Soft-pass for the first run after upgrade
The v1.21.6 strict-match logic only triggered when a `chunks_vec_fingerprint` was already persisted. By construction, no v1.21.5-or-earlier index has one, so the very first `--with-summaries` run after upgrade always paid the full re-embed cost before any benefit could land.
v1.21.7 adds a soft-pass: when `chunks_vec.lance` has rows AND no `chunks_vec_fingerprint` is stored, trust the existing vectors and backfill the fingerprint. Subsequent runs use the strict-match path. Safety relies on the Step 0 `embedder_fingerprint` check at index time — if an operator changed embedders pre-upgrade, that returns an `Embedding` error before the soft-pass logic runs.
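The resulting decision path, sketched below; the enum and helper name are illustrative, but the three branches are exactly the ones described above:

```rust
enum EmbedPlan {
    Skip,        // strict match: stored fingerprint present and unchanged
    SoftPass,    // rows exist, no stored fingerprint: trust + backfill
    FullReembed, // anything else
}

fn plan_tier_b(
    stored_fingerprint: Option<&str>,
    current_fingerprint: &str,
    lance_row_count: usize,
) -> EmbedPlan {
    match stored_fingerprint {
        // v1.21.6 strict path: skip only on an exact fingerprint match.
        Some(stored) if stored == current_fingerprint && lance_row_count > 0 => EmbedPlan::Skip,
        Some(_) => EmbedPlan::FullReembed, // content changed since last run
        // v1.21.7 soft-pass: a pre-upgrade index has vectors but no
        // fingerprint. Safe because the Step 0 embedder_fingerprint check
        // has already rejected embedder changes before this point.
        None if lance_row_count > 0 => EmbedPlan::SoftPass,
        None => EmbedPlan::FullReembed, // empty index: embed from scratch
    }
}
```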
## v1.21.8 — Incremental Tier-A commits
Pre-v1.21.8, `build_summary_index` collected every successful summary into one in-memory `Vec`, ran one bulk embed call, and upserted Lance + SQLite at the end. A kill mid-run — quota exhaustion, daemon stress, OOM — lost every minute of work for that repo.
v1.21.8 drains the LLM-generation stream in batches of 25: each batch embeds, upserts to Lance, then upserts to SQLite. Worst-case loss on a kill is bounded by the batch size; the `file_summaries` row count grows visibly during the run. Per-batch failures (Lance error, SQLite error, embedder error) log and continue rather than aborting the whole run, so one transient failure no longer poisons the rest of the work.
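A minimal sketch of the batched drain, using the notes' batch size of 25; `Summary` and `embed_and_upsert` are stand-ins for canopy's internals:

```rust
const BATCH_SIZE: usize = 25;

struct Summary; // stand-in: file path + generated summary text

// Stand-in: embed the batch, upsert to Lance, then upsert to SQLite.
fn embed_and_upsert(_batch: &[Summary]) -> Result<(), String> { Ok(()) }

fn drain_summaries(summaries: impl Iterator<Item = Summary>) {
    let mut batch: Vec<Summary> = Vec::with_capacity(BATCH_SIZE);
    for s in summaries {
        batch.push(s);
        if batch.len() == BATCH_SIZE {
            commit_batch(&mut batch);
        }
    }
    commit_batch(&mut batch); // flush the final partial batch
}

fn commit_batch(batch: &mut Vec<Summary>) {
    if batch.is_empty() { return; }
    // Each batch is durable on its own; worst-case loss on a kill is
    // one batch (at most 25 summaries).
    if let Err(e) = embed_and_upsert(batch.as_slice()) {
        // Per-batch failures log and continue; one transient error no
        // longer poisons the rest of the run.
        eprintln!("summary batch failed, continuing: {e}");
    }
    batch.clear();
}
```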
Validated within the same session: an ardupilot run that hit OpenAI’s daily request cap with 57% of files complete kept all 2,950 in-flight summaries on disk. Pre-v1.21.8, the entire run would have been lost.
## v1.21.9 — Context-budget lever set
The Phase 9 four-way comparison on the 5-language reference corpus measured what AI agents actually spend, in context characters, across operation modes:
| mode (canopy v1.21.8 baseline, heuristic Tier-A) | total chars | versus shell-only |
|---|---|---|
| shell-only (no canopy MCP) | 124,468 | (baseline) |
| default-unprompted (canopy + shell, neutral) | 78,287 | −37% |
| default-biased (canopy + shell, “prefer canopy”) | 81,259 | −35% |
| levered (canopy MCP only — restricted) | 169,324 | +36% — worse |
Forced canopy-only mode was more expensive than shell-only — the levers below close that gap.
### Five default flips (no tool semantics change)
- `canopy_search`: `preview` defaults to `false`. Path + line range + score per hit; pass `preview=true` for snippets. Phase 9 measurements showed previews dominate per-call chars on lookup queries (50–70% of response size) even though most agent decisions only need the path/score signal.
- `canopy_search`: Tier-A summary header capped at 150 chars per hit. Full summary text averaged 400–1,600 chars depending on backend (heuristic has high variance). The discriminative prefix fits in 150; the long tail rarely changes the next-step decision. `summary=false` reverts to legacy full-text mode.
- `DEFAULT_TIER_A_TOP_K` lowered 5 → 3. Phase 9 showed the 4th and 5th file rarely contributed to correctness on real questions but consistently added ~1.5K chars of summary header and (when escalated) ~5K chars of chunks per query. Operators who need more breadth pass `top_k_files=5` explicitly.
- Tier-A escalation threshold lowered 0.7 → 0.6. A more permissive threshold means file-level Tier-A wins return alone more often without escalating to chunks. Phase 9 showed Tier-B escalation accounted for ~60% of per-query chars but only ~10% of correctness improvement at the 0.7 threshold; 0.6 is the empirical inflection. Operators who want stricter file-level confidence set `--summary-threshold 0.7` at index time or `CANOPY_SUMMARY_THRESHOLD=0.7` per-process.
- MCP server-injected instructions lead with first-pass tools. The behavioral hierarchy now opens with `canopy_orient`/`canopy_survey`/`canopy_investigate` as the FIRST PASS for new questions on unfamiliar repos, before listing the three “drill” tools (`canopy_search`, `canopy_prepare`, `canopy_validate`). Calling `canopy_search` 3+ times on the same topic is explicitly flagged as the search-fan-out anti-pattern earlier in the instructions.
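Collected in one place, the flips look roughly like this (sketch only; the struct and any field name beyond the documented parameters are invented):

```rust
// The v1.21.9 default flips side by side; values are from the notes above.
struct SearchDefaults {
    preview: bool,             // v1.21.8: true      -> v1.21.9: false
    summary_header_cap: usize, // v1.21.8: full text -> v1.21.9: 150 chars/hit
    tier_a_top_k: usize,       // DEFAULT_TIER_A_TOP_K: 5 -> 3
    summary_threshold: f32,    // escalation threshold: 0.7 -> 0.6
}

impl Default for SearchDefaults {
    fn default() -> Self {
        Self {
            preview: false,
            summary_header_cap: 150,
            tier_a_top_k: 3,
            summary_threshold: 0.6,
        }
    }
}
```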
### Phase 9 measurement under v1.21.9
| mode | v1.21.8 | v1.21.9 | Δ |
|---|---|---|---|
| default-unprompted (5-repo) | 78,287 | 63,687 | −18.6% |
| levered (5-repo) | 169,324 | 113,938 | −32.7% |
| Correctness (levered) | 75/75 | 75/75 | held |
| Correctness (default) | 74/75 | 74/75 | held |
3-repo subset (curl + kotlinx.coroutines + laravel) under v1.21.9, both Tier-A backends:
| mode | heuristic | cloud-LLM (gpt-4o-mini) |
|---|---|---|
| shell-only | 69,683 | (n/a) |
| default-unprompted | 44,933 | 35,306 |
| default-biased | 42,829 | (not measured) |
| levered | 50,717 | 48,689 |
Every canopy-on heuristic mode beats shell-only (−27% to −39%). Cloud-LLM compresses default mode by another −21% but is essentially equivalent on levered (−4%) — the v1.21.9 levers already squeezed heuristic Tier-A close to optimal.
The bottom line: at v1.21.8, forced canopy-only was 36% more expensive than no-canopy. At v1.21.9, every canopy-available mode is cheaper than no-canopy, with 100% retrieval correctness held in levered mode and 98.7% in default. The day-1 free-tier experience is now nearly cloud-equivalent on chars budget — the cloud upgrade is about quality differentiation on outlier repos, not infra cost.
## Compatibility notes for the v1.21.x series
- API stability: all 21 MCP tool signatures unchanged. The v1.21.9 lever set flips defaults only. Every previous behavior is reachable by passing the relevant parameter explicitly (`preview=true`, `top_k_files=5`, `--summary-threshold 0.7`, `summary=false`).
- Index schema: `forge_meta.chunks_vec_fingerprint` is a new key (no migration; absent on v1.21.5-and-earlier indexes, soft-passed on the first v1.21.7+ run). `file_summaries` and `file_summaries_vec` tables added in v1.21.0 (created on the first `--with-summaries` run).
- License-tier gating: semantic and hybrid search remain Solo-tier and above; keyword fallback is the community baseline (unchanged).