# v1.21.x — Two-Tier RAG, Tier-A durability, and the context-budget lever set
Versions: 1.21.0 → 1.21.9
The v1.21.x series is the codebase intelligence engine’s retrieval-quality and context-budget pass. It lands two-tier retrieval (file-level summaries layered above chunk vectors), the durability fixes that make long-running summary indexes survive quotas and process kills, and the default-flip set that brings canopy MCP’s per-call context cost below the agent’s shell-tool baseline.
The headline result, measured on a 5-language reference corpus (ardupilot, curl, kotlinx.coroutines, dotnet-aspnetcore-samples, laravel):
Canopy MCP cuts your AI agent’s context budget by ~50% on real codebases vs running shell tools alone — across heuristic and cloud Tier-A backends. Whether the agent picks freely or you restrict it to canopy-only retrieval, the budget always lands below the no-canopy baseline. Day-1 install. 100% retrieval correctness.
No tool semantics changed across the series. Every previous behavior is reachable by passing the relevant parameter explicitly. The defaults moved.
## v1.21.0 — Two-tier retrieval (Tier-A summaries layered over Tier-B chunks)
`canopy index --with-summaries` builds a per-file natural-language summary and indexes it as a separate vector layer (`file_summaries_vec.lance`) above the existing chunk vectors (`chunks_vec.lance`). At query time, `canopy_search` hits the summary tier first; on a high-confidence match it returns file-level results alone, otherwise it escalates to chunks. The agent gets a sharper “what file is this about?” answer in fewer tokens for navigation queries.
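As a rough illustration of that escalation decision, here is a minimal sketch; only the behavior (summary tier first, escalate to chunks below the confidence threshold) comes from these notes, and every name below is invented rather than canopy's actual internals:

```rust
// Sketch of the two-tier query path; all names are illustrative.
struct FileHit { path: String, score: f32 }
struct ChunkHit { path: String, line_start: u32, line_end: u32, score: f32 }

enum SearchResult {
    Files(Vec<FileHit>),   // high-confidence Tier-A match: stop here
    Chunks(Vec<ChunkHit>), // escalated to Tier-B chunk vectors
}

// Stand-ins for the vector lookups against file_summaries_vec.lance and
// chunks_vec.lance.
fn search_summary_tier(_query: &[f32], _top_k: usize) -> Vec<FileHit> { Vec::new() }
fn search_chunk_tier(_query: &[f32]) -> Vec<ChunkHit> { Vec::new() }

fn two_tier_search(query_vec: &[f32], threshold: f32, top_k_files: usize) -> SearchResult {
    let file_hits = search_summary_tier(query_vec, top_k_files);
    // Confident summary-tier match: answer at file level, in fewer tokens.
    let confident = file_hits.first().map_or(false, |top| top.score >= threshold);
    if confident {
        SearchResult::Files(file_hits)
    } else {
        SearchResult::Chunks(search_chunk_tier(query_vec))
    }
}
```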
Four summary backends are supported out of the box:
- `heuristic` — no LLM, no API key, works offline. Extracts doc-comments + signatures + symbol context. The day-1 default.
- `ollama` — local LLM via Ollama (`qwen2.5-coder:1.5b`/`:3b`/`:7b` recommended). Data-sovereign.
- `openai-compat` — any OpenAI-compatible endpoint (OpenAI, Azure, Together, Groq, etc.).
- `anthropic` — Claude API.
Search-time controls: `expand=auto|always|never`, `top_k_files`, `summary_only=true`, configurable `CANOPY_SUMMARY_THRESHOLD`.
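Grouped as a parameter set, those controls might look like the following sketch; the struct is illustrative, and only the parameter names and semantics come from these notes:

```rust
// Illustrative grouping of the documented search-time controls.
enum Expand { Auto, Always, Never } // expand=auto|always|never

struct TierAControls {
    expand: Expand,     // Auto: threshold decides; Always/Never force it
    top_k_files: usize, // how many Tier-A file hits to consider
    summary_only: bool, // true: answer from summaries, never touch Tier-B
    threshold: f32,     // CANOPY_SUMMARY_THRESHOLD overrides the default
}
```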
## v1.21.1–v1.21.5 — Variance check and four-bug fix series
The v1.21.0 candidate had several silent failure modes uncovered during a variance-check sweep on real codebases:
- v1.21.1: Tier-A backend persistence in `summary_config` (operator’s chosen backend now travels with the index)
- v1.21.2: persisted `threshold` from the index TOML beats the env-var default
- v1.21.3: R2 team-cache hit no longer silently drops Tier-A
- v1.21.4: `bulk_replace_chunks` reduces Lance manifest churn from per-batch (~5,400 commits on ardupilot) to two (see the sketch after this list)
- v1.21.5: Tier-A respects the operator’s embedding config; safer default `CANOPY_EMBED_CONCURRENCY=2`
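The v1.21.4 fix has a shape worth sketching: commit count should be independent of batch count. A minimal illustration, assuming a Lance-style table where every delete or append writes a new manifest version; `VectorTable` and both function bodies are invented for this sketch, not canopy's actual code:

```rust
#[derive(Clone)]
struct Chunk {
    file_path: String,
    embedding: Vec<f32>,
}

// Stand-in for the Lance table wrapper; each call = one manifest commit.
trait VectorTable {
    fn delete_all(&mut self) -> Result<(), String>;
    fn append(&mut self, rows: &[Chunk]) -> Result<(), String>;
}

// Pre-v1.21.4 shape: one append per batch, ~5,400 commits on ardupilot.
fn replace_per_batch(table: &mut dyn VectorTable, batches: &[Vec<Chunk>]) -> Result<(), String> {
    table.delete_all()?;
    for batch in batches {
        table.append(batch)?; // every call writes a new Lance manifest version
    }
    Ok(())
}

// v1.21.4 bulk_replace_chunks shape: exactly two commits, delete + one append.
fn bulk_replace_chunks(table: &mut dyn VectorTable, batches: &[Vec<Chunk>]) -> Result<(), String> {
    let all: Vec<Chunk> = batches.iter().flatten().cloned().collect();
    table.delete_all()?; // commit 1
    table.append(&all)?; // commit 2
    Ok(())
}
```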
## v1.21.6 — Skip Tier-B re-embed when chunks haven’t changed
`build_embedding_index` now computes a content-stable fingerprint over (`file_path`, `line_start`, `line_end`, `content_len`, `content`) for every chunk and stores it in `forge_meta.chunks_vec_fingerprint`. On the next run, if the fingerprint matches and `chunks_vec.lance` has rows, the embed loop is skipped entirely.
The fingerprint is intentionally `chunk_id`-independent because `--with-search` rebuilds the SQLite `chunks` table with fresh auto-increment IDs every run; a content + path fingerprint matches whenever AST output is byte-identical.
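A minimal sketch of such a fingerprint, assuming SHA-256 over the five fields in deterministic chunk order; the notes name the fields, not the hash function, and the `sha2`/`hex` crates are this sketch's choice:

```rust
use sha2::{Digest, Sha256}; // external crates: sha2, hex

struct Chunk {
    file_path: String,
    line_start: u32,
    line_end: u32,
    content: String,
}

/// Content-stable fingerprint over all chunks. Deliberately ignores chunk_id,
/// which --with-search regenerates on every run.
fn chunks_vec_fingerprint(chunks: &[Chunk]) -> String {
    let mut hasher = Sha256::new();
    for c in chunks {
        hasher.update(c.file_path.as_bytes());
        hasher.update(c.line_start.to_le_bytes());
        hasher.update(c.line_end.to_le_bytes());
        hasher.update((c.content.len() as u64).to_le_bytes()); // content_len
        hasher.update(c.content.as_bytes());
    }
    hex::encode(hasher.finalize())
}
```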
Re-runs on unchanged source: ~3.4 hours → < 1 second on ardupilot’s 86,595-chunk corpus. ~12,000× speedup on the no-op case.
## v1.21.7 — Soft-pass for the first run after upgrade
The v1.21.6 strict-match logic only triggered when a `chunks_vec_fingerprint` was already persisted. By construction, no v1.21.5-or-earlier index has one, so the very first `--with-summaries` run after upgrade always paid the full re-embed cost before any benefit could land.
v1.21.7 adds a soft-pass: when `chunks_vec.lance` has rows AND no `chunks_vec_fingerprint` is stored, trust the existing vectors and backfill the fingerprint. Subsequent runs use the strict-match path. Safety relies on the Step 0 `embedder_fingerprint` check at index time — if an operator changed embedders pre-upgrade, that returns an `Embedding` error before the soft-pass logic runs.
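The resulting decision path, sketched below; the enum and helper name are illustrative, but the three branches are exactly the ones described above:

```rust
enum EmbedPlan {
    Skip,        // strict match: stored fingerprint present and unchanged
    SoftPass,    // rows exist, no stored fingerprint: trust + backfill
    FullReembed, // anything else
}

fn plan_tier_b(
    stored_fingerprint: Option<&str>,
    current_fingerprint: &str,
    lance_row_count: usize,
) -> EmbedPlan {
    match stored_fingerprint {
        // v1.21.6 strict path: skip only on an exact fingerprint match.
        Some(stored) if stored == current_fingerprint && lance_row_count > 0 => EmbedPlan::Skip,
        Some(_) => EmbedPlan::FullReembed, // content changed since last run
        // v1.21.7 soft-pass: a pre-upgrade index has vectors but no
        // fingerprint. Safe because the Step 0 embedder_fingerprint check
        // has already rejected embedder changes before this point.
        None if lance_row_count > 0 => EmbedPlan::SoftPass,
        None => EmbedPlan::FullReembed, // empty index: embed from scratch
    }
}
```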
## v1.21.8 — Incremental Tier-A commits
Pre-v1.21.8, `build_summary_index` collected every successful summary into one in-memory `Vec`, ran one bulk embed call, and upserted Lance + SQLite at the end. A kill mid-run — quota exhaustion, daemon stress, OOM — lost every minute of work for that repo.
v1.21.8 drains the LLM-generation stream in batches of 25: each batch embeds, upserts to Lance, then upserts to SQLite. Worst-case loss on a kill is bounded by the batch size; the `file_summaries` row count grows visibly during the run. Per-batch failures (Lance error, SQLite error, embedder error) log and continue rather than aborting the whole run, so one transient failure no longer poisons the rest of the work.
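A minimal sketch of the batched drain, using the notes' batch size of 25; `Summary` and `embed_and_upsert` are stand-ins for canopy's internals:

```rust
const BATCH_SIZE: usize = 25;

struct Summary; // stand-in: file path + generated summary text

// Stand-in: embed the batch, upsert to Lance, then upsert to SQLite.
fn embed_and_upsert(_batch: &[Summary]) -> Result<(), String> { Ok(()) }

fn drain_summaries(summaries: impl Iterator<Item = Summary>) {
    let mut batch: Vec<Summary> = Vec::with_capacity(BATCH_SIZE);
    for s in summaries {
        batch.push(s);
        if batch.len() == BATCH_SIZE {
            commit_batch(&mut batch);
        }
    }
    commit_batch(&mut batch); // flush the final partial batch
}

fn commit_batch(batch: &mut Vec<Summary>) {
    if batch.is_empty() { return; }
    // Each batch is durable on its own; worst-case loss on a kill is
    // one batch (at most 25 summaries).
    if let Err(e) = embed_and_upsert(batch.as_slice()) {
        // Per-batch failures log and continue; one transient error no
        // longer poisons the rest of the run.
        eprintln!("summary batch failed, continuing: {e}");
    }
    batch.clear();
}
```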
Validated within the same session: an ardupilot run that hit OpenAI’s daily request cap with 57% of files complete kept all 2,950 in-flight summaries on disk. Pre-v1.21.8, the entire run would have been lost.
## v1.21.9 — Context-budget lever set
The Phase 9 four-way comparison on the 5-language reference corpus measured what AI agents actually spend, in context characters, across operation modes:
| mode (canopy v1.21.8 baseline, heuristic Tier-A) | total chars | versus shell-only |
|---|---|---|
| shell-only (no canopy MCP) | 124,468 | (baseline) |
| default-unprompted (canopy + shell, neutral) | 78,287 | −37% |
| default-biased (canopy + shell, “prefer canopy”) | 81,259 | −35% |
| levered (canopy MCP only — restricted) | 169,324 | +36% — worse |
Forced canopy-only mode was more expensive than shell-only — the levers below close that gap.
### Five default flips (no tool semantics change)
- `canopy_search`: `preview` defaults to `false`. Path + line range + score per hit; pass `preview=true` for snippets. Phase 9 measurements showed previews dominate per-call chars on lookup queries (50–70% of response size) even though most agent decisions only need the path/score signal.
- `canopy_search`: Tier-A summary header capped at 150 chars per hit. Full summary text averaged 400–1,600 chars depending on backend (heuristic has high variance). The discriminative prefix fits in 150; the long tail rarely changes the next-step decision. `summary=false` reverts to legacy full-text mode.
- `DEFAULT_TIER_A_TOP_K` lowered 5 → 3. Phase 9 showed the 4th and 5th file rarely contributed to correctness on real questions but consistently added ~1.5K chars of summary header and (when escalated) ~5K chars of chunks per query. Operators who need more breadth pass `top_k_files=5` explicitly.
- Tier-A escalation threshold lowered 0.7 → 0.6. A more permissive threshold means file-level Tier-A wins return alone more often without escalating to chunks. Phase 9 showed Tier-B escalation accounted for ~60% of per-query chars but only ~10% of correctness improvement at the 0.7 threshold; 0.6 is the empirical inflection. Operators who want stricter file-level confidence set `--summary-threshold 0.7` at index time or `CANOPY_SUMMARY_THRESHOLD=0.7` per-process.
- MCP server-injected instructions lead with first-pass tools. The behavioral hierarchy now opens with `canopy_orient`/`canopy_survey`/`canopy_investigate` as the FIRST PASS for new questions on unfamiliar repos, before listing the three “drill” tools (`canopy_search`, `canopy_prepare`, `canopy_validate`). Calling `canopy_search` 3+ times on the same topic is explicitly flagged as the search-fan-out anti-pattern earlier in the instructions.
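Collected in one place, the flips look roughly like this (sketch only; the struct and any field name beyond the documented parameters are invented):

```rust
// The v1.21.9 default flips side by side; values are from the notes above.
struct SearchDefaults {
    preview: bool,             // v1.21.8: true      -> v1.21.9: false
    summary_header_cap: usize, // v1.21.8: full text -> v1.21.9: 150 chars/hit
    tier_a_top_k: usize,       // DEFAULT_TIER_A_TOP_K: 5 -> 3
    summary_threshold: f32,    // escalation threshold: 0.7 -> 0.6
}

impl Default for SearchDefaults {
    fn default() -> Self {
        Self {
            preview: false,
            summary_header_cap: 150,
            tier_a_top_k: 3,
            summary_threshold: 0.6,
        }
    }
}
```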
### Phase 9 measurement under v1.21.9
| mode | v1.21.8 | v1.21.9 | Δ |
|---|---|---|---|
| default-unprompted (5-repo) | 78,287 | 63,687 | −18.6% |
| levered (5-repo) | 169,324 | 113,938 | −32.7% |
| Correctness (levered) | 75/75 | 75/75 | held |
| Correctness (default) | 74/75 | 74/75 | held |
3-repo subset (curl + kotlinx.coroutines + laravel) under v1.21.9, both Tier-A backends:
| mode | heuristic | cloud-LLM (gpt-4o-mini) |
|---|---|---|
| shell-only | 69,683 | (n/a) |
| default-unprompted | 44,933 | 35,306 |
| default-biased | 42,829 | (not measured) |
| levered | 50,717 | 48,689 |
Every canopy-on heuristic mode beats shell-only (−27% to −39%). Cloud-LLM compresses default mode by another −21% but is essentially equivalent on levered (−4%) — the v1.21.9 levers already squeezed heuristic Tier-A close to optimal.
The bottom line: at v1.21.8, forced canopy-only was 36% more expensive than no-canopy. At v1.21.9, every canopy-available mode is cheaper than no-canopy, with 100% retrieval correctness held in levered mode and 98.7% in default. The day-1 free-tier experience is now nearly cloud-equivalent on chars budget — the cloud upgrade is about quality differentiation on outlier repos, not infra cost.
## Compatibility notes for the v1.21.x series
- API stability: all 21 MCP tool signatures unchanged. The v1.21.9 lever set flips defaults only. Every previous behavior is reachable by passing the relevant parameter explicitly (`preview=true`, `top_k_files=5`, `--summary-threshold 0.7`, `summary=false`).
- Index schema: `forge_meta.chunks_vec_fingerprint` is a new key (no migration; absent on v1.21.5-and-earlier indexes, soft-passed on the first v1.21.7+ run). `file_summaries` and `file_summaries_vec` tables added in v1.21.0 (created on the first `--with-summaries` run).
- License-tier gating: semantic and hybrid search remain Solo-tier and above; keyword fallback is the community baseline (unchanged).