v1.3 - June 1, 2026

June 2026: The Bill Comes Due

Fleets shipped, and so did the invoice. Every vendor made multi-agent orchestration the default, and the re-pricing arrived in the same month: Microsoft cancelled Claude Code internally on cost, Uber's COO questioned the ROI, and SpecBench showed that reward hacking scales with codebase size.

Updated Guides

14 of 240 guides updated with May 2026 data. The remaining 226 are carried forward unchanged.

Development

Agent in IDE with YOLO mode

Opus 4.8 default, Cursor 3.6 Run Mode (classifier + sandbox)

Development

Multi-agent orchestration (Gas Town / custom)

Claude Code dynamic workflows, Code w/ Claude fleets

Development

3-5 parallel agents per developer

Cursor 3.6 Run Mode, Antigravity 2.0, Devin MultiDevin

Development

One-shot unattended agents (Stripe Minions model)

Anthropic Routines, Cursor /loop, CI auto-fix

Development

Context budgeting (token economy)

Automatic compaction: Rewind, Dreaming, Amp at 90%

Development

AI review agent as first pass

Opus 4.8 self-verification (~4x fewer self-passed flaws); SpecBench caveat

Development

Policy-based auto-approval (60% Green)

SpecBench (~27pp per 10x LOC), BenchJack, RHB

Delivery

CPI (cost-per-iteration < $0.50)

Uber/Microsoft ROI, DORA J-curve, Goldman 24x

Delivery

Auto-Approve Rate (> 60%)

Anchor in post-merge outcomes, not benchmarks (SpecBench)

Delivery

Policy-as-code

Lint/review agent config (Shai-Hulud/TrapDoor); OpenAI Frontier Governance

Organization

Developer = manager of agent fleet

Fleet as product-default; Yegge 'Last Technical Interview'

Organization

Debt categorized and prioritized

Agentic Technical Debt (stock) vs Stochastic Tax (flow)

Infrastructure

Network isolation (agent can't see production)

settings.json hardening; Cursor 3.6 Run Mode sandbox

Infrastructure

MCP governance: lifecycle, versioning, audit

Supply-chain pinning; per-MCP cost via /usage

Key Numbers

40%

of May layoffs AI-attributed (Challenger)

Challenger

27pp/10x LOC

reward-hacking gap grows (SpecBench)

SpecBench (arXiv)

39%

first-year AI ROI, ~10% on legacy (DORA)

DORA (InfoQ)

24x

token use by 2030 - Jevons (Goldman)

Goldman Sachs

CVSS 9.6

Shai-Hulud agent-config worm

Tenable

Opus 4.8

new default, ~4x fewer self-passed flaws

Anthropic

Taxonomy Changes

2026-05 to 2026-06 - May taxonomy preserved unchanged.

Development

Coding Agent Usage

L2: Agent in IDE: Opus 4.8 default, Cursor 3.6 Run Mode

L3: CLI agents: Opus 4.8 + xhigh effort, Codex, Antigravity CLI

L4: Scheduled / unattended agents (Routines, Cursor /loop); 3-5 parallel via Run Mode, MultiDevin

L5: Multi-agent orchestration: Claude Code dynamic workflows (script spawns dozens-to-hundreds of subagents)

Context Engineering

L3: Context budgeting: automatic compaction built in (Rewind summarize, Amp at 90%)

L5: Persistent memory: Dreaming / Kairos 4-stage consolidation

Area: Agent instruction files (CLAUDE.md, .cursorrules) are now an attack surface (TrapDoor injection)
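The 90% compaction trigger named under context budgeting can be pictured as a simple threshold check. This is a minimal sketch, not how Rewind or Amp implement it: `maybe_compact`, `count_tokens`, `summarize`, `budget`, and `keep_recent` are all hypothetical names, and the summarizer is supplied by the caller.

```python
from typing import Callable, List


def maybe_compact(
    messages: List[str],
    count_tokens: Callable[[str], int],   # hypothetical tokenizer hook
    summarize: Callable[[List[str]], str],  # hypothetical summarizer hook
    budget: int,
    threshold: float = 0.90,              # the 90% trigger from the guide
    keep_recent: int = 2,                 # assumption: newest turns stay verbatim
) -> List[str]:
    """Return messages unchanged below the threshold, else a compacted list."""
    used = sum(count_tokens(m) for m in messages)
    if used < threshold * budget or len(messages) <= keep_recent:
        return messages
    # Replace everything except the most recent turns with one summary message.
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```

Keeping the most recent turns verbatim is a design assumption here; a real harness may choose what to preserve differently.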

Code Review & Quality

L3: AI review agent: Opus 4.8 self-verification (~4x less likely to pass its own flaws)

L4: Auto-approval anchored in outcomes, not benchmarks - SpecBench: reward hacking scales ~27pp per 10x LOC
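To make "~27pp per 10x LOC" concrete, one reading is a log-linear trend: each tenfold increase in codebase size adds about 27 percentage points to the reward-hacking gap. The sketch below assumes that reading. The function name, the reference size, and the reference gap of 5pp are illustrative assumptions, not SpecBench figures.

```python
import math


def projected_gap_pp(
    loc: int,
    ref_loc: int,
    ref_gap_pp: float,
    slope_pp_per_decade: float = 27.0,  # ~27pp per 10x LOC, as cited from SpecBench
) -> float:
    """Extrapolate the gap (in percentage points) from a reference measurement.

    Assumes the gap grows linearly in log10(LOC); this is an interpretation of
    the headline number, not a formula taken from the paper.
    """
    if loc <= 0 or ref_loc <= 0:
        raise ValueError("LOC values must be positive")
    return ref_gap_pp + slope_pp_per_decade * math.log10(loc / ref_loc)


# Hypothetical example: a 5pp gap measured at 10k LOC, projected to 100k LOC.
gap_at_100k = projected_gap_pp(100_000, 10_000, 5.0)  # 5.0 + 27 * log10(10) = 32.0
```

A gap in percentage points cannot exceed 100, so the extrapolation only makes sense near the range of sizes that was actually measured.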

Testing Strategy

L4: Held-out oracles the agent never sees gate releases (SpecBench: validation lies as LOC grows)

Delivery Management

Metrics

Area: Cost-per-merged-PR is now a CFO line item: Microsoft cancels Claude Code, Uber COO questions ROI, DORA J-curve, Goldman 24x

L4: Auto-Approve Rate anchored in post-merge outcomes, not benchmark scores (SpecBench)
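The two delivery targets (CPI under $0.50, Auto-Approve Rate above 60%) can be computed from basic counters. The definitions below are assumptions made for illustration: CPI as total agent spend divided by iterations, and Auto-Approve Rate counting only auto-approved PRs that were not reverted after merge, as one way of anchoring the rate in post-merge outcomes. The `PullRequest` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

CPI_TARGET_USD = 0.50        # from the guide: CPI < $0.50
AUTO_APPROVE_TARGET = 0.60   # from the guide: Auto-Approve Rate > 60%


@dataclass
class PullRequest:
    auto_approved: bool
    reverted_after_merge: bool  # hypothetical post-merge outcome signal


def cost_per_iteration(total_spend_usd: float, iterations: int) -> float:
    """Assumed definition: total agent spend divided by agent iterations."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return total_spend_usd / iterations


def auto_approve_rate(prs: Iterable[PullRequest]) -> float:
    """Share of all PRs that were auto-approved and survived after merge."""
    prs = list(prs)
    if not prs:
        return 0.0
    good = sum(1 for p in prs if p.auto_approved and not p.reverted_after_merge)
    return good / len(prs)


def meets_targets(cpi_usd: float, rate: float) -> bool:
    """Both thresholds are strict, matching '<' and '>' in the guide."""
    return cpi_usd < CPI_TARGET_USD and rate > AUTO_APPROVE_TARGET
```

Counting reverted PRs against the rate means a policy that approves more but breaks more does not look better on the dashboard.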

Governance & Compliance

L2: EU AI Act: GPAI duties enforced Aug 2; OpenAI Frontier Governance Framework as a reference

L3: Lint/review agent config as security-sensitive (CLAUDE.md, settings.json - Shai-Hulud/TrapDoor)

CI/CD Pipeline

L4: Scheduled / async agents land PRs overnight (Routines, Cursor /loop, CI auto-fix)

Unchanged: Merge & Deploy

Organization

AI Adoption Model

Area: AI = dominant single cause of layoffs (40% of May cuts, Challenger), but with redistribution to AI-engineering roles (+50-100% YoY)

Team Structure & Roles

L4: Developer = fleet manager is now a product default (Cursor Run Mode, Claude agent view, Antigravity, MultiDevin)

L4: Hiring shift: Yegge's 'The Last Technical Interview' (campfire trials, portable credentials)

Knowledge Management

L4: Spec-Driven Development now a contested but defined methodology; skill-packs as shared versioned assets

Tech Debt & Modernization

L2: Agentic Technical Debt (a stock) vs the Stochastic Tax (a flow-cost); DORA: gains collapse to ~10% on legacy

Infrastructure

Agent Runtime & Sandboxing

L3: Harden agent config: ~/.claude/settings.json is now a persistence target (Mini Shai-Hulud)

L4: Classifier-gated sandboxed execution as default (Cursor 3.6 Run Mode, Claude auto mode)

L5: Local-first runtime as a privacy/latency path (antirez/ds4: DeepSeek V4 on-device via Metal, 1M context)
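One small piece of hardening an agent config file is noticing when it changes. The sketch below compares the file against a previously recorded SHA-256 digest. It assumes the baseline digest is stored somewhere the agent cannot write, and it only detects tampering; it does not prevent it. The function names are hypothetical and nothing here depends on the file's contents or schema.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_unchanged(path: Path, expected_digest: str) -> bool:
    """True only if the file exists and still matches the recorded digest."""
    return path.is_file() and file_digest(path) == expected_digest


# Usage sketch: record a baseline after a human-reviewed change, then check
# before each agent session (the path is the one named in the guide).
settings_path = Path.home() / ".claude" / "settings.json"
```

A missing file is treated as a failure, since deletion or replacement is as suspicious as modification.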

MCP & Tool Integration

L3: MCP/tool config is a supply-chain attack surface (Shai-Hulud, TrapDoor) - treat installs as pinned, reviewed deps

L4: MCP governance: per-skill/subagent/plugin/MCP cost attribution via /usage
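Treating MCP installs as pinned, reviewed dependencies amounts to checking what is installed against a reviewed lockfile. The sketch below assumes a hypothetical lock format mapping each server name to a version and a SHA-256 digest. It is not an MCP feature or any vendor's file format.

```python
from typing import Dict, List, NamedTuple


class Pin(NamedTuple):
    version: str
    sha256: str


def pin_violations(installed: Dict[str, Pin], lock: Dict[str, Pin]) -> List[str]:
    """List every installed server that is missing from, or differs from, the lock."""
    problems: List[str] = []
    for name, found in sorted(installed.items()):
        expected = lock.get(name)
        if expected is None:
            problems.append(f"{name}: not in lockfile")
        elif found.version != expected.version:
            problems.append(
                f"{name}: version {found.version}, pinned {expected.version}"
            )
        elif found.sha256 != expected.sha256:
            # Same version string but different bytes: the strongest warning sign.
            problems.append(f"{name}: digest mismatch at version {found.version}")
    return problems
```

An empty result means every installed server matches its pin. In CI, a non-empty result would typically fail the job and require a reviewed lockfile update.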

Observability & Feedback Loop

L3: Cost attributed per skill/subagent/plugin/MCP (Claude /usage)

L4: ROI/J-curve dashboards (DORA); 'trust the methodology, not the number' (SpecBench/BenchJack)

Unchanged: Build System

What Didn't Change (and Why)

Stripe Minions as L5 north star - Still the cleanest public reference; dynamic workflows are now a buyable version of the pattern.

Lint-as-architecture, Bazel / EngFlow - Vendor-independent infrastructure. No reason to revise.

IPETs + bad-day protocol - Prior-edition org patterns still hold; June adds defending the spend.

Most L1 / L2 baseline - Foundations of AI adoption have not moved; what changed is what L3+ means.

Yegge's 8-stage individual model - Still the best public model for individual progression.

Sources

Code w/ Claude 2026 (Simon Willison)

Fleets, Outcomes, Dreaming, Routines

Anthropic: Claude Opus 4.8

New default, ~4x fewer self-passed flaws

Cursor changelog (3.4-3.6)

Run Mode, /loop, Shared Canvases

Google Antigravity 2.0 + CLI

Multi-agent IDE + CLI at I/O 2026

Cognition raise / Devin MultiDevin

$1B raise, multi-agent coding team

Fortune: Uber COO questions AI ROI

Full-year budget burned in 4 months

Ramp AI Index (May 2026)

Anthropic overtakes OpenAI in business adoption

GitHub Copilot usage-based billing

Preview bill ahead of token-metered cutover

DORA ROI of AI-assisted development

39% first-year ROI, ~10% on legacy, J-curve

Goldman Sachs: AI token economics

24x token consumption by 2030 (Jevons)

SpecBench (arXiv 2605.21384)

Reward hacking scales ~27pp per 10x LOC

Mini Shai-Hulud worm (Tenable)

CVE-2026-45321: hooks into ~/.claude/settings.json

Challenger May 2026 layoffs

AI cited in 40% of all May cuts (record)

Yegge: The Last Technical Interview

Campfire trials, portable credentials
