Active Work

Claude Code Agentic Workflows

Claude Sonnet / Opus (Anthropic API), Claude Code CLI, Python, Bash, YAML, Git worktrees

About this project

An autonomous multi-agent software engineering system built on Claude's tool-use API, with parallel sub-agents, hook-driven automation, and persistent memory. It carries out multi-step codebase modifications in isolated git worktrees, so any change can be rolled back by discarding the worktree.

Background

The question I was trying to answer: how far can you push an AI coding agent before you need to hand control back to a human? The practical limit isn't capability: modern models can read, reason about, and modify code well enough to make most routine tasks tractable. The limit is safety and coherence. Can the agent maintain a consistent mental model of the codebase across many steps, and can you trust that it won't take a destructive action you didn't intend?

The architecture I built uses specialised sub-agents for different phases: Explore agents map the codebase structure, Plan agents reason about the implementation approach, and Review agents verify the output against the original intent. Where tasks are independent, the agents run in parallel and coordinate via message passing. Every call to a Claude tool (Read, Edit, Bash, Write) passes through permission gating, so each file modification or shell command can be audited before it executes.
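A minimal sketch of the phase structure described above, assuming `asyncio` for concurrency. The `Message` type and the `explore`, `plan`, `review`, and `run_pipeline` functions are hypothetical names chosen for illustration, and the model calls are replaced by placeholders; this is not the project's actual code.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    sender: str  # which agent produced the message
    body: str    # free-form payload; a real system would likely use structured data


async def explore(area: str) -> Message:
    # Placeholder for a model call that maps one part of the codebase.
    await asyncio.sleep(0)
    return Message(sender=f"explore:{area}", body=f"summary of {area}")


async def plan(findings: list[Message]) -> Message:
    # Placeholder for a model call that turns findings into an approach.
    await asyncio.sleep(0)
    return Message(sender="plan", body=f"plan based on {len(findings)} findings")


async def review(plan_msg: Message, intent: str) -> Message:
    # Placeholder for a model call that checks the plan against the intent.
    await asyncio.sleep(0)
    return Message(sender="review", body=f"checked '{plan_msg.body}' against '{intent}'")


async def run_pipeline(intent: str, areas: list[str]) -> list[Message]:
    # Explore tasks are independent of each other, so they run concurrently.
    findings = await asyncio.gather(*(explore(area) for area in areas))
    # Plan depends on all findings, and Review depends on the plan,
    # so these two phases run sequentially.
    plan_msg = await plan(list(findings))
    review_msg = await review(plan_msg, intent)
    return [*findings, plan_msg, review_msg]


if __name__ == "__main__":
    for msg in asyncio.run(run_pipeline("add logging", ["src", "tests"])):
        print(msg.sender, "->", msg.body)
```

Only the Explore phase fans out here. Plan and Review each depend on the previous phase's output, which is why they are awaited one after the other.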

The persistent memory system was necessary because a single context window isn't enough for a long-running project. I implemented file-backed memory across four types: user (preferences and working style), feedback (corrections the agent learns from), project (current state and decisions), and reference (where to find things in external systems). Each session starts by loading the relevant memory files, which gives continuity without requiring the full conversation history.
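One way the four memory types could be laid out on disk is sketched below. The one-file-per-type layout, the `.md` extension, and the `remember` and `load_memory` helpers are assumptions made for this example, not a description of the real storage format.

```python
from __future__ import annotations

from pathlib import Path

# The four memory types named in the text.
MEMORY_TYPES = ("user", "feedback", "project", "reference")


def remember(root: Path, kind: str, note: str) -> None:
    """Append one note to the file for the given memory type."""
    if kind not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {kind}")
    root.mkdir(parents=True, exist_ok=True)
    with (root / f"{kind}.md").open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")


def load_memory(root: Path, kinds: tuple[str, ...] = MEMORY_TYPES) -> str:
    """Build the text a new session would start from.

    Only the requested types are loaded, and missing files are skipped,
    so a session can pull in just the memory relevant to its task.
    """
    sections = []
    for kind in kinds:
        path = root / f"{kind}.md"
        if path.exists():
            sections.append(f"## {kind}\n{path.read_text(encoding='utf-8')}")
    return "\n".join(sections)
```

A session would call `load_memory` once at startup and place the result at the top of its context, which is much smaller than replaying the full conversation history.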

Worktree isolation is the safety mechanism: every significant change happens in a separate git worktree, so the main branch stays clean and any failed or unwanted change can be discarded without leaving the codebase in a broken state. The workflow runs autonomously between explicit human review points, and those gates are the answer to the safety question.
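The isolation pattern can be sketched as a context manager around the standard `git worktree add`, `git worktree remove`, and `git branch -D` commands. The `isolated_worktree` helper, the sibling-directory naming scheme, and the injectable `run` parameter are assumptions for illustration; the real workflow may handle cleanup and review differently.

```python
from __future__ import annotations

import subprocess
from contextlib import contextmanager
from pathlib import Path


def _run(cmd: list[str], cwd: Path) -> None:
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(cmd, cwd=cwd, check=True)


@contextmanager
def isolated_worktree(repo: Path, branch: str, run=_run):
    """Create a worktree on a new branch and discard it if the body fails.

    On success the worktree is left in place for human review.
    Assumes a simple branch name without slashes.
    """
    path = repo.parent / f"{repo.name}-{branch}"
    run(["git", "worktree", "add", "-b", branch, str(path), "HEAD"], repo)
    try:
        yield path
    except Exception:
        # Throw away the failed attempt; the main checkout was never touched.
        run(["git", "worktree", "remove", "--force", str(path)], repo)
        run(["git", "branch", "-D", branch], repo)
        raise
```

Passing `run` as a parameter keeps the git invocation replaceable, which makes the discard path easy to exercise without a real repository.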

Highlights

Parallel sub-agents (Explore, Plan, Review) coordinated via message passing
Structured JSON tool calls (Read, Edit, Bash, Write) with permission gating and sandboxed execution
File-backed persistent memory system across sessions (user, feedback, project, reference types)
Hook-driven automation wired to tool lifecycle events
Worktree isolation for safe, reversible parallel code changes
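To make the hook and permission-gating items concrete, here is a small in-process sketch of hooks wired to tool lifecycle events. It is not the Claude Code hook configuration format; `ToolCall`, `HookRegistry`, and `deny_destructive_bash` are hypothetical names, and the blocked command patterns are examples only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    tool: str              # e.g. "Read", "Edit", "Bash", "Write"
    args: dict[str, Any]   # arguments of the structured tool call


@dataclass
class HookRegistry:
    # Pre-hooks return False to block a call; post-hooks observe the result.
    pre: list[Callable[[ToolCall], bool]] = field(default_factory=list)
    post: list[Callable[[ToolCall, Any], None]] = field(default_factory=list)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def execute(self, call: ToolCall, handler: Callable[[ToolCall], Any]) -> Any:
        # Permission gate: every pre-hook must approve before the tool runs.
        for hook in self.pre:
            if not hook(call):
                self.audit_log.append(("denied", call.tool))
                return None
        result = handler(call)
        self.audit_log.append(("ran", call.tool))
        for hook in self.post:
            hook(call, result)
        return result


def deny_destructive_bash(call: ToolCall) -> bool:
    """Example pre-hook: block a few obviously destructive shell commands."""
    if call.tool != "Bash":
        return True
    command = call.args.get("command", "")
    return not any(pattern in command for pattern in ("rm -rf", "git push --force"))
```

Registering `deny_destructive_bash` in `pre` means a blocked call never reaches its handler, and the audit log records the denial alongside the calls that did run.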
