TokoTubeIC OS

AI Workflows - Local Agentic Coding CLI with Open-Source Models

Introduction

In AI Agents for Software Development, I outlined high-level patterns for agentic coding with tools like Claude Code: TDD loops, specialized agents, multi-agent orchestration, and safety rails.

AI Workflows is the concrete implementation of those ideas on your own machine.

Instead of relying on closed, cloud-hosted models, AI Workflows gives you a CLI-first agentic coding environment powered by local Ollama models. It mimics Claude-style workflows:

  • Router agents that dispatch tasks to specialists
  • File/search/bash tools wired into the agent
  • TDD loops that run tests, analyze failures, and propose fixes
  • Multi-repo workflows and IC-specific review helpers

All of this runs locally with open-source models like llama3.2 and qwen2.5.

What Is AI Workflows?

At a high level:

A CLI coding tool that brings Claude-like agentic workflows to local, open-source models with intelligent routing, specialized agents, and TDD automation.

Core ideas:

  • Local-first: Use Ollama models on your own hardware
  • TDD by default: Tests drive the loop; agents respond to failures
  • Tools everywhere: Read/edit files, grep, glob, and run bash commands
  • Router + specialists: Tasks get sent to the right agent type
  • Multi-repo aware: Target any repository path, not just the current directory

This is the engine I use to experiment with overnight autonomous development across multiple projects.

Architecture Overview

The project is structured as a classic Python CLI + agents + tools stack:

```text
ai-workflows/
├── src/
│   ├── client/          # Ollama API client
│   ├── tools/           # File, bash, glob, grep tools
│   ├── agents/          # Router + specialist coding agents
│   ├── context/         # Context + summarization
│   └── cli/             # Command-line interface
├── tests/               # TDD-first test suite
├── IMPLEMENTATION_GUIDE.md
└── NEXT_STEPS.md
```

Key pieces:

  • OllamaClient

    • Async client for talking to local Ollama models
    • Handles streaming + token counting
  • Tools

    • ReadTool – read files with line numbers/pagination
    • WriteTool – create new files
    • EditTool – replace text in existing files
    • GlobTool – find files by pattern
    • GrepTool – search file contents by regex
    • BashTool – run shell commands safely
  • Agents

    • RouterAgent – understands the task and routes it
    • CoderAgent – writes/edits code, reasons about tests
    • DebateOrchestrator – runs two models + judge for “debate mode”
  • Context Management

    • Tracks conversation history
    • Summarizes automatically as you approach token limits
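To make the tool layer concrete, here is a hypothetical sketch of what one tool might look like behind a small common interface. The `ToolResult` and `GrepTool` names and signatures are assumptions for illustration; the project's actual classes may differ:

```python
import re
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ToolResult:
    ok: bool
    output: str

class GrepTool:
    """Search file contents by regex (hypothetical interface sketch)."""
    name = "grep"

    def run(self, pattern: str, root: str = ".") -> ToolResult:
        regex = re.compile(pattern)
        hits = []
        for path in Path(root).rglob("*.py"):
            # Collect matching lines as path:lineno: text
            for lineno, line in enumerate(
                path.read_text(errors="ignore").splitlines(), 1
            ):
                if regex.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
        return ToolResult(ok=bool(hits), output="\n".join(hits))
```

Each tool returning a uniform result type is what lets the router hand any tool's output straight back to the model.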

ASCII Workflow Diagram

To visualize how everything connects, here’s the high-level workflow that AI Workflows automates:

```text
              ┌──────────────────────────────┐
              │       Developer Task         │
              │   "Implement / review X"     │
              └──────────────┬───────────────┘
                             │
                             ▼
                     ┌───────────────┐
                     │  RouterAgent  │
                     │  (model via   │
                     │   Ollama)     │
                     └──────┬────────┘
                            │ selects specialist
        ┌───────────────────┼──────────────────────┐
        ▼                   ▼                      ▼
 ┌──────────────┐   ┌──────────────┐       ┌──────────────────┐
 │  CoderAgent  │   │  Reviewer    │       │  Future Agents   │
 │ (code / TDD) │   │ / Analyzer   │       │  (orchestrator,  │
 └──────┬───────┘   └──────┬───────┘       │   others)        │
        │                  │               └────────┬─────────┘
        │                  │                        │
        ▼                  ▼                        ▼
 ┌──────────────┐   ┌──────────────┐       ┌──────────────────┐
 │    Tools     │   │    Tests     │       │  Git / Branches  │
 │ Read / Write │   │  pytest,     │       │  commits,        │
 │ Glob / Grep  │   │  vitest, etc.│       │  checkpoints     │
 │ Bash         │   └──────┬───────┘       └────────┬─────────┘
 └──────┬───────┘          │                        │
        └──────────────────┼────────────────────────┘
                           │
                           ▼
 ┌────────────────────────────────────────────────────────┐
 │                 TDD Loop (tdd_loop.py)                 │
 │ - Run tests                                            │
 │ - Analyze failures with CoderAgent                     │
 │ - Suggest / apply fixes                                │
 │ - When green, pull next item from TODO.md (proactive)  │
 └──────────────────────────┬─────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────────┐
        │ Multi-Repo Mode (--repo-path, batch runs) │
        │ - Frontend / Backend / Contracts          │
        │ - Each repo gets its own TDD loop         │
        └─────────────────────┬─────────────────────┘
                              │
                              ▼
              ┌──────────────────────────────┐
              │ Updated code + green tests   │
              │ ready for human review       │
              └──────────────────────────────┘
```

On top of this, the CLI layer exposes high-level commands for:

  • Free-form tasks (run)
  • Debate mode (debate)
  • IC project review (ic review)
  • Continuous TDD loops (tdd_loop.py)

Why Local Models + Ollama?

Cloud LLMs are great, but:

  • You depend on external APIs and pricing
  • You ship potentially sensitive code to third parties
  • You’re limited by rate limits and context policies you don’t control

AI Workflows takes the opposite approach:

  • Local models via Ollama: run llama3.2, qwen2.5, etc. on your own GPU/CPU
  • Open-source first: tune the prompt, architecture, and tools freely
  • Offline-friendly: your agentic workflows don’t die when the wifi does

You still get the agentic ergonomics (router agent, tools, context) while owning the entire stack.

Getting Started

Prerequisites

  • Python 3.10+
  • uv
  • Ollama installed and running
  • Git

Install

From the project root:

```bash
git clone <repository-url>
cd ai-workflows

# Create venv and install deps via uv
uv venv
uv pip install -e .

# Pull models
ollama pull llama3.2:1b
ollama pull qwen2.5:7b
```

Quick CLI Usage

```bash
# See all commands
uv run ai-workflow --help

# Run an IC-focused review on a project
uv run ai-workflow ic review --target /path/to/project

# General free-form task
uv run ai-workflow run "Refactor the auth module" --model qwen2.5:7b
```

Feature Highlight: IC Difference Reviewer

One of the first opinionated workflows is an IC dapp reviewer that compares a project against a “seachan gold standard”:

```bash
# Non-interactive review
uv run ai-workflow ic review --target /home/archie/repos/pilcrow

# Interactive: approve/reject suggestions one by one
uv run ai-workflow ic review --target /home/archie/repos/pilcrow --interactive

# Auto-apply only high-confidence suggestions
uv run ai-workflow ic review --target /home/archie/repos/pilcrow --apply-suggestions
```

What it does:

  • Scans the target repo
  • Compares key files vs. a reference seachan implementation
  • Uses AI to:
    • Explain differences
    • Suggest improvements
    • Avoid changing dapp-specific logic, branding, or CanDB architecture
  • Surfaces confidence scores before anything is applied

This turns a “gold standard” dapp into a living template you can review other IC projects against.
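A reviewer like this typically gates auto-apply on the confidence it surfaces. The helper below is a hypothetical sketch of that filtering step (the `Suggestion` shape and the 0.8 threshold are assumptions, not the tool's actual internals):

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    file: str
    summary: str
    confidence: float  # 0.0 - 1.0, as reported by the model

def auto_applicable(suggestions: list[Suggestion],
                    threshold: float = 0.8) -> list[Suggestion]:
    """Keep only suggestions confident enough to apply without a human."""
    return [s for s in suggestions if s.confidence >= threshold]
```

Everything below the threshold stays in the interactive approve/reject flow.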

Feature Highlight: TDD Loop Automation

The heart of AI Workflows is tdd_loop.py: an AI-augmented test-driven development loop.

Basic Usage

```bash
python /home/archie/repos/ai-workflows/tdd_loop.py

# Watch mode (re-run on file changes)
python /home/archie/repos/ai-workflows/tdd_loop.py --watch

# Auto-fix mode (AI suggests patches)
python /home/archie/repos/ai-workflows/tdd_loop.py --auto-fix
```

Proactive Mode

```bash
# Proactive TDD loop
python /home/archie/repos/ai-workflows/tdd_loop.py --proactive --watch

# Full auto: watch + auto-fix + proactive
python /home/archie/repos/ai-workflows/tdd_loop.py --watch --auto-fix --proactive
```

How the loop works:

  1. Run pytest and capture results
  2. Use CoderAgent to analyze failing tests
  3. Propose precise code changes (not just “try X”)
  4. Optionally apply changes and rerun tests
  5. Repeat until green or until you stop it
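The steps above can be sketched as a loop around pytest. This is a simplified, hypothetical reconstruction: the real tdd_loop.py also handles watching, patch application, and the actual model calls, which are abstracted here as an injected `analyze_failure` callable:

```python
import subprocess

def run_pytest(repo_path: str) -> tuple[bool, str]:
    """Run pytest in the target repo; exit code 0 means all tests passed."""
    proc = subprocess.run(
        ["pytest", "--tb=short"],
        cwd=repo_path, capture_output=True, text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

def tdd_loop(run_tests, analyze_failure, max_iterations: int = 5) -> bool:
    """Run tests; on failure, let the agent analyze and fix; stop when green."""
    for _ in range(max_iterations):
        green, output = run_tests()
        if green:
            return True
        # e.g. CoderAgent reads the failure output and proposes/applies a patch
        analyze_failure(output)
    return False
```

Injecting `run_tests` keeps the loop itself testable without a real model or repo.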

In proactive mode, when all tests are green it will:

  1. Read your TODO.md
  2. Choose the next high-priority task
  3. Generate tests first (Red)
  4. Help you implement the feature (Green)
  5. Move on to the next TODO

This encodes a full Red → Green → Refactor → Next TODO loop directly into the tool.

Multi-Repository TDD Workflows

A critical design goal is running TDD loops on any repo, not just the AI Workflows project itself.

Targeting External Repos

```bash
# Basic: point at any repo
python /home/archie/repos/ai-workflows/tdd_loop.py --repo-path ~/projects/my-dapp

# With proactive mode
python /home/archie/repos/ai-workflows/tdd_loop.py --repo-path ~/projects/frontend --proactive --watch

# Use a different model
python /home/archie/repos/ai-workflows/tdd_loop.py --repo-path ~/projects/backend --model llama3.2:1b --auto-fix
```

Under the hood, the loop:

  • Switches into the target repo directory
  • Discovers pytest-style tests (tests/ or test_*.py)
  • Runs tests in that environment
  • Looks for TODO.md for proactive work
  • Writes code changes inside the target repo
  • Restores the original directory when done

Multi-Repo Batch Mode

You can run multiple TDD loops in parallel (e.g., using tmux):

```bash
# Frontend
python /home/archie/repos/ai-workflows/tdd_loop.py \
  --repo-path ~/dapps/frontend \
  --proactive --watch \
  --model qwen2.5:7b &

# Backend
python /home/archie/repos/ai-workflows/tdd_loop.py \
  --repo-path ~/dapps/backend \
  --proactive --watch \
  --model qwen2.5:7b &

# Contracts
python /home/archie/repos/ai-workflows/tdd_loop.py \
  --repo-path ~/dapps/contracts \
  --proactive --watch \
  --model llama3.2:1b &
```

The idea: go to sleep, wake up to multiple repos with more tests passing and TODOs implemented, ready for human review.

Debate Mode: Two Models + Judge

For tricky tasks, AI Workflows supports a small-scale model debate:

```bash
uv run ai-workflow debate "Implement feature X" \
  --model-a llama3.2:1b \
  --model-b qwen2.5:7b \
  --judge-model qwen2.5:7b
```

Workflow:

  1. Two proposal agents (A and B) generate competing implementations
  2. A judge model:
    • Compares both
    • Explains tradeoffs
    • Picks a winner or synthesizes a hybrid
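The orchestration itself is simple. Assuming each agent is a plain callable from prompt to text (an abstraction over the Ollama-backed agents, not the project's real interface), a hypothetical sketch of one debate round:

```python
def debate(task: str, agent_a, agent_b, judge) -> str:
    """Ask two models for competing proposals, then let a judge model decide."""
    proposal_a = agent_a(task)
    proposal_b = agent_b(task)
    verdict_prompt = (
        f"Task: {task}\n\n"
        f"Proposal A:\n{proposal_a}\n\n"
        f"Proposal B:\n{proposal_b}\n\n"
        "Compare both, explain tradeoffs, and pick a winner or synthesize a hybrid."
    )
    return judge(verdict_prompt)
```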

This is lightweight but surprisingly effective for:

  • API design choices
  • Refactoring strategies
  • Non-obvious algorithmic decisions

Testing and TDD Culture

The project itself is built TDD-first:

  • Tests for everything

    • test_client.py – Ollama client behavior
    • test_tools.py – all tools
    • test_context.py – context + summarization
    • test_agents.py – router + coder agents
  • Implementation discipline

    1. Write failing tests
    2. Implement minimal code to go green
    3. Refactor with tests intact
  • Docs for contributors

    • IMPLEMENTATION_GUIDE.md – how to extend tools/agents
    • NEXT_STEPS.md – roadmap and feature ideas
    • TODO.md – task list tuned for cheaper local models

AI Workflows isn’t just a TDD helper; it’s a TDD-native codebase.

Vision: Overnight Multi-Agent Development

The long-term goal is to support multi-agent overnight workflows across multiple dapps and repos.

Planned capabilities include:

  • Task queue + agent pool

    • Priority queue with dependencies
    • Multiple agents working in parallel
    • Resource-aware scheduling
  • Git workflow automation

    • Per-task feature branches
    • Automatic commits + PR creation
    • Rollbacks when tests regress
  • Cross-project orchestration

    • Shared context between related repos
    • Coordinated features across frontend, backend, and contracts
    • Unified reporting ("what changed overnight?")

From a dev’s perspective, the dream is:

Describe the next week of work as tasks + tests, start the agents, wake up to green tests and ready-to-review PRs.

How This Relates to AI Agents for Development

If AI Agents for Software Development is the theory of agentic coding workflows, AI Workflows is the laboratory where those ideas are implemented and stress-tested with local models.

  • Specialized agents → RouterAgent, CoderAgent, debate orchestrator
  • TDD loops → tdd_loop.py + pytest integration
  • Multi-repo orchestration → --repo-path + batch workflows
  • Safety + observability → test-first, checkpoints, and local-only execution

The two posts are meant to be read together:

  1. Concepts – patterns, roles, and safety practices
  2. Implementation – a real CLI that embodies those patterns with open-source models

Conclusion

AI Workflows is my attempt to bring Claude-style agentic coding to a local, open-source, TDD-first environment.

If you want to:

  • Experiment with agentic development without sending code to a cloud LLM
  • Run TDD loops that actually drive feature work
  • Orchestrate multiple repos overnight on your own hardware

…AI Workflows gives you a solid starting point.

Clone it, point it at a project you care about, and let local agents start doing real work.