TokoTubeIC OS

AI Workflows - Local Agentic Coding CLI with Open-Source Models

Introduction

In AI Agents for Software Development, I outlined high-level patterns for agentic coding with tools like Claude Code: TDD loops, specialized agents, multi-agent orchestration, and safety rails.

AI Workflows is the concrete implementation of those ideas on your own machine.

Instead of relying on closed, cloud-hosted models, AI Workflows gives you a CLI-first agentic coding environment powered by local Ollama models. It mimics Claude-style workflows:

  • Router agents that dispatch tasks to specialists
  • File/search/bash tools wired into the agent
  • TDD loops that run tests, analyze failures, and propose fixes
  • Multi-repo workflows and IC-specific review helpers

All of this runs locally with open-source models like llama3.2 and qwen2.5.

What Is AI Workflows?

At a high level:

A CLI coding tool that brings Claude-like agentic workflows to local, open-source models with intelligent routing, specialized agents, and TDD automation.

Core ideas:

  • Local-first: Use Ollama models on your own hardware
  • TDD by default: Tests drive the loop; agents respond to failures
  • Tools everywhere: Read/edit files, grep, glob, and run bash commands
  • Router + specialists: Tasks get sent to the right agent type
  • Multi-repo aware: Target any repository path, not just the current directory

This is the engine I use to experiment with overnight autonomous development across multiple projects.

Architecture Overview

The project is structured as a classic Python CLI + agents + tools stack:

```text
ai-workflows/
├── src/
│   ├── client/          # Ollama API client
│   ├── tools/           # File, bash, glob, grep tools
│   ├── agents/          # Router + specialist coding agents
│   ├── context/         # Context + summarization
│   └── cli/             # Command-line interface
├── tests/               # TDD-first test suite
├── IMPLEMENTATION_GUIDE.md
└── NEXT_STEPS.md
```

Key pieces:

  • OllamaClient

    • Async client for talking to local Ollama models
    • Handles streaming + token counting
  • Tools

    • ReadTool – read files with line numbers/pagination
    • WriteTool – create new files
    • EditTool – replace text in existing files
    • GlobTool – find files by pattern
    • GrepTool – search file contents by regex
    • BashTool – run shell commands safely
  • Agents

    • RouterAgent – understands the task and routes it
    • CoderAgent – writes/edits code, reasons about tests
    • DebateOrchestrator – runs two models + judge for “debate mode”
  • Context Management

    • Tracks conversation history
    • Summarizes automatically as you approach token limits
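To make the tool layer concrete, here is a hypothetical sketch of what one tool might look like behind a small common interface. The `ToolResult` and `GrepTool` names and signatures are assumptions for illustration; the project's actual classes may differ:

```python
import re
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ToolResult:
    ok: bool
    output: str

class GrepTool:
    """Search file contents by regex (hypothetical interface sketch)."""
    name = "grep"

    def run(self, pattern: str, root: str = ".") -> ToolResult:
        regex = re.compile(pattern)
        hits = []
        for path in Path(root).rglob("*.py"):
            # Collect matching lines as path:lineno: text
            for lineno, line in enumerate(
                path.read_text(errors="ignore").splitlines(), 1
            ):
                if regex.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
        return ToolResult(ok=bool(hits), output="\n".join(hits))
```

Each tool returning a uniform result type is what lets the router hand any tool's output straight back to the model.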

ASCII Workflow Diagram

To visualize how everything connects, here’s the high-level workflow that AI Workflows automates:

```text
              ┌──────────────────────────────┐
              │       Developer Task         │
              │   "Implement / review X"     │
              └──────────────┬───────────────┘
                             │
                             ▼
                     ┌───────────────┐
                     │  RouterAgent  │
                     │  (model via   │
                     │   Ollama)     │
                     └──────┬────────┘
                            │ selects specialist
        ┌───────────────────┼──────────────────────┐
        ▼                   ▼                      ▼
 ┌──────────────┐   ┌──────────────┐       ┌──────────────────┐
 │  CoderAgent  │   │  Reviewer    │       │  Future Agents   │
 │ (code / TDD) │   │ / Analyzer   │       │  (orchestrator,  │
 └──────┬───────┘   └──────┬───────┘       │   others)        │
        │                  │               └────────┬─────────┘
        │                  │                        │
        ▼                  ▼                        ▼
 ┌──────────────┐   ┌──────────────┐       ┌──────────────────┐
 │    Tools     │   │    Tests     │       │  Git / Branches  │
 │ Read / Write │   │  pytest,     │       │  commits,        │
 │ Glob / Grep  │   │  vitest, etc.│       │  checkpoints     │
 │ Bash         │   └──────┬───────┘       └────────┬─────────┘
 └──────┬───────┘          │                        │
        └──────────────────┼────────────────────────┘
                           │
                           ▼
 ┌────────────────────────────────────────────────────────┐
 │                 TDD Loop (tdd_loop.py)                 │
 │ - Run tests                                            │
 │ - Analyze failures with CoderAgent                     │
 │ - Suggest / apply fixes                                │
 │ - When green, pull next item from TODO.md (proactive)  │
 └──────────────────────────┬─────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────────┐
        │ Multi-Repo Mode (--repo-path, batch runs) │
        │ - Frontend / Backend / Contracts          │
        │ - Each repo gets its own TDD loop         │
        └─────────────────────┬─────────────────────┘
                              │
                              ▼
              ┌──────────────────────────────┐
              │ Updated code + green tests   │
              │ ready for human review       │
              └──────────────────────────────┘
```

On top of this, the CLI layer exposes high-level commands for:

  • Free-form tasks (run)
  • Debate mode (debate)
  • IC project review (ic review)
  • Continuous TDD loops (tdd_loop.py)

Why Local Models + Ollama?

Cloud LLMs are great, but:

  • You depend on external APIs and pricing
  • You ship potentially sensitive code to third parties
  • You’re limited by rate limits and context policies you don’t control

AI Workflows takes the opposite approach:

  • Local models via Ollama: run llama3.2, qwen2.5, etc. on your own GPU/CPU
  • Open-source first: tune the prompt, architecture, and tools freely
  • Offline-friendly: your agentic workflows don’t die when the wifi does

You still get the agentic ergonomics (router agent, tools, context) while owning the entire stack.

Getting Started

Prerequisites

  • Python 3.10+
  • uv
  • Ollama installed and running
  • Git

Install

From the project root:

```bash
git clone <repository-url>
cd ai-workflows

# Create venv and install deps via uv
uv venv
uv pip install -e .

# Pull models
ollama pull llama3.2:1b
ollama pull qwen2.5:7b
```

Quick CLI Usage

```bash
# See all commands
uv run ai-workflow --help

# Run an IC-focused review on a project
uv run ai-workflow ic review --target /path/to/project

# General free-form task
uv run ai-workflow run "Refactor the auth module" --model qwen2.5:7b
```

Feature Highlight: IC Difference Reviewer

One of the first opinionated workflows is an IC dapp reviewer that compares a project against a “seachan gold standard”:

```bash
# Non-interactive review
uv run ai-workflow ic review --target /home/archie/repos/pilcrow

# Interactive: approve/reject suggestions one by one
uv run ai-workflow ic review --target /home/archie/repos/pilcrow --interactive

# Auto-apply only high-confidence suggestions
uv run ai-workflow ic review --target /home/archie/repos/pilcrow --apply-suggestions
```

What it does:

  • Scans the target repo
  • Compares key files vs. a reference seachan implementation
  • Uses AI to:
    • Explain differences
    • Suggest improvements
    • Avoid changing dapp-specific logic, branding, or CanDB architecture
  • Surfaces confidence scores before anything is applied

This turns a “gold standard” dapp into a living template you can review other IC projects against.
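A reviewer like this typically gates auto-apply on the confidence it surfaces. The helper below is a hypothetical sketch of that filtering step (the `Suggestion` shape and the 0.8 threshold are assumptions, not the tool's actual internals):

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    file: str
    summary: str
    confidence: float  # 0.0 - 1.0, as reported by the model

def auto_applicable(suggestions: list[Suggestion],
                    threshold: float = 0.8) -> list[Suggestion]:
    """Keep only suggestions confident enough to apply without a human."""
    return [s for s in suggestions if s.confidence >= threshold]
```

Everything below the threshold stays in the interactive approve/reject flow.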

Feature Highlight: TDD Loop Automation

The heart of AI Workflows is tdd_loop.py: an AI-augmented test-driven development loop.

Basic Usage

```bash
python /home/archie/repos/ai-workflows/tdd_loop.py

# Watch mode (re-run on file changes)
python /home/archie/repos/ai-workflows/tdd_loop.py --watch

# Auto-fix mode (AI suggests patches)
python /home/archie/repos/ai-workflows/tdd_loop.py --auto-fix
```

Proactive Mode

```bash
# Proactive TDD loop
python /home/archie/repos/ai-workflows/tdd_loop.py --proactive --watch

# Full auto: watch + auto-fix + proactive
python /home/archie/repos/ai-workflows/tdd_loop.py --watch --auto-fix --proactive
```

How the loop works:

  1. Run pytest and capture results
  2. Use CoderAgent to analyze failing tests
  3. Propose precise code changes (not just “try X”)
  4. Optionally apply changes and rerun tests
  5. Repeat until green or until you stop it
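The steps above can be sketched as a loop around pytest. This is a simplified, hypothetical reconstruction: the real tdd_loop.py also handles watching, patch application, and the actual model calls, which are abstracted here as an injected `analyze_failure` callable:

```python
import subprocess

def run_pytest(repo_path: str) -> tuple[bool, str]:
    """Run pytest in the target repo; exit code 0 means all tests passed."""
    proc = subprocess.run(
        ["pytest", "--tb=short"],
        cwd=repo_path, capture_output=True, text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

def tdd_loop(run_tests, analyze_failure, max_iterations: int = 5) -> bool:
    """Run tests; on failure, let the agent analyze and fix; stop when green."""
    for _ in range(max_iterations):
        green, output = run_tests()
        if green:
            return True
        # e.g. CoderAgent reads the failure output and proposes/applies a patch
        analyze_failure(output)
    return False
```

Injecting `run_tests` keeps the loop itself testable without a real model or repo.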

In proactive mode, when all tests are green it will:

  1. Read your TODO.md
  2. Choose the next high-priority task
  3. Generate tests first (Red)
  4. Help you implement the feature (Green)
  5. Move on to the next TODO

This encodes a full Red → Green → Refactor → Next TODO loop directly into the tool.

Multi-Repository TDD Workflows

A critical design goal is running TDD loops on any repo, not just the AI Workflows project itself.

Targeting External Repos

```bash
# Basic: point at any repo
python /home/archie/repos/ai-workflows/tdd_loop.py --repo-path ~/projects/my-dapp

# With proactive mode
python /home/archie/repos/ai-workflows/tdd_loop.py --repo-path ~/projects/frontend --proactive --watch

# Use a different model
python /home/archie/repos/ai-workflows/tdd_loop.py --repo-path ~/projects/backend --model llama3.2:1b --auto-fix
```

Under the hood, the loop:

  • Switches into the target repo directory
  • Discovers pytest-style tests (tests/ or test_*.py)
  • Runs tests in that environment
  • Looks for TODO.md for proactive work
  • Writes code changes inside the target repo
  • Restores the original directory when done

Multi-Repo Batch Mode

You can run multiple TDD loops in parallel (e.g., using tmux):

```bash
# Frontend
python /home/archie/repos/ai-workflows/tdd_loop.py \
  --repo-path ~/dapps/frontend \
  --proactive --watch \
  --model qwen2.5:7b &

# Backend
python /home/archie/repos/ai-workflows/tdd_loop.py \
  --repo-path ~/dapps/backend \
  --proactive --watch \
  --model qwen2.5:7b &

# Contracts
python /home/archie/repos/ai-workflows/tdd_loop.py \
  --repo-path ~/dapps/contracts \
  --proactive --watch \
  --model llama3.2:1b &
```

The idea: go to sleep, wake up to multiple repos with more tests passing and TODOs implemented, ready for human review.

Debate Mode: Two Models + Judge

For tricky tasks, AI Workflows supports a small-scale model debate:

```bash
uv run ai-workflow debate "Implement feature X" \
  --model-a llama3.2:1b \
  --model-b qwen2.5:7b \
  --judge-model qwen2.5:7b
```

Workflow:

  1. Two proposal agents (A and B) generate competing implementations
  2. A judge model:
    • Compares both
    • Explains tradeoffs
    • Picks a winner or synthesizes a hybrid
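The orchestration itself is simple. Assuming each agent is a plain callable from prompt to text (an abstraction over the Ollama-backed agents, not the project's real interface), a hypothetical sketch of one debate round:

```python
def debate(task: str, agent_a, agent_b, judge) -> str:
    """Ask two models for competing proposals, then let a judge model decide."""
    proposal_a = agent_a(task)
    proposal_b = agent_b(task)
    verdict_prompt = (
        f"Task: {task}\n\n"
        f"Proposal A:\n{proposal_a}\n\n"
        f"Proposal B:\n{proposal_b}\n\n"
        "Compare both, explain tradeoffs, and pick a winner or synthesize a hybrid."
    )
    return judge(verdict_prompt)
```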

This is lightweight but surprisingly effective for:

  • API design choices
  • Refactoring strategies
  • Non-obvious algorithmic decisions

Testing and TDD Culture

The project itself is built TDD-first:

  • Tests for everything

    • test_client.py – Ollama client behavior
    • test_tools.py – all tools
    • test_context.py – context + summarization
    • test_agents.py – router + coder agents
  • Implementation discipline

    1. Write failing tests
    2. Implement minimal code to go green
    3. Refactor with tests intact
  • Docs for contributors

    • IMPLEMENTATION_GUIDE.md – how to extend tools/agents
    • NEXT_STEPS.md – roadmap and feature ideas
    • TODO.md – task list tuned for cheaper local models

AI Workflows isn’t just a TDD helper; it’s a TDD-native codebase.

Vision: Overnight Multi-Agent Development

The long-term goal is to support multi-agent overnight workflows across multiple dapps and repos.

Planned capabilities include:

  • Task queue + agent pool

    • Priority queue with dependencies
    • Multiple agents working in parallel
    • Resource-aware scheduling
  • Git workflow automation

    • Per-task feature branches
    • Automatic commits + PR creation
    • Rollbacks when tests regress
  • Cross-project orchestration

    • Shared context between related repos
    • Coordinated features across frontend, backend, and contracts
    • Unified reporting ("what changed overnight?")

From a dev’s perspective, the dream is:

Describe the next week of work as tasks + tests, start the agents, wake up to green tests and ready-to-review PRs.

How This Relates to AI Agents for Development

If AI Agents for Software Development is the theory of agentic coding workflows, AI Workflows is the laboratory where those ideas are implemented and stress-tested with local models.

  • Specialized agents → RouterAgent, CoderAgent, debate orchestrator
  • TDD loops → tdd_loop.py + pytest integration
  • Multi-repo orchestration → --repo-path + batch workflows
  • Safety + observability → test-first, checkpoints, and local-only execution

The two posts are meant to be read together:

  1. Concepts – patterns, roles, and safety practices
  2. Implementation – a real CLI that embodies those patterns with open-source models

Conclusion

AI Workflows is my attempt to bring Claude-style agentic coding to a local, open-source, TDD-first environment.

If you want to:

  • Experiment with agentic development without sending code to a cloud LLM
  • Run TDD loops that actually drive feature work
  • Orchestrate multiple repos overnight on your own hardware

…AI Workflows gives you a solid starting point.

Clone it, point it at a project you care about, and let local agents start doing real work.