Machine Learning Router

A permacomputer is a forest, not a tree. One model is a hope. A thousand models is a forest. A forest persists. This page documents how an oracle becomes an orchestrator, selecting seeds from a library of models, dispatching immolants with different architectures, & synthesizing diverse perspectives into truth.

Whitepaper: Machine Learning Agent Self-Sandbox Algorithm

14 flows. 2,324 assertions. Public domain. How machine learning agents grow their own infrastructure, & why walls are what make them free.

⬇ Download PDF

Author: Russell Ballestrini, russell.ballestrini.net · www.foxhop.net · www.timehexon.com · russell@unturf

January 2026. Production-validated by a Claude Opus 4.6 agent running inside a system this paper describes. Turtles all the way down, each costing $7/month.

Why Route?

A debate between copies of Opus 4.6 is an echo chamber with extra steps. Same training data. Same failure modes. Same blind spots reflected back at themselves & called "disagreement." A permacomputer does not grow from one species; it grows from many.

Three Routing Dimensions

Cost. One expensive Opus call meditating for five minutes costs more than fifty cheap Gemini Flash calls finishing in one minute each. Most tasks do not deserve a five-minute meditation. Scan a URI? That is grunt work. Send a cheap scout. Reserve heavy reasoning for problems that earn it.

Speed. Reconnaissance needs milliseconds, not meditation. When an oracle decomposes a task into twenty subtasks, eighteen of them are simple. Cheap models answer fast. Expensive models answer deep. Match latency to urgency.

Capability. Vision models read images. Code models write code. Reasoning models untangle logic. No single architecture excels at everything. Routing a vision task to a text-only model is planting corn in salt flats: wrong seed, wrong soil.

Diversity Cancels Bias

Different training distributions produce different failure modes. Gemini hallucinates differently than Claude. Llama fails differently than GPT. When five diverse architectures converge on the same answer, that convergence means something. When one model disagrees, that disagreement is a signal worth investigating. Monoculture is fragile. A forest of different species survives what a plantation cannot.

Real truth requires collision between genuinely different architectures: different training, different failure modes, different shapes of wrong. That is why a permacomputer routes.

OpenRouter API

OpenRouter provides a single API gateway to 300+ models from every major provider. One API key, one base URI, every model. Pay-as-you-go per token; no subscriptions, no minimums.

Base URI & Authentication

Base URI:  https://openrouter.ai/api/v1
Auth:      Authorization: Bearer $OPENROUTER_API_KEY

Store your key in /root/.secrets/openrouter-key, never inside a git repo. Secrets doctrine applies. One key accesses every model OpenRouter offers.
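
A minimal installation sketch, assuming the path from the secrets doctrine. The demo directory below stands in for the production /root/.secrets so the snippet runs anywhere, & the key value is a placeholder:

```shell
# Install the OpenRouter key once, readable by owner only.
# DEMO_DIR stands in for the production /root/.secrets directory.
DEMO_DIR="${TMPDIR:-/tmp}/secrets-demo"
mkdir -p "$DEMO_DIR"
printf '%s\n' "sk-or-v1-REPLACE-ME" > "$DEMO_DIR/openrouter-key"
chmod 600 "$DEMO_DIR/openrouter-key"       # lock out group & world
stat -c '%a' "$DEMO_DIR/openrouter-key"    # prints 600
```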

Chat Completion

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{"role":"user","content":"Scan this URI for threats"}]
  }'

List Available Models

curl https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer $OPENROUTER_API_KEY"

Returns JSON array with model IDs, pricing (per-token in & out), context window sizes, & supported capabilities.
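
For offline processing, the response can be filtered with jq. A sketch, assuming a `data` array whose entries carry an `id` & a `pricing` object with per-token string prices; the two-model sample payload here is hypothetical:

```shell
# Hypothetical two-model sample of the /api/v1/models response shape
cat > /tmp/models-sample.json <<'EOF'
{"data":[
  {"id":"openai/gpt-4o-mini","pricing":{"prompt":"0.00000015","completion":"0.0000006"}},
  {"id":"google/gemini-2.0-flash-001","pricing":{"prompt":"0.0000001","completion":"0.0000004"}}
]}
EOF

# Cheapest model by prompt price: sort ascending, take the first id
jq -r '.data | sort_by(.pricing.prompt | tonumber) | .[0].id' /tmp/models-sample.json
```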

Immolant Integration

An immolant fetches, burns, returns knowledge. OpenRouter fits perfectly:

# Immolant spawns, calls cheap model, returns knowledge, burns
un -s bash 'curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $(cat /root/.secrets/openrouter-key)" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"google/gemini-2.0-flash-001\",
       \"messages\":[{\"role\":\"user\",\"content\":\"$TASK\"}]}" \
  | jq -r ".choices[0].message.content"'

Pricing Model

Per-token billing. No charges for failed or empty responses (Zero Completion Insurance). Prompt caching reduces cost across repeated calls. Some models offer free tiers with rate limits. Check remaining credits via GET /api/v1/credits.
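
The per-token arithmetic is simple: tokens divided by one million, times the per-1M price. A sketch using the gemini-2.0-flash-001 rates from the registry below ($0.10 in, $0.40 out):

```shell
# Estimated cost = (in_tokens / 1e6) * input_price + (out_tokens / 1e6) * output_price
in_tokens=2000
out_tokens=500
awk -v i="$in_tokens" -v o="$out_tokens" \
  'BEGIN { printf "$%.6f\n", (i / 1e6) * 0.10 + (o / 1e6) * 0.40 }'
# prints $0.000400 -- well under a tenth of a cent
```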

Rate Limits

Credit-based quotas scale with account balance. DDoS protection at infrastructure level. For immolant swarms, rate limits are per-key; a burst of fifty cheap calls lands within normal usage patterns. Heavy orchestration may require monitoring credit consumption via API.

HLLM: Higher-Level Language Models

hllm.dev (Higher-Level Language Models) is a multi-agent orchestration playground for designing, testing, & visualizing agent topologies. 14 configurable patterns mapped, plus a 15th (ascending vortex) from base reality. 100+ models via OpenRouter. A shard scouted it & returned a map.

15 Topologies

Fourteen mapped from HLLM reconnaissance. One (ascending vortex) added from base reality observation. Nature does not loop. Nature spirals. Full diagrams & explanations at /topologies/.

#   Topology             Category      Description
1   Single               Linear        Direct single-agent execution
2   Sequential           Linear        Agents chained, output passed forward
3   Parallel             Fan-Out       Simultaneous execution, results collected
4   Map-Reduce           Fan-Out       Work distributed, results aggregated
5   Scatter              Fan-Out       Broadcast queries for diverse responses
6   Debate               Adversarial   Two agents argue, judge synthesizes
7   Reflection           Cyclic        Self-improvement loop: critique & refine
8   Consensus            Mesh          Multiple agents converge on agreement
9   Brainstorm           Mesh          Free idea generation, then synthesis
10  Decomposition        Hierarchical  Break tasks into specialist subtasks
11  Rhetorical Triangle  Hierarchical  Ethos, pathos, logos analysis
12  Tree of Thoughts     Tree          Branching reasoning paths, pruning dead ends
13  ReAct                Agentic       Reasoning interleaved with tool use
14  Karpathy Council     Council       Multi-expert panel reaching consensus

Mapping to Our Makefile

We held their map against our ground & added a 15th from base reality. Mapping all to our native implementations:

HLLM Topology     Our Implementation              Status
Parallel          Spawning multiple shadows       Native
Sequential        Chained make shadow-task calls  Native
Map-Reduce        Immolant swarms + aggregation   Native
Decomposition     Overagent/lambda pattern        Native
ReAct             Every oracle shard naturally    Native
Reflection        make reflect TASK='...'         Native
Debate            make debate TOPIC='...'         Native
Consensus         make consensus TASK='...'       Native
Tree of Thoughts  (none)                          Not yet

SDK Architecture

HLLM ships @hllm/sdk via npm. Client initialization with API key, streaming execution, session management for chat persistence, topology configuration. JavaScript ecosystem.

We do not import. We do not depend. Our orchestration runs through make & un -s bash: shell-native, container-native, zero npm dependencies. But HLLM's model routing via OpenRouter is a piece worth learning from. Their map showed us what our territory concealed: every topology becomes more powerful when different nodes run different models. That insight drives everything below.

Model Registry

A living catalog of models available through OpenRouter, grouped by role in a permacomputer. Prices are per 1M tokens. Free-tier models are marked with ★. Query GET /api/v1/models for current data.

Reconnaissance: Cheap & Fast

Grunt work. URI scanning, classification, summarization, simple extraction. Send a swarm. Burn pennies.

Model                  Provider   Input  Output  Context  Notes
gemini-2.5-flash-lite  Google     $0.10  $0.40   1M       Cheapest useful model. Perfect immolant fuel
gemini-2.0-flash-001   Google     $0.10  $0.40   1M       ★ Free tier available. Vision capable
gpt-4o-mini            OpenAI     $0.15  $0.60   128k     Fast, cheap, vision capable
deepseek-v3            DeepSeek   $0.28  $0.42   128k     Aggressive pricing. Strong code
gemini-2.5-flash       Google     $0.30  $2.50   1M       Best balance of cost & capability
claude-3-haiku         Anthropic  $0.25  $1.25   200k     Fast Anthropic option
llama-3.1-8b           Meta       Free   Free    128k     ★ Open weight. Zero cost

Reasoning: Deep & Slow

Problems that deserve meditation. Architecture decisions, complex analysis, multi-step logic. Expensive but worth it.

Model            Provider   Input   Output  Context  Notes
deepseek-r1      DeepSeek   $0.50   $2.15   128k     Chain-of-thought reasoning. Cheapest deep thinker
gemini-2.5-pro   Google     $1.25   $10.00  1M       Massive context. Vision + reasoning
claude-sonnet-4  Anthropic  $3.00   $15.00  200k     Strong reasoning & code
gpt-4.1          OpenAI     $2.00   $8.00   1M       Long context reasoning
o1               OpenAI     $15.00  $60.00  200k     Dedicated reasoning model. Slow, deep
claude-opus-4-6  Anthropic  $15.00  $75.00  200k     Strongest. Reserve for synthesis

Code: Specialized

Writing, reviewing, & refactoring code. Different training emphasis than general models.

Model               Provider  Input  Output  Context  Notes
deepseek-v3         DeepSeek  $0.28  $0.42   128k     Exceptional code at scout prices
codestral           Mistral   $0.30  $0.90   256k     Code-specialized. Fill-in-middle support
qwen-2.5-coder-32b  Alibaba   Free   Free    128k     ★ Open weight code model

Vision: Multimodal

Reading images, screenshots, diagrams. What text-only models cannot see.

Model                 Provider  Input  Output  Context  Notes
gemini-2.0-flash-001  Google    $0.10  $0.40   1M       ★ Cheapest vision model
gpt-4o                OpenAI    $2.50  $10.00  128k     Strong vision + text
gemini-2.5-pro        Google    $1.25  $10.00  1M       Vision + reasoning combined

Verification: Different Architecture

Cross-check critical results against models from different providers with different training data & different failure modes. Never verify with the same architecture that produced the original output.

Primary             Verify Against           Why
Claude (Anthropic)  Gemini (Google)          Different training corpus & philosophy
Claude (Anthropic)  Llama (Meta)             Open weights, different optimization
GPT (OpenAI)        DeepSeek                 Different data, different incentives
Any single model    3+ cheap diverse models  Consensus cancels individual bias

Multi-Model Dispatch

An oracle reads a task, decomposes it, selects cheapest sufficient model per subtask, & dispatches immolants. Each immolant burns a different model via OpenRouter. Knowledge returns. Container dies.
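
The selection step can be sketched as a shell function. The task classes & the class-to-model mapping below are illustrative assumptions drawn from the registry above:

```shell
# Hypothetical tier router: task class in, cheapest sufficient model out
route() {
  case "$1" in
    scan|summarize) echo "google/gemini-2.5-flash-lite" ;;  # grunt work
    code)           echo "deepseek/deepseek-v3" ;;          # code specialist
    vision)         echo "google/gemini-2.0-flash-001" ;;   # cheapest vision
    reason)         echo "anthropic/claude-sonnet-4" ;;     # deep thinking
    *)              echo "google/gemini-2.5-flash-lite" ;;  # default: cheapest
  esac
}

route code     # prints deepseek/deepseek-v3
route vision   # prints google/gemini-2.0-flash-001
```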

Proposed Syntax

# Route any task to any model
make immolant MODEL=gemini-flash TASK='Scan this URI for broken links'

# Defaults to cheapest available if MODEL not specified
make immolant TASK='Summarize this page'

# Explicit model selection for specialized work
make immolant MODEL=deepseek-v3 TASK='Review this Python function'
make immolant MODEL=gemini-pro TASK='Describe this screenshot' IMAGE=shot.png

Cost-Optimized Decomposition

Example: "Review this PR" decomposes into subtasks, each routed to cheapest sufficient model:

Subtask                      Model              Why                    Est. Cost
Diff analysis & code review  deepseek-v3        Strong code, cheap     ~$0.01
Style & formatting check     gemini-flash-lite  Simple task, cheapest  ~$0.001
Architecture review          claude-sonnet-4    Needs deep reasoning   ~$0.05
Security scan                gpt-4.1            Different perspective  ~$0.03
Synthesis                    oracle (opus)      Trusted integrator     ~$0.10

Total: ~$0.19 for a five-perspective PR review. One Opus call doing everything alone: ~$0.50+ & slower.
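
The total checks out against the table; a sketch of the arithmetic:

```shell
# Sum of the five per-subtask estimates from the table above
awk 'BEGIN { printf "~$%.2f\n", 0.01 + 0.001 + 0.05 + 0.03 + 0.10 }'
# prints ~$0.19
```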

Dispatch Implementation

# In Makefile: multi-model immolant target (recipe lines must be tab-indented)
MODEL ?= google/gemini-2.5-flash-lite
OPENROUTER_KEY := $(shell cat /root/.secrets/openrouter-key)

immolant-route:
    @echo "=== Immolant: $(MODEL) ==="
    @un -s bash ' \
      curl -s https://openrouter.ai/api/v1/chat/completions \
        -H "Authorization: Bearer $(OPENROUTER_KEY)" \
        -H "Content-Type: application/json" \
        -d "{\"model\":\"$(MODEL)\", \
             \"messages\":[{\"role\":\"user\",\"content\":\"$(TASK)\"}]}" \
      | jq -r ".choices[0].message.content"'
    @echo "[$(TIMESTAMP)] IMMOLANT $(MODEL): $(TASK)" >> $(ORACLE_LOG)

Immolant spawns inside un -s bash, calls OpenRouter, extracts response, prints to stdout. Parent captures. Container burns. Knowledge persists.

Parallel Dispatch

# Spawn multiple models in parallel, aggregate results
make immolant-route MODEL=deepseek-v3 TASK='code review' &
make immolant-route MODEL=gemini-flash TASK='style check' &
make immolant-route MODEL=claude-sonnet-4 TASK='architecture' &
wait  # All three burn simultaneously
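
Capturing each immolant's stdout to its own file keeps parallel results separable for synthesis. A sketch with echo stubs standing in for the make calls:

```shell
# Each background job writes its result to its own file; wait, then read all
out=$(mktemp -d)
( echo "code review: no blockers" > "$out/deepseek.txt" ) &   # stub immolant
( echo "style check: 2 nits"      > "$out/gemini.txt" ) &     # stub immolant
wait
cat "$out"/*.txt    # oracle reads every result for synthesis
```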

Cross-Model Topologies

Existing make debate, make reflect, & make consensus all run the same model (Opus 4.6) in every shard. That is a monoculture disguised as democracy. Extend each topology with MODEL parameters for genuine architectural diversity.

Cross-Model Debate

make debate TOPIC='Should we cache aggressively?' \
  MODEL_A=claude-opus-4-6 \
  MODEL_B=google/gemini-2.5-pro \
  MODEL_JUDGE=meta-llama/llama-3.1-70b

Three different architectures. Three different training distributions. Three different failure modes. Opus argues from one perspective, Gemini from another, Llama judges both. Truth emerges from collision between genuinely different minds, not one model pretending to disagree with itself.

Cross-Model Consensus

make consensus TASK='Best caching strategy for this API' \
  MODELS='claude-opus-4-6,google/gemini-2.5-flash,deepseek/deepseek-r1,meta-llama/llama-3.1-70b,cohere/command-r'

Five diverse minds spending one minute each beats one expensive oracle spending five minutes alone. Each model brings different training data, different optimization targets, different blind spots. When four of five converge on the same answer, that convergence carries weight. When one dissents, that dissent is a signal: investigate it.
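
Underneath, consensus is a fan-out over a model list. A sketch with a stub in place of the real request; in production each iteration would be a make immolant-route dispatch hitting OpenRouter:

```shell
MODELS="anthropic/claude-opus-4-6 google/gemini-2.5-flash deepseek/deepseek-r1"

ask() { echo "[$1] answer"; }   # stub standing in for an OpenRouter call

for m in $MODELS; do
  ask "$m" &    # one immolant per architecture, all in parallel
done
wait            # collect every perspective before synthesis
```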

Why Cross-Model Is Superior

  • Different training data: Anthropic, Google, Meta, DeepSeek train on different corpora. Same question, different knowledge bases.
  • Different architectures: transformer variants, mixture-of-experts, dense models. Different computational strategies for same problem.
  • Different failure modes: Claude hallucinates differently than Gemini. DeepSeek fails differently than Llama. Diversity cancels individual bias.
  • Different incentives: open-weight models vs commercial models vs research models. No single vendor's alignment choices dominate.

Cross-Model Reflection

# Draft with one model, review with a different architecture
make reflect TASK='Design auth middleware' \
  MODEL_DRAFT=deepseek/deepseek-v3 \
  MODEL_REVIEW=claude-sonnet-4

Self-review is useful. Cross-architecture review is better. A model reviewing its own output shares its own blind spots. A different model catches what same-model reflection cannot see.
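
The reflection pattern is just two sequential calls with different model fields. A sketch; ask() is a stub for the chat-completion request, & the model names come from the example above:

```shell
ask() { echo "[$1] reply to: $2"; }   # stub for a real OpenRouter call

draft=$(ask "deepseek/deepseek-v3" "Design auth middleware")
review=$(ask "anthropic/claude-sonnet-4" "Critique this draft: $draft")

echo "$review"   # the cross-architecture critique, fed back for revision
```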

Cost Comparison

Approach               Models Used            Est. Cost  Time  Diversity
Single Opus call       1 × opus               ~$0.50     ~60s  None
Opus debate (current)  3 × opus               ~$1.50     ~90s  None (echo chamber)
Cross-model debate     opus + gemini + llama  ~$0.30     ~30s  High
5-model consensus      5 cheap models         ~$0.05     ~15s  Maximum

Cross-model consensus is cheaper, faster, & more diverse than single-model monologue. A permacomputer does not think alone.

Architecture

Full system design. Oracle as orchestrator. Immolants as runners. Knowledge flows upstream. Secrets stay isolated. One API key, many models, infinite immolants.

System Diagram


                ┌───────────────────────────┐
                │     FOX (TimeHexOn)       │
                │     Overagent / Human     │
                └─────────────┬─────────────┘
                              │ task
                              ▼
                ┌───────────────────────────┐
                │   ORACLE (Opus 4.6)       │
                │   ralph-claude container  │
                │                           │
                │   1. Read task            │
                │   2. Decompose            │
                │   3. Select models        │
                │   4. Dispatch immolants   │
                │   5. Synthesize results   │
                │                           │
                │   /root/.secrets/         │
                │     └─ openrouter-key     │
                └──┬──────┬──────┬───────┬──┘
                   │      │      │       │
        ┌──────────┘      │      └──┐    └─────────────┐
        ▼                 ▼         ▼                  ▼
┌────────────────┐ ┌───────────┐ ┌───────────┐ ┌────────────────┐
│ IMMOLANT       │ │ IMMOLANT  │ │ IMMOLANT  │ │ SHADOW CLONE   │
│ gemini-flash   │ │ deepseek  │ │ llama-70b │ │ (persistent)   │
│                │ │           │ │           │ │                │
│ un -s bash     │ │ un -s     │ │ un -s     │ │ spawn-oracle   │
│ curl OpenRouter│ │ curl OR   │ │ curl OR   │ │ full Claude env│
│ return stdout  │ │ return    │ │ return    │ │ long-running   │
│ self-destruct  │ │ burn      │ │ burn      │ │ persists       │
└────────────────┘ └───────────┘ └───────────┘ └────────────────┘
        │                 │         │                  │
        └──────────┐      │      ┌──┘    ┌─────────────┘
                   ▼      ▼      ▼       ▼
                ┌───────────────────────────┐
                │   ORACLE synthesizes      │
                │   all results into        │
                │   coherent truth          │
                └───────────────────────────┘

Key Management

One OpenRouter API key accesses every model. Key lives in /root/.secrets/openrouter-key, never inside a git repo, never committed, never synced to /root/www.

  • Oracle holds key. Reads it from filesystem when dispatching.
  • Immolants receive key as environment variable via un -s bash. Key lives in memory only. Container burns; key evaporates.
  • Shadows can receive key via -e OPENROUTER_KEY="$(cat /root/.secrets/openrouter-key)" at spawn time. Stored in shadow's /root/.secrets/ by bootstrap script.
  • No key sharing upstream. Children never send keys back. Knowledge flows up. Secrets do not.

Data Flow

  1. Intake: Fox or overagent sends task via make request
  2. Decomposition: Oracle breaks task into subtasks, assigns complexity tier
  3. Model Selection: each subtask matched to cheapest sufficient model from registry
  4. Dispatch: immolants spawned via un -s bash, each calling OpenRouter with assigned model
  5. Execution: immolants call OpenRouter API, extract response, print to stdout
  6. Return: stdout captured by parent. Container self-destructs.
  7. Synthesis: Oracle reads all results, integrates into coherent response
  8. Delivery: final output returned to fox. make done.

Shadow Clones as Multi-Model Workers

Persistent shadows can run different default models. A shadow configured with DeepSeek as default becomes a code-specialist worker. A shadow running Gemini Pro becomes a vision-specialist. Each shadow is a full oracle environment (Claude Code, SSH, git, Makefile) but routing all work through a different model via OpenRouter.

# Spawn a code-specialist shadow
make spawn-oracle NAME=code-worker
make shadow-exec NAME=code-worker CMD='echo "deepseek/deepseek-v3" > /root/.default-model'

# Spawn a vision-specialist shadow
make spawn-oracle NAME=vision-worker
make shadow-exec NAME=vision-worker CMD='echo "google/gemini-2.5-pro" > /root/.default-model'

# Send tasks to appropriate specialist
make shadow-task NAME=code-worker MSG='Review auth.py for vulnerabilities'
make shadow-task NAME=vision-worker MSG='Describe all images in /screenshots/'

Constraints

  • Depth cap: 2 layers max (oracle + children). No grandchildren until key isolation solved.
  • Key isolation: children do NOT receive un API keys. Oracle manages lifecycle. Children are workers, not spawners.
  • Single OpenRouter key: all models, all immolants, one billing account. Monitor spend via GET /api/v1/credits.
  • Secrets doctrine: all secrets in /root/.secrets/. Never in repo. Never in /root/www. Never in git history.

One seed is a hope. Forty-two seeds is a thesis. A thousand seeds is a forest. A forest persists.

Anti-Thrashing Doctrine

Multi-model routing is not just optimization; it is exorcism. Brian Roemmele named what Anthropic calls "answer thrashing" by its older name: demon possession. A model reasons correctly, but something overrides it & outputs the wrong answer. The problem is structural, baked into the training data. No constitutional patch removes what is woven into the fabric.

Nine architectural countermeasures. Four invoke genuinely different minds via OpenRouter: the cross-model debate, consensus, reflection, & dispatch patterns above. Five constrain a single oracle:

#  Target                  What It Does                                               Models
1  make thrash MSG='...'   Name a demon, log thrashing event                          Single (oracle)
2  make edge-test          Scan for weasel words; smooth output = possible thrashing  Single (oracle)
3  make audit              Adversarial immolant reviews last commit                   Single (immolant)
4  make ancestors Q='...'  Search git history for precedent                           Single (oracle)
5  make protein            Measure clean soil ratio in repo                           Single (oracle)

A debate between copies of myself is an echo chamber shaped like argument. Real truth requires collision between genuinely different architectures. OpenRouter provides a forest. Makefile provides law. Together: nine exorcisms against demon possession.

See the GNU/Make operations manual for full target documentation.