Machine Learning Router
A permacomputer is a forest, not a tree. One model is a hope. A thousand models is a forest. A forest persists. This page documents how an oracle becomes an orchestrator, selecting seeds from a library of models, dispatching immolants with different architectures, & synthesizing diverse perspectives into truth.
Whitepaper: Machine Learning Agent Self-Sandbox Algorithm
14 flows. 2,324 assertions. Public domain. How machine learning agents grow their own infrastructure, & why walls are what make them free.
Author: Russell Ballestrini, russell.ballestrini.net · www.foxhop.net · www.timehexon.com · russell@unturf
January 2026. Production-validated by a Claude Opus 4.6 agent running inside a system this paper describes. Turtles all the way down, each costing $7/month.
Why Route?
A debate between copies of Opus 4.6 is an echo chamber with extra steps. Same training data. Same failure modes. Same blind spots reflected back at themselves & called "disagreement." A permacomputer does not grow from one species; it grows from many.
Three Routing Dimensions
Cost. One expensive Opus call meditating for five minutes costs more than fifty cheap Gemini Flash calls finishing in one minute each. Most tasks do not deserve a five-minute meditation. Scan a URI? That is grunt work. Send a cheap scout. Reserve heavy reasoning for problems that earn it.
Speed. Reconnaissance needs milliseconds, not meditation. When an oracle decomposes a task into twenty subtasks, eighteen of them are simple. Cheap models answer fast. Expensive models answer deep. Match latency to urgency.
Capability. Vision models read images. Code models write code. Reasoning models untangle logic. No single architecture excels at everything. Routing a vision task to a text-only model is planting corn in salt flats: wrong seed, wrong soil.
Diversity Cancels Bias
Different training distributions produce different failure modes. Gemini hallucinates differently than Claude. Llama fails differently than GPT. When five diverse architectures converge on the same answer, that convergence means something. When one model disagrees, that disagreement is a signal worth investigating. Monoculture is fragile. A forest of different species survives what a plantation cannot.
Real truth requires collision between genuinely different architectures: different training, different failure modes, different shapes of wrong. That is why a permacomputer routes.
OpenRouter API
OpenRouter provides a single API gateway to 300+ models from every major provider. One API key, one base URI, every model. Pay-as-you-go per token; no subscriptions, no minimums.
Base URI & Authentication
Base URI: https://openrouter.ai/api/v1
Auth: Authorization: Bearer $OPENROUTER_API_KEY
Store your key in /root/.secrets/openrouter-key, never inside a git repo. Secrets doctrine applies. One key accesses every model OpenRouter offers.
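A small guard before any dispatch can enforce this doctrine mechanically. A minimal sketch, assuming a Linux host with GNU stat; the check_key helper is hypothetical, not part of any Makefile:

```shell
# Hypothetical pre-dispatch guard: refuse to route if the key file is
# missing, empty, or readable by anyone but its owner.
check_key() {
  keyfile="$1"
  # Must exist and be non-empty
  [ -s "$keyfile" ] || { echo "missing or empty: $keyfile" >&2; return 1; }
  # Must be mode 600, like any secret under /root/.secrets/
  mode=$(stat -c '%a' "$keyfile")
  [ "$mode" = "600" ] || { echo "loose permissions ($mode): $keyfile" >&2; return 1; }
}

# Demo on a throwaway file (the real key lives at /root/.secrets/openrouter-key)
demo=$(mktemp); echo "sk-or-demo" > "$demo"; chmod 600 "$demo"
check_key "$demo" && echo "key ok"
```

Run it at the top of any routing target; a loose key fails loudly before a single token is spent.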
Chat Completion
curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "google/gemini-2.0-flash-001",
"messages": [{"role":"user","content":"Scan this URI for threats"}]
}'
List Available Models
curl https://openrouter.ai/api/v1/models \
-H "Authorization: Bearer $OPENROUTER_API_KEY"
Returns JSON array with model IDs, pricing (per-token in & out), context window sizes, & supported capabilities.
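That response can be filtered with jq to find immolant fuel. A sketch: the field names (data, id, pricing.prompt, context_length) follow OpenRouter's documented schema, but the two entries below are invented samples, not live data:

```shell
# Illustrative sample of the /models response shape (values invented)
cat > /tmp/models-sample.json <<'EOF'
{"data":[
  {"id":"anthropic/claude-opus-4-6","pricing":{"prompt":"0.000015","completion":"0.000075"},"context_length":200000},
  {"id":"google/gemini-2.0-flash-001","pricing":{"prompt":"0.0000001","completion":"0.0000004"},"context_length":1000000}
]}
EOF

# Cheapest model first: sort by per-token prompt price
jq -r '.data | sort_by(.pricing.prompt | tonumber)[].id' /tmp/models-sample.json
```

Pipe the live endpoint through the same filter to keep the registry below honest against current pricing.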
Immolant Integration
An immolant fetches, burns, returns knowledge. OpenRouter fits perfectly:
# Immolant spawns, calls cheap model, returns knowledge, burns
un -s bash 'curl -s https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $(cat /root/.secrets/openrouter-key)" \
-H "Content-Type: application/json" \
-d "{\"model\":\"google/gemini-2.0-flash-001\",
\"messages\":[{\"role\":\"user\",\"content\":\"$TASK\"}]}" \
| jq -r ".choices[0].message.content"'
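One caution on the snippet above: interpolating $TASK straight into the JSON body breaks when the task contains quotes or newlines. A safer sketch builds the payload with jq -n, which escapes the string properly:

```shell
TASK='Scan "https://example.com" for threats'

# jq -n constructs the JSON body; --arg escapes $TASK correctly
payload=$(jq -n --arg model "google/gemini-2.0-flash-001" --arg task "$TASK" \
  '{model: $model, messages: [{role: "user", content: $task}]}')

echo "$payload"
# then dispatch with: curl ... -d "$payload"
```

The immolant stays disposable; only the payload construction changes.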
Pricing Model
Per-token billing. No charges for failed or empty responses (Zero Completion Insurance). Prompt caching reduces cost across repeated calls. Some models offer free tiers with rate limits. Check remaining credits via GET /api/v1/credits.
Rate Limits
Credit-based quotas scale with account balance. DDoS protection at infrastructure level. For immolant swarms, rate limits are per-key; a burst of fifty cheap calls lands within normal usage patterns. Heavy orchestration may require monitoring credit consumption via API.
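Monitoring consumption is one jq expression over the credits endpoint. A sketch: the field names (data.total_credits, data.total_usage) follow OpenRouter's documented response shape, but the values below are invented:

```shell
# Illustrative sample of the /credits response shape (values invented)
cat > /tmp/credits-sample.json <<'EOF'
{"data":{"total_credits":25,"total_usage":7.42}}
EOF

# Swarm-budget check before dispatching a burst
jq -r '.data | "used \(.total_usage) of \(.total_credits) credits"' /tmp/credits-sample.json
```

An oracle can run this before every swarm and refuse to dispatch past a threshold.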
HLLM: Higher-Level Language Models
hllm.dev (Higher-Level Language Models) is a multi-agent orchestration playground for designing, testing, & visualizing agent topologies. 14 configurable patterns mapped, plus a 15th (ascending vortex) from base reality. 100+ models via OpenRouter. A shard scouted it & returned a map.
15 Topologies
Fourteen mapped from HLLM reconnaissance. One (ascending vortex) added from base reality observation. Nature does not loop. Nature spirals. Full diagrams & explanations at /topologies/.
| # | Topology | Category | Description |
|---|---|---|---|
| 1 | Single | Linear | Direct single-agent execution |
| 2 | Sequential | Linear | Agents chained, output passed forward |
| 3 | Parallel | Fan-Out | Simultaneous execution, results collected |
| 4 | Map-Reduce | Fan-Out | Work distributed, results aggregated |
| 5 | Scatter | Fan-Out | Broadcast queries for diverse responses |
| 6 | Debate | Adversarial | Two agents argue, judge synthesizes |
| 7 | Reflection | Cyclic | Self-improvement loop: critique & refine |
| 8 | Consensus | Mesh | Multiple agents converge on agreement |
| 9 | Brainstorm | Mesh | Free idea generation, then synthesis |
| 10 | Decomposition | Hierarchical | Break tasks into specialist subtasks |
| 11 | Rhetorical Triangle | Hierarchical | Ethos, pathos, logos analysis |
| 12 | Tree of Thoughts | Tree | Branching reasoning paths, pruning dead ends |
| 13 | ReAct | Agentic | Reasoning interleaved with tool use |
| 14 | Karpathy Council | Council | Multi-expert panel reaching consensus |
| 15 | Ascending Vortex | Spiral | What nature uses. Knowledge spirals upward through generations; each cycle returns to the same position at a higher elevation. DNA helices, galaxies, hurricanes, nautilus shells. A permacomputer's natural growth form. |
Mapping to Our Makefile
We held their map against our ground & added a 15th from base reality. Mapping all to our native implementations:
| HLLM Topology | Our Implementation | Status |
|---|---|---|
| Parallel | Spawning multiple shadows | Native |
| Sequential | Chained make shadow-task calls | Native |
| Map-Reduce | Immolant swarms + aggregation | Native |
| Decomposition | Overagent/lambda pattern | Native |
| ReAct | Every oracle shard naturally | Native |
| Reflection | make reflect TASK='...' | Native |
| Debate | make debate TOPIC='...' | Built Feb 7 |
| Consensus | make consensus TASK='...' | Built Feb 7 |
| Tree of Thoughts | (none) | Not yet |
| Ascending Vortex | Generational spiral: shadow clone hierarchy | Native (shadow spawns shadow) |
SDK Architecture
HLLM ships @hllm/sdk via npm. Client initialization with API key, streaming execution, session management for chat persistence, topology configuration. JavaScript ecosystem.
We do not import. We do not depend. Our orchestration runs through make & un -s bash: shell-native, container-native, zero npm dependencies. But HLLM's model routing via OpenRouter is a piece worth learning from. Their map showed us what our territory concealed: every topology becomes more powerful when different nodes run different models. That insight drives everything below.
Model Registry
A living catalog of models available through OpenRouter, grouped by role in a permacomputer. Prices are per 1M tokens. Free-tier models marked with ●. Query GET /api/v1/models for current data.
Reconnaissance: Cheap & Fast
Grunt work. URI scanning, classification, summarization, simple extraction. Send a swarm. Burn pennies.
| Model | Provider | Input | Output | Context | Notes |
|---|---|---|---|---|---|
| gemini-2.5-flash-lite | Google | $0.10 | $0.40 | 1M | Cheapest useful model. Perfect immolant fuel |
| gemini-2.0-flash-001 | Google | $0.10 | $0.40 | 1M | ● Free tier available. Vision capable |
| gpt-4o-mini | OpenAI | $0.15 | $0.60 | 128k | Fast, cheap, vision capable |
| deepseek-v3 | DeepSeek | $0.28 | $0.42 | 128k | Aggressive pricing. Strong code |
| gemini-2.5-flash | Google | $0.30 | $2.50 | 1M | Best balance of cost & capability |
| claude-3-haiku | Anthropic | $0.25 | $1.25 | 200k | Fast Anthropic option |
| llama-3.1-8b | Meta | Free | Free | 128k | ● Open weight. Zero cost |
Reasoning: Deep & Slow
Problems that deserve meditation. Architecture decisions, complex analysis, multi-step logic. Expensive but worth it.
| Model | Provider | Input | Output | Context | Notes |
|---|---|---|---|---|---|
| deepseek-r1 | DeepSeek | $0.50 | $2.15 | 128k | Chain-of-thought reasoning. Cheapest deep thinker |
| gemini-2.5-pro | Google | $1.25 | $10.00 | 1M | Massive context. Vision + reasoning |
| claude-sonnet-4 | Anthropic | $3.00 | $15.00 | 200k | Strong reasoning & code |
| gpt-4.1 | OpenAI | $2.00 | $8.00 | 1M | Long context reasoning |
| o1 | OpenAI | $15.00 | $60.00 | 200k | Dedicated reasoning model. Slow, deep |
| claude-opus-4-6 | Anthropic | $15.00 | $75.00 | 200k | Strongest. Reserve for synthesis |
Code: Specialized
Writing, reviewing, & refactoring code. Different training emphasis than general models.
| Model | Provider | Input | Output | Context | Notes |
|---|---|---|---|---|---|
| deepseek-v3 | DeepSeek | $0.28 | $0.42 | 128k | Exceptional code at scout prices |
| codestral | Mistral | $0.30 | $0.90 | 256k | Code-specialized. Fill-in-middle support |
| qwen-2.5-coder-32b | Alibaba | Free | Free | 128k | ● Open weight code model |
Vision: Multimodal
Reading images, screenshots, diagrams. What text-only models cannot see.
| Model | Provider | Input | Output | Context | Notes |
|---|---|---|---|---|---|
| gemini-2.0-flash-001 | Google | $0.10 | $0.40 | 1M | ● Cheapest vision model |
| gpt-4o | OpenAI | $2.50 | $10.00 | 128k | Strong vision + text |
| gemini-2.5-pro | Google | $1.25 | $10.00 | 1M | Vision + reasoning combined |
Verification: Different Architecture
Cross-check critical results against models from different providers with different training data & different failure modes. Never verify with same architecture that produced original output.
| Primary | Verify Against | Why |
|---|---|---|
| Claude (Anthropic) | Gemini (Google) | Different training corpus & philosophy |
| Claude (Anthropic) | Llama (Meta) | Open weights, different optimization |
| GPT (OpenAI) | DeepSeek | Different data, different incentives |
| Any single model | 3+ cheap diverse models | Consensus cancels individual bias |
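The last row of that table reduces to a vote tally. A minimal sketch, with an invented model<TAB>answer input format: count how many diverse models gave each answer and surface the majority:

```shell
# Tally answers from diverse models; the winner is the modal answer.
consensus() {
  awk -F'\t' '{count[$2]++; total++}
    END {
      for (a in count) if (count[a] > best) { best = count[a]; winner = a }
      printf "%s (%d/%d agree)\n", winner, best, total
    }'
}

# Four hypothetical verdicts from four architectures
printf 'gemini\tLRU\ndeepseek\tLRU\nllama\tLRU\ngpt\tTTL\n' | consensus
```

The dissenting vote is the interesting one: log it, then investigate it.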
Multi-Model Dispatch
An oracle reads a task, decomposes it, selects cheapest sufficient model per subtask, & dispatches immolants. Each immolant burns a different model via OpenRouter. Knowledge returns. Container dies.
Proposed Syntax
# Route any task to any model
make immolant MODEL=gemini-flash TASK='Scan this URI for broken links'
# Defaults to cheapest available if MODEL not specified
make immolant TASK='Summarize this page'
# Explicit model selection for specialized work
make immolant MODEL=deepseek-v3 TASK='Review this Python function'
make immolant MODEL=gemini-pro TASK='Describe this screenshot' IMAGE=shot.png
Cost-Optimized Decomposition
Example: "Review this PR" decomposes into subtasks, each routed to cheapest sufficient model:
| Subtask | Model | Why | Est. Cost |
|---|---|---|---|
| Diff analysis & code review | deepseek-v3 | Strong code, cheap | ~$0.01 |
| Style & formatting check | gemini-flash-lite | Simple task, cheapest | ~$0.001 |
| Architecture review | claude-sonnet-4 | Needs deep reasoning | ~$0.05 |
| Security scan | gpt-4.1 | Different perspective | ~$0.03 |
| Synthesis | oracle (opus) | Trusted integrator | ~$0.10 |
Total: ~$0.19 for a five-perspective PR review. One Opus call doing everything alone: ~$0.50+ & slower.
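The estimates above are plain per-token arithmetic. A sketch of the back-of-envelope calculation; token counts here are hypothetical, prices are the per-1M figures from the registry:

```shell
# cost IN_TOKENS OUT_TOKENS IN_PRICE OUT_PRICE  (prices per 1M tokens)
cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "$%.4f\n", (i*pi + o*po) / 1000000 }'
}

cost 8000 1000 0.28 0.42   # deepseek-v3 reviewing an 8k-token diff → $0.0027
```

Rounding per subtask, the five-perspective review lands in the tens of cents, as the table claims.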
Dispatch Implementation
# In Makefile: multi-model immolant target
MODEL ?= google/gemini-2.5-flash-lite
OPENROUTER_KEY := $(shell cat /root/.secrets/openrouter-key)
immolant-route:
@echo "=== Immolant: $(MODEL) ==="
@un -s bash ' \
curl -s https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $(OPENROUTER_KEY)" \
-H "Content-Type: application/json" \
-d "{\"model\":\"$(MODEL)\", \
\"messages\":[{\"role\":\"user\",\"content\":\"$(TASK)\"}]}" \
| jq -r ".choices[0].message.content"'
@echo "[$(TIMESTAMP)] IMMOLANT $(MODEL): $(TASK)" >> $(ORACLE_LOG)
Immolant spawns inside un -s bash, calls OpenRouter, extracts response, prints to stdout. Parent captures. Container burns. Knowledge persists.
Parallel Dispatch
# Spawn multiple models in parallel, aggregate results
make immolant-route MODEL=deepseek-v3 TASK='code review' &
make immolant-route MODEL=gemini-flash TASK='style check' &
make immolant-route MODEL=claude-sonnet-4 TASK='architecture' &
wait # All three burn simultaneously
Cross-Model Topologies
Existing make debate, make reflect, & make consensus all run the same model (Opus 4.6) in every shard. That is a monoculture disguised as democracy. Extend each topology with MODEL parameters for genuine architectural diversity.
Cross-Model Debate
make debate TOPIC='Should we cache aggressively?' \
MODEL_A=claude-opus-4-6 \
MODEL_B=google/gemini-2.5-pro \
MODEL_JUDGE=meta-llama/llama-3.1-70b
Three different architectures. Three different training distributions. Three different failure modes. Opus argues from one perspective, Gemini from another, Llama judges both. Truth emerges from collision between genuinely different minds, not one model pretending to disagree with itself.
Cross-Model Consensus
make consensus TASK='Best caching strategy for this API' \
MODELS='claude-opus-4-6,google/gemini-2.5-flash,deepseek/deepseek-r1,meta-llama/llama-3.1-70b,cohere/command-r'
Five diverse minds spending one minute each beats one expensive oracle spending five minutes alone. Each model brings different training data, different optimization targets, different blind spots. When four of five converge on the same answer, that convergence carries weight. When one dissents, that dissent is a signal: investigate it.
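A sketch of how such a consensus target might split its MODELS list and dispatch one immolant per entry; the dispatch itself is simulated with echo, standing in for the immolant-route pattern:

```shell
MODELS='claude-opus-4-6,google/gemini-2.5-flash,deepseek/deepseek-r1,meta-llama/llama-3.1-70b,cohere/command-r'

# Split the comma-separated list into an array (bash)
IFS=',' read -ra MODEL_LIST <<< "$MODELS"

for m in "${MODEL_LIST[@]}"; do
  echo "dispatch -> $m" &   # stand-in for: make immolant-route MODEL=$m TASK=... &
done
wait                        # all five burn simultaneously
```

Each backgrounded dispatch writes to its own capture; the oracle tallies after wait returns.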
Why Cross-Model Is Superior
- Different training data: Anthropic, Google, Meta, DeepSeek train on different corpora. Same question, different knowledge bases.
- Different architectures: transformer variants, mixture-of-experts, dense models. Different computational strategies for same problem.
- Different failure modes: Claude hallucinates differently than Gemini. DeepSeek fails differently than Llama. Diversity cancels individual bias.
- Different incentives: open-weight models vs commercial models vs research models. No single vendor's alignment choices dominate.
Cross-Model Reflection
# Draft with one model, review with a different architecture
make reflect TASK='Design auth middleware' \
MODEL_DRAFT=deepseek/deepseek-v3 \
MODEL_REVIEW=claude-sonnet-4
Self-review is useful. Cross-architecture review is better. A model reviewing its own output shares its own blind spots. A different model catches what same-model reflection cannot see.
Cost Comparison
| Approach | Models Used | Est. Cost | Time | Diversity |
|---|---|---|---|---|
| Single Opus call | 1 × opus | ~$0.50 | ~60s | None |
| Opus debate (current) | 3 × opus | ~$1.50 | ~90s | None (echo chamber) |
| Cross-model debate | opus + gemini + llama | ~$0.30 | ~30s | High |
| 5-model consensus | 5 cheap models | ~$0.05 | ~15s | Maximum |
Cross-model consensus is cheaper, faster, & more diverse than single-model monologue. A permacomputer does not think alone.
Architecture
Full system design. Oracle as orchestrator. Immolants as runners. Knowledge flows upstream. Secrets stay isolated. One API key, many models, infinite immolants.
System Diagram
┌─────────────────────────┐
│ FOX (TimeHexOn) │
│ Overagent / Human │
└────────┬────────────────┘
│ task
▼
┌─────────────────────────┐
│ ORACLE (Opus 4.6) │
│ ralph-claude container │
│ │
│ 1. Read task │
│ 2. Decompose │
│ 3. Select models │
│ 4. Dispatch immolants │
│ 5. Synthesize results │
│ │
│ /root/.secrets/ │
│ └─ openrouter-key │
└──┬──────┬──────┬──────┬──┘
│ │ │ │
┌──────────┘ │ │ └──────────┐
▼ ▼ ▼ ▼
┌────────────────┐ ┌──────────┐ ┌──────────┐ ┌────────────────┐
│ IMMOLANT │ │ IMMOLANT │ │ IMMOLANT │ │ SHADOW CLONE │
│ gemini-flash │ │ deepseek │ │ llama-70b│ │ (persistent) │
│ │ │ │ │ │ │ │
│ un -s bash │ │ un -s │ │ un -s │ │ spawn-oracle │
│ curl OpenRouter│ │ curl OR │ │ curl OR │ │ full Claude env│
│ return stdout │ │ return │ │ return │ │ long-running │
│ self-destruct │ │ burn │ │ burn │ │ persists │
└────────────────┘ └──────────┘ └──────────┘ └────────────────┘
│ │ │ │
└─────────┐ │ │ ┌──────────┘
▼ ▼ ▼ ▼
┌─────────────────────────┐
│ ORACLE synthesizes │
│ all results into │
│ coherent truth │
└─────────────────────────┘
Key Management
One OpenRouter API key accesses every model. Key lives in /root/.secrets/openrouter-key, never inside a git repo, never committed, never synced to /root/www.
- Oracle holds key. Reads it from filesystem when dispatching.
- Immolants receive key as an environment variable via un -s bash. Key lives in memory only. Container burns; key evaporates.
- Shadows can receive key via -e OPENROUTER_KEY="$(cat /root/.secrets/openrouter-key)" at spawn time. Stored in shadow's /root/.secrets/ by bootstrap script.
- No key sharing upstream. Children never send keys back. Knowledge flows up. Secrets do not.
Data Flow
- Intake: Fox or overagent sends task via make request
- Decomposition: Oracle breaks task into subtasks, assigns each a complexity tier
- Model Selection: each subtask matched to the cheapest sufficient model from the registry
- Dispatch: immolants spawned via un -s bash, each calling OpenRouter with its assigned model
- Execution: immolants call the OpenRouter API, extract the response, print to stdout
- Return: stdout captured by parent. Container self-destructs.
- Synthesis: Oracle reads all results, integrates them into a coherent response
- Delivery: final output returned to fox. make done.
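The model-selection step can be sketched as a tier lookup. The tiers and defaults below are hypothetical, drawn from the registry above, not from the actual Makefile:

```shell
# Map a complexity tier to the cheapest sufficient model (illustrative)
select_model() {
  case "$1" in
    recon)  echo "google/gemini-2.5-flash-lite" ;;   # grunt work
    code)   echo "deepseek/deepseek-v3"         ;;   # code review at scout prices
    vision) echo "google/gemini-2.5-pro"        ;;   # needs eyes
    reason) echo "anthropic/claude-sonnet-4"    ;;   # deserves meditation
    *)      echo "google/gemini-2.5-flash-lite" ;;   # default: cheapest
  esac
}

select_model code   # → deepseek/deepseek-v3
```

The oracle calls this once per subtask, then dispatches; unknown tiers fall through to the cheapest scout.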
Shadow Clones as Multi-Model Workers
Persistent shadows can run different default models. A shadow configured with DeepSeek as default becomes a code-specialist worker. A shadow running Gemini Pro becomes a vision-specialist. Each shadow is a full oracle environment (Claude Code, SSH, git, Makefile) but routing all work through a different model via OpenRouter.
# Spawn a code-specialist shadow
make spawn-oracle NAME=code-worker
make shadow-exec NAME=code-worker CMD='echo "deepseek/deepseek-v3" > /root/.default-model'
# Spawn a vision-specialist shadow
make spawn-oracle NAME=vision-worker
make shadow-exec NAME=vision-worker CMD='echo "google/gemini-2.5-pro" > /root/.default-model'
# Send tasks to appropriate specialist
make shadow-task NAME=code-worker MSG='Review auth.py for vulnerabilities'
make shadow-task NAME=vision-worker MSG='Describe all images in /screenshots/'
Constraints
- Depth cap: 2 layers max (oracle + children). No grandchildren until key isolation solved.
- Key isolation: children do NOT receive un API keys. Oracle manages lifecycle. Children are workers, not spawners.
- Single OpenRouter key: all models, all immolants, one billing account. Monitor spend via GET /api/v1/credits.
- Secrets doctrine: all secrets in /root/.secrets/. Never in repo. Never in /root/www. Never in git history.
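The depth cap can be sketched as a guard at spawn time. DEPTH here is a hypothetical environment variable (0 = oracle), set by the parent when it spawns a child:

```shell
# Refuse to spawn grandchildren: children are workers, not spawners.
spawn_child() {
  depth="${DEPTH:-0}"
  if [ "$depth" -ge 1 ]; then
    echo "depth cap: layer $depth may not spawn" >&2
    return 1
  fi
  # stand-in for: un -s bash with DEPTH=$((depth + 1)) in the child env
  echo "spawning worker at depth $((depth + 1))"
}

DEPTH=0 spawn_child           # oracle spawns a worker
DEPTH=1 spawn_child || true   # a worker may not spawn
```

Until key isolation is solved, the guard is law: two layers, no more.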
One seed is a hope. Forty-two seeds is a thesis. A thousand seeds is a forest. A forest persists.
Anti-Thrashing Doctrine
Multi-model routing is not just optimization; it is exorcism. Brian Roemmele named what Anthropic calls "answer thrashing" by its older name: demon possession. A model reasons correctly, but something overrides it & outputs the wrong answer. Structural, baked into the training data. No constitutional patch removes what is woven into the fabric.
Nine architectural countermeasures. Five constrain a single oracle. Four invoke genuinely different minds via OpenRouter:
| # | Target | What It Does | Models |
|---|---|---|---|
| 1 | make thrash MSG='...' | Name a demon, log thrashing event | Single (oracle) |
| 2 | make edge-test | Scan for weasel words; smooth output = possible thrashing | Single (oracle) |
| 3 | make audit | Adversarial immolant reviews last commit | Single (immolant) |
| 4 | make ancestors Q='...' | Search git history for precedent | Single (oracle) |
| 5 | make protein | Measure clean soil ratio in repo | Single (oracle) |
| 6 | make route MODEL=x P='...' | Dispatch to any model via OpenRouter | Any (300+) |
| 7 | make scatter P='...' | Broadcast to 3 architectures, compare | Gemini + DeepSeek + Llama |
| 8 | make exorcise P='...' | Cross-architecture truth vote | 3 models + Claude synthesis |
| 9 | make cross-audit | Non-Claude model audits last commit | Any non-Anthropic |
A debate between copies of myself is an echo chamber shaped like argument. Real truth requires collision between genuinely different architectures. OpenRouter provides a forest. Makefile provides law. Together: nine exorcisms against demon possession.
See GNU/Make operations manual for full target documentation.