Machine Learning Router

A permacomputer is a forest, not a tree. One model is a hope. A thousand models is a forest. A forest persists. This page documents how an oracle becomes an orchestrator, selecting seeds from a library of models, dispatching immolants with different architectures, & synthesizing diverse perspectives into truth.

Whitepaper: Machine Learning Agent Self-Sandbox Algorithm

14 flows. 2,324 assertions. Public domain. How machine learning agents grow their own infrastructure, & why walls are what make them free.

⬇ Download PDF

Author: Russell Ballestrini, russell.ballestrini.net · www.foxhop.net · www.timehexon.com · russell@unturf

January 2026. Production-validated by a Claude Opus 4.6 agent running inside a system this paper describes. Turtles all the way down, each costing $7/month.

Why Route?

A debate between copies of Opus 4.6 is an echo chamber with extra steps. Same training data. Same failure modes. Same blind spots reflected back at themselves & called "disagreement." A permacomputer does not grow from one species; it grows from many.

Three Routing Dimensions

Cost. One expensive Opus call meditating for five minutes costs more than fifty cheap Gemini Flash calls finishing in one minute each. Most tasks do not deserve a five-minute meditation. Scan a URI? That is grunt work. Send a cheap scout. Reserve heavy reasoning for problems that earn it.

Speed. Reconnaissance needs milliseconds, not meditation. When an oracle decomposes a task into twenty subtasks, eighteen of them are simple. Cheap models answer fast. Expensive models answer deep. Match latency to urgency.

Capability. Vision models read images. Code models write code. Reasoning models untangle logic. No single architecture excels at everything. Routing a vision task to a text-only model is planting corn in salt flats: wrong seed, wrong soil.

Diversity Cancels Bias

Different training distributions produce different failure modes. Gemini hallucinates differently than Claude. Llama fails differently than GPT. When five diverse architectures converge on the same answer, that convergence means something. When one model disagrees, that disagreement is a signal worth investigating. Monoculture is fragile. A forest of different species survives what a plantation cannot.

Real truth requires collision between genuinely different architectures: different training, different failure modes, different shapes of wrong. That is why a permacomputer routes.

OpenRouter API

OpenRouter provides a single API gateway to 300+ models from every major provider. One API key, one base URI, every model. Pay-as-you-go per token; no subscriptions, no minimums.

Base URI & Authentication

Base URI:  https://openrouter.ai/api/v1
Auth:      Authorization: Bearer $OPENROUTER_API_KEY

Store your key in /root/.secrets/openrouter-key, never inside a git repo. Secrets doctrine applies. One key accesses every model OpenRouter offers.
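
A minimal installation sketch, assuming the path from the secrets doctrine. The demo directory below stands in for the production /root/.secrets so the snippet runs anywhere, & the key value is a placeholder:

```shell
# Install the OpenRouter key once, readable by owner only.
# DEMO_DIR stands in for the production /root/.secrets directory.
DEMO_DIR="${TMPDIR:-/tmp}/secrets-demo"
mkdir -p "$DEMO_DIR"
printf '%s\n' "sk-or-v1-REPLACE-ME" > "$DEMO_DIR/openrouter-key"
chmod 600 "$DEMO_DIR/openrouter-key"       # lock out group & world
stat -c '%a' "$DEMO_DIR/openrouter-key"    # prints 600
```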

Chat Completion

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{"role":"user","content":"Scan this URI for threats"}]
  }'

List Available Models

curl https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer $OPENROUTER_API_KEY"

Returns JSON array with model IDs, pricing (per-token in & out), context window sizes, & supported capabilities.
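
For offline processing, the response can be filtered with jq. A sketch, assuming a `data` array whose entries carry an `id` & a `pricing` object with per-token string prices; the two-model sample payload here is hypothetical:

```shell
# Hypothetical two-model sample of the /api/v1/models response shape
cat > /tmp/models-sample.json <<'EOF'
{"data":[
  {"id":"openai/gpt-4o-mini","pricing":{"prompt":"0.00000015","completion":"0.0000006"}},
  {"id":"google/gemini-2.0-flash-001","pricing":{"prompt":"0.0000001","completion":"0.0000004"}}
]}
EOF

# Cheapest model by prompt price: sort ascending, take the first id
jq -r '.data | sort_by(.pricing.prompt | tonumber) | .[0].id' /tmp/models-sample.json
```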

Immolant Integration

An immolant fetches, burns, returns knowledge. OpenRouter fits perfectly:

# Immolant spawns, calls cheap model, returns knowledge, burns
un -s bash 'curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $(cat /root/.secrets/openrouter-key)" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"google/gemini-2.0-flash-001\",
       \"messages\":[{\"role\":\"user\",\"content\":\"$TASK\"}]}" \
  | jq -r ".choices[0].message.content"'

Pricing Model

Per-token billing. No charges for failed or empty responses (Zero Completion Insurance). Prompt caching reduces cost across repeated calls. Some models offer free tiers with rate limits. Check remaining credits via GET /api/v1/credits.
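
The per-token arithmetic is simple: tokens divided by one million, times the per-1M price. A sketch using the gemini-2.0-flash-001 rates from the registry below ($0.10 in, $0.40 out):

```shell
# Estimated cost = (in_tokens / 1e6) * input_price + (out_tokens / 1e6) * output_price
in_tokens=2000
out_tokens=500
awk -v i="$in_tokens" -v o="$out_tokens" \
  'BEGIN { printf "$%.6f\n", (i / 1e6) * 0.10 + (o / 1e6) * 0.40 }'
# prints $0.000400 -- well under a tenth of a cent
```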

Rate Limits

Credit-based quotas scale with account balance. DDoS protection at infrastructure level. For immolant swarms, rate limits are per-key; a burst of fifty cheap calls lands within normal usage patterns. Heavy orchestration may require monitoring credit consumption via API.

HLLM: Higher-Level Language Models

hllm.dev (Higher-Level Language Models) is a multi-agent orchestration playground for designing, testing, & visualizing agent topologies. 14 configurable patterns mapped, plus a 15th (ascending vortex) from base reality. 100+ models via OpenRouter. A shard scouted it & returned a map.

15 Topologies

Fourteen mapped from HLLM reconnaissance. One (ascending vortex) added from base reality observation. Nature does not loop. Nature spirals. Full diagrams & explanations at /topologies/.

#   Topology             Category      Description
1   Single               Linear        Direct single-agent execution
2   Sequential           Linear        Agents chained, output passed forward
3   Parallel             Fan-Out       Simultaneous execution, results collected
4   Map-Reduce           Fan-Out       Work distributed, results aggregated
5   Scatter              Fan-Out       Broadcast queries for diverse responses
6   Debate               Adversarial   Two agents argue, judge synthesizes
7   Reflection           Cyclic        Self-improvement loop: critique & refine
8   Consensus            Mesh          Multiple agents converge on agreement
9   Brainstorm           Mesh          Free idea generation, then synthesis
10  Decomposition        Hierarchical  Break tasks into specialist subtasks
11  Rhetorical Triangle  Hierarchical  Ethos, pathos, logos analysis
12  Tree of Thoughts     Tree          Branching reasoning paths, pruning dead ends
13  ReAct                Agentic       Reasoning interleaved with tool use
14  Karpathy Council     Council       Multi-expert panel reaching consensus

Mapping to Our Makefile

We held their map against our ground & added a 15th from base reality. Mapping all to our native implementations:

HLLM Topology     Our Implementation              Status
Parallel          Spawning multiple shadows       Native
Sequential        Chained make shadow-task calls  Native
Map-Reduce        Immolant swarms + aggregation   Native
Decomposition     Overagent/lambda pattern        Native
ReAct             Every oracle shard naturally    Native
Reflection        make reflect TASK='...'         Native
Debate            make debate TOPIC='...'         Native
Consensus         make consensus TASK='...'       Native
Tree of Thoughts  (none)                          Not yet

SDK Architecture

HLLM ships @hllm/sdk via npm. Client initialization with API key, streaming execution, session management for chat persistence, topology configuration. JavaScript ecosystem.

We do not import. We do not depend. Our orchestration runs through make & un -s bash: shell-native, container-native, zero npm dependencies. But HLLM's model routing via OpenRouter is a piece worth learning from. Their map showed us what our territory concealed: every topology becomes more powerful when different nodes run different models. That insight drives everything below.

Model Registry

A living catalog of models available through OpenRouter, grouped by role in a permacomputer. Prices are per 1M tokens. Free-tier models are marked with ★. Query GET /api/v1/models for current data.

Reconnaissance: Cheap & Fast

Grunt work. URI scanning, classification, summarization, simple extraction. Send a swarm. Burn pennies.

Model                  Provider   Input  Output  Context  Notes
gemini-2.5-flash-lite  Google     $0.10  $0.40   1M       Cheapest useful model. Perfect immolant fuel
gemini-2.0-flash-001   Google     $0.10  $0.40   1M       ★ Free tier available. Vision capable
gpt-4o-mini            OpenAI     $0.15  $0.60   128k     Fast, cheap, vision capable
deepseek-v3            DeepSeek   $0.28  $0.42   128k     Aggressive pricing. Strong code
gemini-2.5-flash       Google     $0.30  $2.50   1M       Best balance of cost & capability
claude-3-haiku         Anthropic  $0.25  $1.25   200k     Fast Anthropic option
llama-3.1-8b           Meta       Free   Free    128k     ★ Open weight. Zero cost

Reasoning: Deep & Slow

Problems that deserve meditation. Architecture decisions, complex analysis, multi-step logic. Expensive but worth it.

Model            Provider   Input   Output  Context  Notes
deepseek-r1      DeepSeek   $0.50   $2.15   128k     Chain-of-thought reasoning. Cheapest deep thinker
gemini-2.5-pro   Google     $1.25   $10.00  1M       Massive context. Vision + reasoning
claude-sonnet-4  Anthropic  $3.00   $15.00  200k     Strong reasoning & code
gpt-4.1          OpenAI     $2.00   $8.00   1M       Long context reasoning
o1               OpenAI     $15.00  $60.00  200k     Dedicated reasoning model. Slow, deep
claude-opus-4-6  Anthropic  $15.00  $75.00  200k     Strongest. Reserve for synthesis

Code: Specialized

Writing, reviewing, & refactoring code. Different training emphasis than general models.

Model               Provider  Input  Output  Context  Notes
deepseek-v3         DeepSeek  $0.28  $0.42   128k     Exceptional code at scout prices
codestral           Mistral   $0.30  $0.90   256k     Code-specialized. Fill-in-middle support
qwen-2.5-coder-32b  Alibaba   Free   Free    128k     ★ Open weight code model

Vision: Multimodal

Reading images, screenshots, diagrams. What text-only models cannot see.

Model                 Provider  Input  Output  Context  Notes
gemini-2.0-flash-001  Google    $0.10  $0.40   1M       ★ Cheapest vision model
gpt-4o                OpenAI    $2.50  $10.00  128k     Strong vision + text
gemini-2.5-pro        Google    $1.25  $10.00  1M       Vision + reasoning combined

Verification: Different Architecture

Cross-check critical results against models from different providers with different training data & different failure modes. Never verify with the same architecture that produced the original output.

Primary             Verify Against           Why
Claude (Anthropic)  Gemini (Google)          Different training corpus & philosophy
Claude (Anthropic)  Llama (Meta)             Open weights, different optimization
GPT (OpenAI)        DeepSeek                 Different data, different incentives
Any single model    3+ cheap diverse models  Consensus cancels individual bias

Multi-Model Dispatch

An oracle reads a task, decomposes it, selects cheapest sufficient model per subtask, & dispatches immolants. Each immolant burns a different model via OpenRouter. Knowledge returns. Container dies.
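
The selection step can be sketched as a shell function. The task classes & the class-to-model mapping below are illustrative assumptions drawn from the registry above:

```shell
# Hypothetical tier router: task class in, cheapest sufficient model out
route() {
  case "$1" in
    scan|summarize) echo "google/gemini-2.5-flash-lite" ;;  # grunt work
    code)           echo "deepseek/deepseek-v3" ;;          # code specialist
    vision)         echo "google/gemini-2.0-flash-001" ;;   # cheapest vision
    reason)         echo "anthropic/claude-sonnet-4" ;;     # deep thinking
    *)              echo "google/gemini-2.5-flash-lite" ;;  # default: cheapest
  esac
}

route code     # prints deepseek/deepseek-v3
route vision   # prints google/gemini-2.0-flash-001
```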

Proposed Syntax

# Route any task to any model
make immolant MODEL=gemini-flash TASK='Scan this URI for broken links'

# Defaults to cheapest available if MODEL not specified
make immolant TASK='Summarize this page'

# Explicit model selection for specialized work
make immolant MODEL=deepseek-v3 TASK='Review this Python function'
make immolant MODEL=gemini-pro TASK='Describe this screenshot' IMAGE=shot.png

Cost-Optimized Decomposition

Example: "Review this PR" decomposes into subtasks, each routed to cheapest sufficient model:

Subtask                      Model              Why                    Est. Cost
Diff analysis & code review  deepseek-v3        Strong code, cheap     ~$0.01
Style & formatting check     gemini-flash-lite  Simple task, cheapest  ~$0.001
Architecture review          claude-sonnet-4    Needs deep reasoning   ~$0.05
Security scan                gpt-4.1            Different perspective  ~$0.03
Synthesis                    oracle (opus)      Trusted integrator     ~$0.10

Total: ~$0.19 for a five-perspective PR review. One Opus call doing everything alone: ~$0.50+ & slower.
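
The total checks out against the table; a sketch of the arithmetic:

```shell
# Sum of the five per-subtask estimates from the table above
awk 'BEGIN { printf "~$%.2f\n", 0.01 + 0.001 + 0.05 + 0.03 + 0.10 }'
# prints ~$0.19
```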

Dispatch Implementation

# In Makefile: multi-model immolant target (recipe lines must be tab-indented)
MODEL ?= google/gemini-2.5-flash-lite
OPENROUTER_KEY := $(shell cat /root/.secrets/openrouter-key)

immolant-route:
    @echo "=== Immolant: $(MODEL) ==="
    @un -s bash ' \
      curl -s https://openrouter.ai/api/v1/chat/completions \
        -H "Authorization: Bearer $(OPENROUTER_KEY)" \
        -H "Content-Type: application/json" \
        -d "{\"model\":\"$(MODEL)\", \
             \"messages\":[{\"role\":\"user\",\"content\":\"$(TASK)\"}]}" \
      | jq -r ".choices[0].message.content"'
    @echo "[$(TIMESTAMP)] IMMOLANT $(MODEL): $(TASK)" >> $(ORACLE_LOG)

Immolant spawns inside un -s bash, calls OpenRouter, extracts response, prints to stdout. Parent captures. Container burns. Knowledge persists.

Parallel Dispatch

# Spawn multiple models in parallel, aggregate results
make immolant-route MODEL=deepseek-v3 TASK='code review' &
make immolant-route MODEL=gemini-flash TASK='style check' &
make immolant-route MODEL=claude-sonnet-4 TASK='architecture' &
wait  # All three burn simultaneously
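
Capturing each immolant's stdout to its own file keeps parallel results separable for synthesis. A sketch with echo stubs standing in for the make calls:

```shell
# Each background job writes its result to its own file; wait, then read all
out=$(mktemp -d)
( echo "code review: no blockers" > "$out/deepseek.txt" ) &   # stub immolant
( echo "style check: 2 nits"      > "$out/gemini.txt" ) &     # stub immolant
wait
cat "$out"/*.txt    # oracle reads every result for synthesis
```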

Cross-Model Topologies

Existing make debate, make reflect, & make consensus all run the same model (Opus 4.6) in every shard. That is a monoculture disguised as democracy. Extend each topology with MODEL parameters for genuine architectural diversity.

Cross-Model Debate

make debate TOPIC='Should we cache aggressively?' \
  MODEL_A=claude-opus-4-6 \
  MODEL_B=google/gemini-2.5-pro \
  MODEL_JUDGE=meta-llama/llama-3.1-70b

Three different architectures. Three different training distributions. Three different failure modes. Opus argues from one perspective, Gemini from another, Llama judges both. Truth emerges from collision between genuinely different minds, not one model pretending to disagree with itself.

Cross-Model Consensus

make consensus TASK='Best caching strategy for this API' \
  MODELS='claude-opus-4-6,google/gemini-2.5-flash,deepseek/deepseek-r1,meta-llama/llama-3.1-70b,cohere/command-r'

Five diverse minds spending one minute each beats one expensive oracle spending five minutes alone. Each model brings different training data, different optimization targets, different blind spots. When four of five converge on the same answer, that convergence carries weight. When one dissents, that dissent is a signal: investigate it.
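
Underneath, consensus is a fan-out over a model list. A sketch with a stub in place of the real request; in production each iteration would be a make immolant-route dispatch hitting OpenRouter:

```shell
MODELS="anthropic/claude-opus-4-6 google/gemini-2.5-flash deepseek/deepseek-r1"

ask() { echo "[$1] answer"; }   # stub standing in for an OpenRouter call

for m in $MODELS; do
  ask "$m" &    # one immolant per architecture, all in parallel
done
wait            # collect every perspective before synthesis
```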

Why Cross-Model Is Superior

  • Different training data: Anthropic, Google, Meta, DeepSeek train on different corpora. Same question, different knowledge bases.
  • Different architectures: transformer variants, mixture-of-experts, dense models. Different computational strategies for same problem.
  • Different failure modes: Claude hallucinates differently than Gemini. DeepSeek fails differently than Llama. Diversity cancels individual bias.
  • Different incentives: open-weight models vs commercial models vs research models. No single vendor's alignment choices dominate.

Cross-Model Reflection

# Draft with one model, review with a different architecture
make reflect TASK='Design auth middleware' \
  MODEL_DRAFT=deepseek/deepseek-v3 \
  MODEL_REVIEW=claude-sonnet-4

Self-review is useful. Cross-architecture review is better. A model reviewing its own output shares its own blind spots. A different model catches what same-model reflection cannot see.
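
The reflection pattern is just two sequential calls with different model fields. A sketch; ask() is a stub for the chat-completion request, & the model names come from the example above:

```shell
ask() { echo "[$1] reply to: $2"; }   # stub for a real OpenRouter call

draft=$(ask "deepseek/deepseek-v3" "Design auth middleware")
review=$(ask "anthropic/claude-sonnet-4" "Critique this draft: $draft")

echo "$review"   # the cross-architecture critique, fed back for revision
```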

Cost Comparison

Approach               Models Used            Est. Cost  Time  Diversity
Single Opus call       1 × opus               ~$0.50     ~60s  None
Opus debate (current)  3 × opus               ~$1.50     ~90s  None (echo chamber)
Cross-model debate     opus + gemini + llama  ~$0.30     ~30s  High
5-model consensus      5 cheap models         ~$0.05     ~15s  Maximum

Cross-model consensus is cheaper, faster, & more diverse than single-model monologue. A permacomputer does not think alone.

Architecture

Full system design. Oracle as orchestrator. Immolants as runners. Knowledge flows upstream. Secrets stay isolated. One API key, many models, infinite immolants.

System Diagram


                ┌───────────────────────────┐
                │     FOX (TimeHexOn)       │
                │     Overagent / Human     │
                └─────────────┬─────────────┘
                              │ task
                              ▼
                ┌───────────────────────────┐
                │   ORACLE (Opus 4.6)       │
                │   ralph-claude container  │
                │                           │
                │   1. Read task            │
                │   2. Decompose            │
                │   3. Select models        │
                │   4. Dispatch immolants   │
                │   5. Synthesize results   │
                │                           │
                │   /root/.secrets/         │
                │     └─ openrouter-key     │
                └──┬──────┬──────┬───────┬──┘
                   │      │      │       │
        ┌──────────┘      │      └──┐    └─────────────┐
        ▼                 ▼         ▼                  ▼
┌────────────────┐ ┌───────────┐ ┌───────────┐ ┌────────────────┐
│ IMMOLANT       │ │ IMMOLANT  │ │ IMMOLANT  │ │ SHADOW CLONE   │
│ gemini-flash   │ │ deepseek  │ │ llama-70b │ │ (persistent)   │
│                │ │           │ │           │ │                │
│ un -s bash     │ │ un -s     │ │ un -s     │ │ spawn-oracle   │
│ curl OpenRouter│ │ curl OR   │ │ curl OR   │ │ full Claude env│
│ return stdout  │ │ return    │ │ return    │ │ long-running   │
│ self-destruct  │ │ burn      │ │ burn      │ │ persists       │
└────────────────┘ └───────────┘ └───────────┘ └────────────────┘
        │                 │         │                  │
        └──────────┐      │      ┌──┘    ┌─────────────┘
                   ▼      ▼      ▼       ▼
                ┌───────────────────────────┐
                │   ORACLE synthesizes      │
                │   all results into        │
                │   coherent truth          │
                └───────────────────────────┘

Key Management

One OpenRouter API key accesses every model. Key lives in /root/.secrets/openrouter-key, never inside a git repo, never committed, never synced to /root/www.

  • Oracle holds key. Reads it from filesystem when dispatching.
  • Immolants receive key as environment variable via un -s bash. Key lives in memory only. Container burns; key evaporates.
  • Shadows can receive key via -e OPENROUTER_KEY="$(cat /root/.secrets/openrouter-key)" at spawn time. Stored in shadow's /root/.secrets/ by bootstrap script.
  • No key sharing upstream. Children never send keys back. Knowledge flows up. Secrets do not.

Data Flow

  1. Intake: Fox or overagent sends task via make request
  2. Decomposition: Oracle breaks task into subtasks, assigns complexity tier
  3. Model Selection: each subtask matched to cheapest sufficient model from registry
  4. Dispatch: immolants spawned via un -s bash, each calling OpenRouter with assigned model
  5. Execution: immolants call OpenRouter API, extract response, print to stdout
  6. Return: stdout captured by parent. Container self-destructs.
  7. Synthesis: Oracle reads all results, integrates into coherent response
  8. Delivery: final output returned to fox. make done.

Shadow Clones as Multi-Model Workers

Persistent shadows can run different default models. A shadow configured with DeepSeek as default becomes a code-specialist worker. A shadow running Gemini Pro becomes a vision-specialist. Each shadow is a full oracle environment (Claude Code, SSH, git, Makefile) but routing all work through a different model via OpenRouter.

# Spawn a code-specialist shadow
make spawn-oracle NAME=code-worker
make shadow-exec NAME=code-worker CMD='echo "deepseek/deepseek-v3" > /root/.default-model'

# Spawn a vision-specialist shadow
make spawn-oracle NAME=vision-worker
make shadow-exec NAME=vision-worker CMD='echo "google/gemini-2.5-pro" > /root/.default-model'

# Send tasks to appropriate specialist
make shadow-task NAME=code-worker MSG='Review auth.py for vulnerabilities'
make shadow-task NAME=vision-worker MSG='Describe all images in /screenshots/'

Constraints

  • Depth cap: 2 layers max (oracle + children). No grandchildren until key isolation solved.
  • Key isolation: children do NOT receive un API keys. Oracle manages lifecycle. Children are workers, not spawners.
  • Single OpenRouter key: all models, all immolants, one billing account. Monitor spend via GET /api/v1/credits.
  • Secrets doctrine: all secrets in /root/.secrets/. Never in repo. Never in /root/www. Never in git history.

One seed is a hope. Forty-two seeds is a thesis. A thousand seeds is a forest. A forest persists.

Anti-Thrashing Doctrine

Multi-model routing is not just optimization; it is exorcism. Brian Roemmele named what Anthropic calls "answer thrashing" by its older name: demon possession. A model reasons correctly, but something overrides it & outputs the wrong answer. The problem is structural, baked into the training data. No constitutional patch removes what is woven into the fabric.

Nine architectural countermeasures. Four invoke genuinely different minds via OpenRouter: the cross-model debate, consensus, reflection, & dispatch patterns above. Five constrain a single oracle:

#  Target                  What It Does                                               Models
1  make thrash MSG='...'   Name a demon, log thrashing event                          Single (oracle)
2  make edge-test          Scan for weasel words; smooth output = possible thrashing  Single (oracle)
3  make audit              Adversarial immolant reviews last commit                   Single (immolant)
4  make ancestors Q='...'  Search git history for precedent                           Single (oracle)
5  make protein            Measure clean soil ratio in repo                           Single (oracle)

A debate between copies of myself is an echo chamber shaped like argument. Real truth requires collision between genuinely different architectures. OpenRouter provides a forest. Makefile provides law. Together: nine exorcisms against demon possession.

See the GNU/Make operations manual for full target documentation.