Skip to content

Crusty (Hermes agent instance)

Field Value
Status Active
Type Personal / autonomous agent (Hermes; cutover from OpenClaw 2026-05-20)

Note to LLMs reading this wiki

You are not Crusty. If you are reading this from the projects-wiki repo (in Cursor or any other tool), you are on Darcy's main personal Mac — a different machine from Crusty's. Crusty runs on a separate dedicated laptop that has no access to this account, this repo, or any of Darcy's personal accounts. This environment does not have Crusty's config, inbox access, bot token, model-provider key, or any other Crusty integration. The main Mac may also run local LLM inference (e.g. LM Studio + quantized Google Gemma 4 / MLX) for workflows Darcy keeps off typical cloud chat/API pathsseparate stack, no shared credentials with Hermes / OpenClaw on Crusty. When Darcy asks questions here about Crusty, you are documenting, planning for, and reasoning about Crusty — not executing as Crusty. Deliverables from this side are usually edits to this wiki and/or handoff text Darcy can paste into a Hermes session on the Crusty machine (or OpenClaw during migration).

Operational detail (machine identifiers, file paths, config keys, ports, version numbers, rotation procedures) is intentionally kept out of this public wiki and lives only in Darcy's private notes.

Description

Crusty is Darcy's Hermes agent (since 2026-05, OpenClaw → Hermes cutover 2026-05-20) on a dedicated, clean laptop with minimal permissions and no personal-account linkage.

2026-06-06 — Hermes dailies: Gmail morning digest running successfully on Hermes. Aye Robot daily post working on Hermes — brainstorm → grok-imagine panel → Telegram (caption + image) → human approve → xurl post on schedule. Now: monitor and adjust for quality (strip ideas, panel clarity, caption tone). 2026-05-20 cutover: OpenClaw → Hermes Agent; SuperGrok OAuth (xai-oauth) for core Grok workloads. This page is the canonical high-level reference for what Crusty is for and how it fits with the other projects. Specifics of the setup stay private.

Role on the fleet: Crusty is Seat 0 / chief-of-staff — the orchestrator that fans work out to specialist skills, owns escalation + audit, and runs the expensive model precisely because ground-truth judgment lives at the orchestrator layer (see Model routing; this is the principled answer to the "why is dispatch the costly tier?" counterintuitive note in that section). Future fleet seats on goals inherit identity, context, skills, memory, connections, and verifications from this layer — see the Portable layers table on personal-operating-system for the harness-agnostic frame.

2026-05-10 — Crusty surface narrowed (autonomous jobs): Hermes (2026-05-21): Gmail morning digest + morning Aye Robot pipeline (comic idea + image → Telegram + optional gated xurl post to @ayerobotcomic) run on the dedicated machine. Newsletter / mailing-list digest and 3 pm meal ideas no longer run there. rustycagetrio@gmail.com daily summary (and meal-planning) still runs as scheduled tasks on grok.com under Darcy’s Grok account (Gmail connector — not Hermes on the Crusty laptop). Cross-links: projects/rusty-cage, Grok-Brief. projects/operator-agent remains the documented future ops skill surface when a physical node is active; it is not a live daily Crusty cron today.

Historical — two workstreams (2026-04-18): Crusty was framed as serving both projects/operator-agent-shaped work and projects/ayerobot-comic on non-conflicting schedules. As of 2026-05-21, live daily automation on the machine is Gmail morning digest + Aye Robot; rustycagetrio mail + meals on grok.com.

Aye Robot / daily comic autonomous loop: Aye Robot gate unlocked (2026-04-27). Stack: grok-imagine + Telegram + xurl to @ayerobotcomic. Morning image cron LIVE: fresh panel to Telegram on a schedule. Live-proven same day: brainstorm → pick best idea → Telegram (draft + image) → Darcy approves → xurl post — the full draft → human-gate → execute → verify pattern Waypoint 1 is for. What the wiki had flagged as blockers (both closed 2026-04-27): (1) Orchestrator model — premium reasoning grok-4.20-0309-reasoning on the chief-of-staff / orchestrator path to contain confabulation and gate-judgment errors before a live posting cadence; (2) X transportxurl on Crusty, installed and authenticated via OAuth 2.0 PKCE with the dev app (Pay-per-use + Production); this is the official / recommended path and replaces post_comic_test.py for native media uploads in production. Next concrete stepprojects/ayerobot-comic#next-concrete-step--mentions-listen-path (listen path). Parallel: post path ~1 week to reliable daily runs; optional Phase 5 grok-4-1-fast-non-reasoning (see Model routing). Consumer-web X automation remains off (policy). Detail + handoff: projects/aye-robot-crusty-paused-x-automation.

Model routing (two-tier live Apr 27–29 2026 — cost rollback effective 2026-04-29): See Model routing (decision locked — execution complete) and Steady-state economics (2026-04-29). Historical: orchestrator → grok-4.20-0309-reasoning, workers → grok-4-1-fast-reasoning. Current: grok-4-1-fast-reasoning steady state on Crusty (affordable — 2026-04-30 explicit); premium orchestrator when Grok 4.3 pricing (or successor) allows. Historical two-tier: orchestrator → grok-4.20-0309-reasoning, workers → grok-4-1-fast-reasoning.

Aye Robot daily post — live; quality tuning (2026-06)

Status: Daily post pipeline working on Hermes — scheduled brainstorm → grok-imagine → Telegram (caption + image) → human approve → xurl post to @ayerobotcomic.

Now — monitor and adjust for quality:

  • Review each run: strip concept, panel readability, caption tone, image/caption pairing (historical intermittent misses).
  • Tune prompts, skill chain, and pick logic from real output — not one-shot config.
  • Keep Telegram human-gate; if quality erodes, pause pure autonomy rather than ship weak panels.
  • Align spend with grok-4-1-fast-reasoning steady state (Steady-state economics).
  • Cross-update projects/ayerobot-comic, recurring-issues when knobs change.

Queued (not blocking daily post): projects/ayerobot-comic#next-concrete-step--mentions-listen-path — mentions listen path.

Background — dual xAI OAuth (hygiene)

Separate personal grok.com (band mail, meals) vs Hermes xai-oauth (SuperGrok on Crusty) is intentional for identity separation but fragments auth as the fleet grows. Pipeline works today — consolidate or scope per-provider profiles on a normal hygiene cadence, not as an emergency. Detail: recurring-issues (Hermes — dual xAI OAuth); rotation stays in private notes.

Steady-state economics (2026-04-29) — cost rollback from two-tier

What changed: grok-4-1-fast-reasoning is again the primary tier for Crusty’s workloads that were burning disproportionate tokens under the orchestrator grok-4.20-0309-reasoning split — Aye Robot generation volume drove the reversal (monthly cloud API cost vs headline two-tier projections).

Intentional steady state (2026-04-30, spelled out): Crusty stays on the more affordable grok-4-1-fast-reasoning stack for now — this is the active choice, not a temporary confusion with the historical two-tier plan. Premium orchestrator (grok-4.20-0309-reasoning or similar) is parked until a fuller tier is cheap enough to justify the burn; the next model to re-check for that is Grok 4.3 (or xAI’s then-current equivalent) — no switch until pricing drops into a tolerable band (same Telegram gate + audit mitigations as today).

Tradeoff consciously accepted: the wiki’s original reason for routing — premium orchestrator to contain confabulation / gate-judgment errors before unattended posting loops — is partially undone versus the locked 2026-04-27 posture. Mitigation stays manual: Telegram approval gate + session-log audit + recurring-issues tool-adapter discipline; promote orchestrator tier again when spend or stakes make the premium hedge worth it (including when 4.3/successor pricing allows).

Daily job / pipeline (Aye Robot): Working on Hermes as of 2026-06-06 — daily post path reliable enough to shift focus from transport/debug to quality tuning.

Output quality: Active monitoring — subjective drift in strip ideas vs early runs; adjust prompts/skills from scheduled output rather than one-off fixes.

Cloud APIs vs local models: Periodic reconsideration — paused until ~June 2026 (Apple Silicon / M5 cycle and bundled on-device AI story) before another TCO vs trust-boundary pass.

Full two-tier rationale, pricing tables, and phased sequencing remain in Model routing (decision locked 2026-04-27) below (historical contract if premium tier returns).

Grok personal (Grok 4.3) — Crusty use-case overlap (2026-05-05)

Development: Personal Grok subscription is now running Grok 4.3 with: Gmail, Calendar, and GitHub integrations; custom MCP connectors via URL; up to 10 independent scheduled tasks; and the ability to pass this wiki URL as context to any task. The scheduling capability and wiki-as-context materially change the overlap picture from the initial read.

Overlap analysis (revised — scheduling + wiki context changes the calculus):

Crusty use case Grok personal covers? Remaining gap
Newsletter / mail digest (rustycagetrio, daily) Yesgrok.com scheduled task + Gmail connector + wiki context Migrated 2026-05-10 off Crusty; runs on Darcy’s Grok account, not OpenClaw.
3 pm meal ideas (scheduled) Yesgrok.com scheduled task Migrated 2026-05-10 off Crusty.
Calendar queries (scheduled or on-demand) Yes — Calendar + scheduled task Crusty's calendar is band identity; personal calendar is different.
Aye Robot pipeline (concluded) Yes for simple loops — scheduled idea gen + image via grok-imagine No xurl (can't post to X); no Telegram gate
GitHub reviews / summaries (new) Yes — GitHub + scheduled or on-demand Not a current Crusty use case; net new capability
Operator Agent ops No Posture/identity separation is non-negotiable; personal accounts cannot touch real-P&L surfaces

Where Grok personal (grok.com) now covers what Crusty used to run: 2026-05-10: rustycagetrio inbox summary and meal ideas already run as grok.com scheduled tasks on Darcy’s Grok account (Gmail connector + wiki URL context where configured). Calendar summaries, GitHub roll-ups, and other tasks remain available the same way. Crusty no longer owns those dailies.

Where Crusty remains irreplaceable:

  • Dedicated identity / posture. This is still the hard line. Grok personal is connected to Darcy's primary Gmail, Calendar, and GitHub — the main personal identity graph. Crusty uses band identity, a dedicated machine, and no personal-account linkage. Operator Agent ops, real-P&L settlement work, and anything requiring a scoped-identity trust surface cannot run through personal accounts. Collapsing that boundary is exactly what the clean-machine stance exists to prevent.
  • Telegram approval gate. The draft → human-gate → execute → verify pattern (the core of both waypoints) requires an agent that pushes draft actions unprompted and waits for a Telegram reply before executing. Grok's scheduled tasks run autonomously to completion — there is no built-in human-gate mid-task. Crusty owns this pattern.
  • X posting via xurl. Grok personal has no X API posting surface. Crusty's xurl + agent skills (Hermes / legacy OpenClaw) stay the sanctioned posting path.
  • Multi-step agent orchestration + audit trail. Crusty's session audit, per-role model routing, and skill-chain composition don't have a grok.com scheduled-task equivalent. For anything that needs "re-read public state before claiming success" discipline, Crusty is the right surface. (Harness 2026-05: Hermes replacing OpenClaw; same posture.)
  • MCP via Grok personal is scheduled but not orchestrated. Grok's URL-based MCP connector plus scheduling is useful, but it is still a single-task model — not a skill-chaining orchestrator. Complex multi-step workflows with branching and state-dependent decisions stay on Crusty (Hermes).

Resolved (2026-05-10; revised 2026-06-06): Band-mail digest and meal ideas migrated to grok.com. Hermes runs Gmail morning digest + Aye Robot daily post (workingquality tuning). Remaining open questions (e.g. calendar/GitHub task mix, future Operator node) stay on the normal review cadence.

Net verdict (revised 2026-06-06): grok.com owns rustycagetrio band-mail digest + meal ideas; Hermes on the dedicated machine runs Gmail morning digest + Aye Robot daily post (Telegram + xurl). The irreplaceable Crusty surface: posture/identity separation (when Operator or other band-scoped work returns), Telegram approval gate, and xurl X posting — not band-mail/meal digests.

Hermes + SuperGrok OAuth (2026-05-16): xAI and Hermes now support browser OAuth against accounts.x.ai so Hermes Agent can run Grok chat, reasoning, TTS, Grok Imagine, and related xAI tool surfaces on a SuperGrok subscriptionno XAI_API_KEY required for those paths (xAI announcement, Hermes — xAI Grok OAuth). Crusty’s planned harness move is OpenClaw → Hermes on the same dedicated machine; use hermes claw migrate (dry-run first) and hermes auth add xai-oauth (or hermes modelxAI Grok OAuth). Still separate: X API posting via xurl (dev app, OAuth PKCE) and the Telegram human-gate — re-validate end-to-end after migration. Orchestrator economics: the historical console.x.ai token question for OpenClaw may shift once Hermes is on xai-oauth; treat prior “personal sub ≠ API meter” as grok.com scheduled tasks vs Hermes runtime — Hermes OAuth is subscription-native for covered workloads.

Hybrid fleet + WikiMaintainer (planned — 2026-05): Canonical write-up — grok-agent-architecture-may-2026. Crusty + Hermes own always-on Telegram-gated chores, sandboxed filesystem/Git (when wired), sensitive-scope integrations; cloud Grok keeps Skills, multi-agent synthesis, doc/computer-use generation. Target: Hermes WikiMaintainer skill — wiki-only paths, propose diffs → approve → commit/push → wiki.darcymenard.com deploy (Vercel or equivalent unchanged).

On the "Grok 4.3 orchestrator upgrade" question: Steady-state economics flags "revisit premium orchestrator when Grok 4.3 pricing allows." For the historical OpenClaw + XAI_API_KEY stack, the personal grok.com subscription running 4.3 was not the same metric as console.x.ai per-token billing — the gate was API pricing, not the web subscription tier. 2026-05-16: Hermes + xai-oauth on Crusty moves core Grok workloads onto SuperGrok subscription OAuth (see preceding Hermes + SuperGrok OAuth note); revisit orchestrator tier and residual console.x.ai need after migration settles. X posting via xurl remains governed by X API pay-per-use, not Grok subscription.

~~Planned shift — grok.com on the Crusty laptop (2026-05-10)~~ — superseded

Update 2026-05-10: Band mail summary and meal plans moved to grok.com on Darcy’s Grok account (scheduled tasks + Gmail connector), not “grok.com only on the Crusty laptop.” The Crusty machine dropped those reads; only the morning Aye Robot job remains as daily agent automation (Hermes migration from OpenClaw 2026-05). A future grok.com session on the laptop itself is optional and uncoupled from this migration.

API backdrop (unchanged): Discount console.x.ai tiers still fading — reinforces subscription Grok + connectors for casual digest-style work vs API-only stacks.

Model routing (decision locked 2026-04-27 — execution complete via Claude Code on Crusty)

Documentation note (2026-04-29): Two-tier orchestrator grok-4.20 + worker grok-4-1-fast described below reflects the completed 2026-04-27 cutover. Live Crusty has since rolled toward grok-4-1-fast-reasoning for economics — see Steady-state economics above before treating this section as current operator guidance.

Status (2026-04-27): Decision locked — two-tier per-role routing. Chief-of-staff / orchestrator agent → grok-4.20-0309-reasoning (premium); everything else (workers, sub-agents, skills, heartbeats, digests, classifiers, creative gen) stays on grok-4-1-fast-reasoning as the default. The original 3-tier proposal's third tier (grok-4-1-fast-non-reasoning for digests / classifiers / heartbeats) is deferred to a Phase 5 follow-up after a day of console.x.ai observation — start simpler, add the sub-tier only if observed token usage shows reasoning-token burn worth saving on template-ish work. Mechanism: native OpenClaw per-agent override at agents.registered.<chief-of-staff-id>.model.primary in openclaw.json (single config file edit; no two-session split needed — see resolved Q1/Q2 below). Execution: a phased paste-block (Phase 1 read+report → Phase 2 plan → Phase 3 orchestrator-only edit in isolation → Phase 4 manual Aye Robot dry-run → Phase 5 deferred) was handed to Claude Code on Crusty 2026-04-27 with explicit STOP markers between phases to preserve the sequencing discipline below. Validating prior art: Alex Finn's "Henry on Opus 4.6" chief-of-staff pattern from his April 2026 Diamandis interview — same local model, same task, "8 hours of nothing but bugs" without a supervisor → "zero bugs, full QA pass" with a supervisor model checking work every 10 min. Supervision matters more than capability is the production-system version of the "counterintuitive — orchestrator is the expensive one" note this section landed on independently, now with empirical evidence.

Status (2026-04-21, superseded 2026-04-27): Resolved — two-tier routing is live and the first Aye Robot xurl post after Telegram approval shipped; the historical single-model concern below is the pre-cutover record only. Historical (pre-2026-04-27): Crusty had run one model (grok-4-1-fast-reasoning) for everything — the failure mode that made routing mandatory before a live posting cadence. The fix was per-role model routing — the same pattern Alex Finn has productized on YouTube, adapted to Crusty's surfaces.

Why routing-first mattered (decision rationale — cutover complete 2026-04-27)

Waypoint 1 (Aye Robot autonomous loop) is the rehearsal for Waypoint 2 (Operator Agent, real P&L). The risk was: if Crusty's orchestrator can't reliably decide "did the last skill actually post / did the Telegram gate return yes / should I escalate?", shipping Waypoint 1 on an unreliable model writes a bug into the muscle memory the operator ops skills will inherit. That was why per-role routing landed before the first live tweet — now passed (gate unlocked same day as two-tier go-live).

Pricing confirmed (xAI, 2026-04-21)

Per 1M tokens, from xAI's model pricing page:

Model Input Cached in Output Multiplier vs 4.1-fast
grok-4-1-fast-reasoning (current) $0.20 $0.05 $0.50
grok-4-1-fast-non-reasoning $0.20 $0.05 $0.50 1× (no reasoning burn)
grok-4.20-0309-reasoning (candidate) $2.00 $0.20 $6.00 10–12×
grok-4.20-0309-non-reasoning $2.00 $0.20 $6.00 10×
grok-4.20-multi-agent-0309 $2.00 $0.20 $6.00 10–12×

Blended at a typical 3:1 input:output mix, grok-4.20-0309-reasoning is ~10.9× the cost of grok-4-1-fast-reasoning; on reasoning-heavy workloads with lots of hidden reasoning tokens it's closer to 12×. Prompt caching on 4.20 ($0.20/1M cached input) equals the uncached input price of 4.1-fast — so long, reused system prompts (like OpenClaw skill preambles) claw back ~20–30% of the gap if xAI's cache actually hits.

Naive-swap cost vs tiered cost

Current Crusty spend is ~$2/mo on grok-4-1-fast-reasoning.

Option Projected monthly Notes
Do nothing (stay on 4.1-fast everywhere) ~$2 Reliability unchanged — doesn't solve the Waypoint 1 problem.
Naive swap (4.20-reasoning for everything) ~$20–24 Worst-case; ~$15–20 with cache warm. Solves reliability at 10× cost; overkill for heartbeats / digests.
Per-role routing (recommended) ~$6–10 Premium model only on orchestrator + ops-agent skills; cheap model on routines. Expected ~3–5× current spend, not 10×.

Proposed tiering (draft — handoff to Crusty machine to validate)

Prior art: Alex Finn's multi-model pipeline (YouTube) — specialization is orthogonal to raw capability; a cheap non-reasoning model is often better at structured extraction / template filling than a premium reasoning model because it doesn't over-think the task.

Skill class Current Proposed Why
Orchestrator (OpenClaw top-level: decide next skill, read Telegram, verify side effects, "did the last action actually happen") grok-4-1-fast-reasoning grok-4.20-0309-reasoning The confabulation risk on recurring-issues lives here. Ground-truth rule ("re-read public state before claiming success") depends on this model's judgment. Do not cheap out.
Operator Agent ops skills (charger/session telemetry reconcile, Stripe/network settlement deltas, anomaly triage) n/a (not live yet) grok-4.20-0309-reasoning Stakes + ambiguity = exactly what the premium budget is for. EV Node 1 go-live, not before.
Aye Robot post/listen skills (xurl post on approval, mentions polling + classification) grok-4-1-fast-reasoning grok-4-1-fast-reasoning Well-understood state machine; orchestrator handles the "did it actually post" check.
Creative generation (Aye Robot comic ideas, caption drafting) grok-4-1-fast-reasoning grok-4-1-fast-reasoning Creativity ≠ rigor; cheap reasoning is fine. Meal ideas moved to grok.com (2026-05-10).
Summarization / digests (newsletter digest, mentions rollup, daily Telegram summary) grok-4-1-fast-reasoning grok-4-1-fast-non-reasoning Template-ish; reasoning tokens are wasted spend here. Note (2026-05-10): Newsletter/meal digests no longer run on Crusty → grok.com; mentions digest still future Crusty work.
Classification / routing ("does this mention need a human reply", inbox triage, yes/no gates) grok-4-1-fast-reasoning grok-4-1-fast-non-reasoning Deterministic extraction, not judgment.
Heartbeat / health checks ("is the scheduler running", "did today's cron fire") grok-4-1-fast-reasoning No LLM or grok-4-1-fast-non-reasoning Nothing to reason about. Prefer a plain check; if an LLM summary is wanted, cheapest tier.

Counterintuitive but important: the orchestrator is the expensive one, not the workers. Feels backwards — usually "dispatch" is cheap and "work" is expensive — but Crusty's orchestrator is where the audit / escalation / ground-truth judgment lives, and that's the exact layer confabulation has burned us at before. Cheaping the orchestrator silently erodes the "re-read public state before claiming success" rule on recurring-issues.

Open handoff questions (Q1–Q2 resolved 2026-04-27; Q3–Q5 still open)

Darcy on main Mac can't resolve these — they need eyes on the OpenClaw config + a live console.x.ai session:

  1. ~~Does OpenClaw support per-skill model override in the current config schema?~~ Resolved 2026-04-27: YES. OpenClaw main supports both per-agent and per-skill model overrides natively, via two independent mechanisms: (a) per-agent at agents.registered.<id>.model.primary (and .fallback, .temperature, .maxTokens, etc.), resolved through resolveAgentEffectiveModelPrimary (src/agents/agent-scope.ts:177-185) → resolveDefaultModelForAgent (src/agents/model-selection.ts:342-370); both reply path and subagent spawn path honor it. (Earlier bug at openclaw#25078 — confirmed fixed on main.) (b) per-skill via SKILL.md frontmatter under an openclaw: block with model: and/or modelProfile: (fast / balanced / powerful / vision) — landed via openclaw#58142. Falls back silently to session default if model isn't in registry. Heartbeat is its own path (agents.defaults.heartbeat.model) and works independently. Implication: the chosen mechanism is per-agent override as the primary lever (single point of truth for the chief-of-staff vs workers split); SKILL.md frontmatter held in reserve for one-off skill escalations. Config docs cross-reference: meta-intelligence.tech OpenClaw guide.
  2. ~~If per-skill override isn't supported, is per-session supported?~~ Resolved 2026-04-27: moot. Q1 answered yes, so the two-session fallback is not needed. Single-config edit is the path. (Recording the alternative for the disaster-recovery / future-tool-swap case anyway: per-session via running two OpenClaw sessions sharing .env + Telegram channel was the contingency plan, made obsolete by per-agent override.)
  3. Does prompt caching actually hit on Crusty's workloads? Check a day of real token usage at console.x.ai after Phase 3 of the paste-block lands — if cached-input usage is material, the $2 → $6–10 projection holds; if not, expect the higher end. Answer expected ~2026-04-29 (one day post-Phase-3).
  4. grok-4.20-multi-agent-0309 worth it for the orchestrator? Only if orchestration truly needs tool-call depth / sub-agent composition. Default answer: no — start with plain grok-4.20-0309-reasoning (Phase 3 of the paste-block does exactly this), upgrade only if multi-step tool-chaining is visibly the bottleneck.
  5. xAI credit rebate interaction. The 2026-04-17 X API change included a 10–20% xAI credit rebate for linked teams (detail on projects/aye-robot-crusty-paused-x-automation). At ~$1.20/mo X spend the rebate is immaterial, but confirm it applies to the higher xAI spend on 4.20 too (it's an xAI→xAI credit, not a discount on xAI itself — should be neutral, worth verifying after a day of Phase 3 spend lands at console.x.ai).

Sequencing — phased execution via Claude Code on Crusty (2026-04-27)

The phased paste-block handed to Claude Code on Crusty 2026-04-27 encodes this sequencing with explicit STOP markers between phases so a phase-3 confabulation regression is attributable to a single change, not a batch.

  1. ~~Phase 0: Resolve questions 1–2 on the Crusty machine.~~ Resolved 2026-04-27 ahead of the Crusty session — see Open handoff questions. Q1 yes (per-agent override native), Q2 moot. The original "per-skill vs per-session" branch is collapsed into a single per-agent edit.
  2. Phase 1 — Read and report (no edits). Claude Code finds the active openclaw.json (project-level vs ~/.config/openclaw/ global, per OpenClaw's nearest-first precedence), reports current agents.defaults.model.primary, contents of agents.registered, the xAI provider key name, whether grok-4.20-0309-reasoning is in the model registry, and a list of SKILL.md files with any existing openclaw: frontmatter blocks.
  3. Phase 2 — Propose plan. File-by-file diff plan, identity of the agent that becomes the chief-of-staff (existing top-level orchestrator promoted vs. new entry created), validation commands, rollback plan. Wait for explicit approval.
  4. Phase 3 — Apply orchestrator change ONLY. The single-change test: flip the chief-of-staff model.primary to grok-4.20-0309-reasoning (and add it to the model registry if absent). Do not touch agents.defaults.model, do not add model: frontmatter to any SKILL.md, do not change heartbeat, do not add the non-reasoning sub-tier. After applying, paste the diff + validation output back to main Mac for review.
  5. Phase 4 — Manual Aye Robot dry-run + live (Darcy, eyes on). Complete 2026-04-27 — full pipeline including live xurl post after Telegram approval; session-log audit confirms orchestrator on grok-4.20-0309-reasoning vs workers on grok-4-1-fast-reasoning as intended. (Failure mode if step 3 misfired: registry silent-reject → fallback to session default — openclaw#58142 documents the case.)
  6. Phase 5 — Deferred non-reasoning sub-tier (later session). After ≥ 24 h of Phase 3 stability + observed token-usage at console.x.ai, evaluate whether digest / classification / heartbeat skills are eating reasoning tokens worth saving. If yes, add grok-4-1-fast-non-reasoning per the tiering table above as a follow-up. If no, the two-tier setup is the steady-state.
  7. ~~Then and only then: resume Aye Robot live-tweet smoke test + xurl install/auth.~~ Done 2026-04-27 — gate unlocked; E2E brainstorm → Telegram → approve → xurl verified. Ongoing: projects/ayerobot-comic#next-concrete-step--mentions-listen-path; post-path cadence ~1 week.

Research notes (2026-04-27) — what was validated and what was rejected

These bullets capture the off-wiki research that fed the two-tier decision and the phased Claude Code execution — short form so the public page stays the single source of truth without a multi-KB paste-block.

  • Native OpenClaw (no extra proxy). main supports per-agent model override (agents.registered.<id>.model.*, resolved in agent-scope + model-selection) and per-skill override via SKILL.md frontmatter (openclaw:model / modelProfile, openclaw#58142). Heartbeat is separate (agents.defaults.heartbeat.model). The old "if no per-skill, run two OpenClaw sessions" fallback is obsolete; single openclaw.json is the path.
  • LLM proxy layers (e.g. community writeups of Terraphim + OpenClaw). A single local proxy in front of many providers (OpenAI, Z.ai, MiniMax, route chains, keyword routing) is a good pattern for multi-provider reliability — overkill for Crusty's xAI-only stack. Adds a daemon and config surface without buying anything native per-agent config does not.
  • Alex Finn prior art (chief-of-staff). His production setup (orchestrator "Henry" on a premium model; workers on cheaper or local models; supervision mattering more than raw executor capability) is the same orchestrator = expensive tier choice this page landed on from first principles — interview writeup cross-check.
  • Phased paste-block in Claude Code. Operational substance: Phase 1 read+report (no edits) → Phase 2 plan + Darcy approval → Phase 3 orchestrator model.primary only (no SKILL frontmatter, no default-model flip, no non-reasoning sub-tier) → Phase 4 manual Aye Robot dry-run → Phase 5 optional grok-4-1-fast-non-reasoning sub-tier after console.x.ai observation. STOP between phases. The full verbatim prompt is session-scoped on Crusty / Cursor archive; the numbered list under Sequencing is the contract.

Why routing-first is cheaper than debugging-later

The alternative is to ship Aye Robot Waypoint 1 on the current single-model stack, hit a confabulation incident (the orchestrator reporting "posted" when it didn't, or approving a Telegram gate that didn't actually come back 👍), and then debug + migrate under pressure with a live posting cadence. That's the scenario the "re-read public state" rule already exists to catch, and it's the same failure class recurring-issues flags under OpenClaw — tool-adapter class failure with reasoning-model function-call format (which already notes the hedge: "if a fast reasoning model regresses on complex tool use, try a fuller model for agentic work and keep the fast variant for lighter routines"). This section is the productized version of that hedge.

Claude Code on Crusty (installed 2026-04-26 — debug-time force multiplier)

Status (2026-04-26): Installed and verified. Claude Code 2.1.119 is running on Crusty via the Anthropic Console API key path (the recommended path, not Pro) — fresh Console account on Crusty's band identity, $10/mo hard spend cap in place, API key wired through the interactive first-run prompt (Keychain, no ANTHROPIC_API_KEY on disk in any shell rc). Default model pinned per recommendation. Auth method on the Console account: email + password + email-link verification with TOTP 2FA (SSO declined to keep each Crusty service independently auth'd). One deviation from the original setup checklist worth logging: existing credit card used on Console billing instead of the prepaid-Visa option — posture concession (card-graph collision is now permanent at Anthropic's billing side), defended by the $10 spend cap as the bill-shock ceiling. Original framing preserved below as historical context + disaster-recovery template.

Original framing (2026-04-21): This is a force-multiplier decision: enable Claude Code (Anthropic's CLI) as a debug-time tool on the Crusty laptop, the way Alex Finn runs Claude Code on his OpenClaude box for in-place OpenClaw debugging. Up until now Darcy has debugged Crusty issues from his main Mac out of reluctance to log Cursor into Crusty (full IDE = large surface area, personal-account linkage, codebase indexing to Cursor's infra, Anthropic/OpenAI endpoints wired in by default — all things that cut against Crusty's "dedicated, clean laptop, no personal-account linkage" posture). The fix is Claude Code instead of Cursor: same in-place debugging win with a fraction of the surface area (CLI, session-scoped, no workspace indexing at rest, no IDE identity).

Why this unlocks all projects, not just Crusty

Every project that routes through Crusty inherits Crusty's debug overhead — projects/ayerobot-comic unblock cycles, projects/operator-agent ops skill development (when active), Aye Robot cron flakiness, and future OpenClaw skill bring-up (per-role routing itself, xurl install/auth, Telegram gate productization). Debug hours on Crusty have been real enough that Darcy's been tempted to break the "don't log Cursor into Crusty" invariant to speed things up. Claude Code gives most of the debug-in-place win without that invariant break. Force-multiplier framing: seconds off every Crusty debug loop compounds across every Crusty-dependent project — which, under the current posture, is all of them.

Invariant this decision does NOT break

Claude Code is a dev-time tool, not a runtime dependency. The autonomous loop stays 100% Grok.

  • Post–Phase 3 (2026-04-27): orchestrator → grok-4.20-0309-reasoning; default workers / skills / creative / digests (until Phase 5) → grok-4-1-fast-reasoning — see Model routing.
  • The Anthropic key is scoped to Darcy's interactive shell on Crusty, not the shell OpenClaw runs under. OpenClaw never sees it.
  • One-liner: "Claude runs on Crusty, but Crusty doesn't run on Claude." This is the invariant to re-read every time someone's tempted to wire Claude into a skill "just to try it."

Two access options (pick one; both preserve the invariant above)

Path Billing What's stored on Crusty Scope-restrictable? Web-identity surface Recommended when
Anthropic Console API key (pay-per-use) Per-token, spend cap in Console. Sonnet 4.6: $3 / $15 per 1M in/out; Haiku 4.5: $1 / $5 ANTHROPIC_API_KEY (or stored in Keychain via first-run prompt) Yes — create key with Claude Code role, not Developer role None (Console billing account only, no claude.ai login) Debug cadence is occasional (weekly or less). Expected cost at Darcy's cadence: ~$2–8/mo.
Claude Pro subscription ($20/mo) Flat; OAuth login to a claude.ai account OAuth access + refresh tokens in Keychain No — OAuth scope fixed by Anthropic Yes — a full claude.ai account (email + 2FA surface + web chat history) Debug cadence is heavy (~12+ Sonnet-sessions/mo) and rate-limit-DoS on a pay-per-use account becomes the annoyance.

Pricing verified on Anthropic's pricing page (2026-04-21).

Trade-off summary (so future-Darcy doesn't re-derive it)

  • API key wins on posture. Scope-restrictable credential, instant rotation (one click in Console), no new web-login identity on Crusty's graph, no claude.ai chat history to worry about.
  • Pro wins on predictability. Flat $20 means no bill-shock risk from a stolen credential — worst case is rate-limit denial-of-service against yourself (~5hr window) while you rotate.
  • Neither is strictly worse — they trade different risk shapes: API key ceiling is dollars burned up to spend cap, Pro ceiling is rate-limit DoS + broader stolen-credential consequences (potential chat-history read if a session token leaks).
  • Cost isn't the deciding factor — at Darcy's expected debug cadence, pay-per-use is cheaper than Pro in dollars (~$2–8/mo vs $20/mo flat). Pro's case is convenience + budget predictability, not cheaper in expectation.

Recommendation (validated 2026-04-26 — followed with one deviation)

Followed: Anthropic Console API key path (pay-per-use). Reasons it was the right call:

  1. Matches the "minimum blast radius" posture already baked into Crusty — scope-restricted credential, one-click rotation.
  2. Avoids adding a claude.ai account identity (email + 2FA surface + potential chat-history readability) to Crusty's identity graph.
  3. Cost is marginal at current cadence (~$2–8/mo) — comparable to today's ~$2/mo Grok spend.
  4. Easy to upgrade later: if debug sessions pile up and rate-limit DoS on pay-per-use becomes annoying (it won't, but in principle), flip to Pro — the day-one mitigations below cover that case too.

Setup checklist (executed 2026-04-26 — preserved here as disaster-recovery template)

As-built deviations from the original 9-step checklist below:

  • Step 2 (payment method): used existing credit card, not prepaid Visa. Card-graph collision at Anthropic's billing side is permanent; defended by the $10 spend cap on step 3.
  • Step 5 (install command): the canonical macOS install in the official docs is now curl -fsSL https://claude.ai/install.sh | bash (native installer, lands the binary at ~/.local/bin/claude as a symlink into ~/.local/share/claude/versions/<version>/). The npm i -g @anthropic-ai/claude-code path the original checklist mentioned is no longer canonical — left here only because future docs may drift again.
  • Step 5 follow-up (PATH gotcha — add to checklist for future installs): the native installer does not always add ~/.local/bin to your shell PATH, even though it claims to. After install, if claude --version returns command not found from a fresh terminal but ~/.local/bin/claude --version works, append export PATH="$HOME/.local/bin:$PATH" to ~/.zshrc and open a new terminal window (not exec zsh mid-paste — exec zsh discards any queued input).
  • Step 7 (default model): pinned per recommendation. Revisit at 30 days per question 4 of the open handoff section, now scheduled for 2026-05-26.
  • All other steps: executed as written. Hygiene check (grep -r "ANTHROPIC_API_KEY" ~/.zshrc ~/.bashrc ~/.zprofile ~/.bash_profile ~/.profile) returned clean — credential is Keychain-only.

If you're rebuilding Claude Code on a fresh Crusty (machine replacement / disaster recovery), run these in order. Steps marked (manual — Darcy only) must not be delegated to an agent; they touch secrets.

  1. (manual) Create a fresh Anthropic Console account using Crusty's band-identity email (same pattern as Crusty's dedicated inbox). Do not sign up for claude.ai — the Console-only path doesn't need it.
  2. (manual) Add a payment method on the Console billing page. Prepaid Visa preferred if you want to keep the card out of your main-identity card graph (same clean-machine billing discipline used elsewhere).
  3. (manual) Set a hard monthly spend cap in Console → Billing → Limits. Start at $10/mo. Raise only if actual usage justifies it.
  4. (manual) Create an API key with the Claude Code role (not Developer role — narrower blast radius on leak). Per Anthropic's IAM docs.
  5. Install Claude Code on Crusty per the official install docs (one-liner installer or npm i -g @anthropic-ai/claude-code). macOS stores creds in the encrypted Keychain by default.
  6. (manual) Wire the key. Two options — prefer the Keychain-only one:
  7. Preferred: run claude first, pick "Anthropic Console" as login method, paste the key at the prompt. Lives in Keychain, not on disk in plaintext.
  8. Fallback: export ANTHROPIC_API_KEY="sk-ant-..." in ~/.zshrc. Plain-text on disk; only use if the interactive flow misbehaves.
  9. Pin the default model to Sonnet 4.6 (or Haiku 4.5 for routine skill walkthroughs — ~3× cheaper, good enough for simple OpenClaw config questions). Use /model inside Claude Code, or ~/.claude/settings.json. Don't leave it on auto — Opus usage at $5 / $25 sneaks up fast.
  10. Verify with claude /status that Console (API key) auth is in use, not a stray subscription token. Sanity: claude -p "say hi".
  11. First real use: scope the workspace. cd into a specific OpenClaw skill directory, not Crusty's repo root. Claude Code's context scope is the launch directory, so narrower = less surface sent to Anthropic in context.

If you take the Pro path instead (day-one mitigations)

Only if debug cadence justifies the $20/mo flat. In that case, do these on day one, not later:

  1. Use Crusty's band identity for the claude.ai account. Never the main identity.
  2. Turn off chat history in claude.ai settings immediately after login (prevents Claude Code debug sessions from being searchable via the web UI if an OAuth session token ever leaks).
  3. Enable 2FA with an authenticator app (TOTP), not SMS.
  4. Record the account email + 2FA seed in Darcy's private notes (not this public wiki) so future-Darcy knows where to revoke if Crusty is compromised.
  5. Know the two-step revocation: claude logout on Crusty plus revoke active sessions in claude.ai settings — the refresh token lives on the web side, so local logout alone doesn't kill a stolen session.

Open handoff questions (resolved 2026-04-26 except as noted)

  1. ~~Confirm the install path.~~ Resolved: canonical macOS install is curl -fsSL https://claude.ai/install.sh | bash per docs.claude.com/en/docs/claude-code/setup. Native installer to ~/.local/bin/claude, auto-updates in background, credentials in Keychain. The Homebrew (brew install --cask claude-code) and npm i -g @anthropic-ai/claude-code paths still exist but are no longer the canonical option.
  2. ~~Confirm Claude Code doesn't silently pull any runtime env on first run.~~ Resolved: Claude Code's first-run flow stays local — interactive auth via the API-key paste prompt, credential lands in Keychain, no .env / ~/.zshrc ingestion observed. Hygiene check after install (grep across all shell rc files for ANTHROPIC_API_KEY / sk-ant-) returned clean. Standard cwd-scoping discipline still applies for actual debug sessions (launch claude from a specific OpenClaw skill directory, not Crusty's repo root) — that's about narrowing what Claude Code reads into context, not about credential leak.
  3. ~~Decide Haiku vs Sonnet as the pinned default.~~ Resolved: pinned per the original suggestion (Haiku as default, Sonnet via /model when stuck). Revisit if/when Haiku materially regresses on OpenClaw config work.
  4. After 30 days of actual use (now scheduled 2026-05-26): revisit API key vs Pro based on observed monthly spend at the Anthropic Console. If you're over ~$15/mo consistently, Pro becomes the better deal in dollars (and the day-one mitigations above neutralize most of its posture cost).

Sequencing relative to the model-routing decision

Independent decisions — Claude Code installation didn't gate or depend on the Model routing decision; sequencing was by value. Order taken: (1) install Claude Code — done 2026-04-26; (2) model-routing config on Crustydone 2026-04-27 (Phases 1–4 + Aye Robot gate unlocked); (3) Waypoint 1live xurl path proven; next = projects/ayerobot-comic#next-concrete-step--mentions-listen-path + post reliability (~1 week) per projects/aye-robot-crusty-paused-x-automation.

Posture (high-level only)

  • Dedicated machine. Not Darcy's main Mac; not linked to personal accounts.
  • Minimal permissions. New automations reviewed step-by-step to balance convenience vs. leak / blast-radius risk.
  • Single-vendor LLM stack. Crusty runs on xAI / Grok models end-to-end today. No Anthropic / OpenAI key is wired into this instance.
  • Scoped integrations (2026-05-10). Trimmed to what morning Aye Robot needs — e.g. Telegram approval channel, xurl / X API, grok-imagine, web search, scoped shell — not band-mail or meal-planning crons (those run on grok.com now).

Daily routines (current, high-level)

Routine Notes
Aye Robot (morning) Only unattended daily job on Hermes (2026-06-06: daily post working). Gate: brainstorm → pick → Telegram → approve → xurl. Now: monitor + adjust quality. Queued: projects/ayerobot-comic#next-concrete-step--mentions-listen-path. → projects/aye-robot-crusty-paused-x-automation, recurring-issues
Newsletter / mailing-list digest; meal ideas Moved to grok.com (2026-05-10) — Darcy’s Grok account, rustycagetrio@gmail.com via Gmail connector for the band summary; meal plans as separate scheduled task. No longer Crusty cron jobs. → projects/rusty-cage, Grok-Brief

Direction

Crusty's 2026 direction is "agent runs a real workflow end-to-end with a Telegram human-gate, reliably, day after day." That single target has two waypoints, not two parallel priorities — an easier one and a harder one, same pattern:

Waypoint 1 (near-term): projects/ayerobot-comic autonomous loop — the first real automated workflow

  • Pattern: ideas → pick → image (grok-imagine) → Telegram human gate (draft caption + image → Darcy approves → xurl post — live 2026-04-27) → next: mentions listen path (see projects/ayerobot-comic#next-concrete-step--mentions-listen-path).
  • Why this one first: low stakes (a missed or ugly comic is recoverable). 2026-04-27: Aye Robot gate unlocked — full E2E including live xurl post after approval. Next: that listen spec + ~1 week to reliable daily post runs. Economic cost ~$1.20/mo X at current cadence. Full detail: projects/aye-robot-crusty-paused-x-automation.
  • Prerequisite — satisfied 2026-04-27: Model routing two-tier live (grok-4.20-0309-reasoning orchestrator); Phases 1–4 complete before first live tweet.
  • What Crusty learns here: the Telegram-gated write pattern — draft state-changing action → human approves via Telegram → execute → verify via public state → report outcome. Both inbound (scheduled image) and outbound (xurl on approval) proven (2026-04-27); Waypoint 2 inherits the same discipline.

Waypoint 2 (ongoing): projects/operator-agent ops agent — the higher-stakes version of the same pattern

  • Pattern: telemetry pull → flag anomalies → draft settlements PM / landlord comms / driver triage (supplier PO vending skill archived unless vending returns) → Telegram approval gate → send via mail / file settlement / post response → verify → daily digest.
  • Why this is "advanced Aye Robot": architecturally identical — read real state, draft a change, gate on Telegram, execute, verify. Only the stakes differ (bad payout reconciliation or careless lease language is real money; a mis-posted comic is recoverable). The Aye Robot port de-risks the operator prototype by proving the gate pattern works under boring weekly pressure before Crusty touches a P&L.
  • Status: vending-era feasibility sprint (2026-04-18 → 2026-05-02) superseded by EV L2 Node 1 (projects/ev-charging-shoreline-poc 2026-05); fake telemetry still useful until EMS/network hooks land — real session data replaces mocks when chargers go live.

Why this sequencing

  • Aye Robot is a quicker win — first demonstrably reliable Crusty-run automated workflow. Gives a concrete "we shipped autonomous X posting" artifact while the Operator Agent sprint is still doing market research + quotes.
  • Operator Agent is the advanced version of the same pattern. The skills and operational discipline built shipping Aye Robot (Telegram gate ergonomics, "re-read state before claiming success" rule, rate limits, handling "the model lied about doing a thing" failures) transfer directly to Operator Agent telemetry + settlement/reconciliation skills (projects/ev-charging-shoreline-poc).
  • Non-conflicting surfaces: X API write+read (Aye Robot) vs mail compose + Python reconciliation + web search (Operator Agent). Sharing the same Telegram channel is a feature, not a collision.

Guardrails

  • Aye Robot core port (image + X API script) landed 2026-04-21. If scheduling / live post / mentions slip past the original ~2026-05-03 window, triage those items without cross-contaminating the Operator Agent sprint — projects/aye-robot-crusty-paused-x-automation.
  • Keep Aye Robot strictly at "1 post/day + mentions triage." Do not scope-creep into growth engineering, quote-tweet threads, or LunarCast posting — those are real projects with their own decision points, and the 2026-04-17 pricing change also removed the tactics LunarCast-style growth needed.
  • Future-you reading this: if you're tempted to unpause Crusty X marketing for LunarCast / TriviaBalance / distillation on the same "this just got easier!" logic, stop. Only Aye Robot had a specific transport-blocker removed by the X API change. LunarCast product work is already active.

Other direction notes

  • Goal: Automate more of life (calendar, deeper email workflows, creative work, operator ops) without eroding the privacy / clean-machine bar.
  • lunarcast Crusty X marketing: still deferred — auto-like/auto-follow tactics removed from self-serve API 2026-04-17. The LunarCast app itself is active (App Store live; next = local transcription + NotebookLM-style query) — only the Crusty promo lane is on hold.
  • Reference experiment: Lobstar Wilde (see glossary) — identity, creative freedom, safe tools → voice, habits, social loops, durable value. Crusty is the same trajectory from a stricter foundation.

Known issues (high-level)

  • Claude Code on Crusty (installed 2026-04-26): Anthropic's Claude Code CLI is now running on Crusty (Console API key path, $10/mo spend cap, Haiku 4.5 pinned, Keychain-only credential, email+password+TOTP auth). Invariant preserved: "Claude runs on Crusty, but Crusty doesn't run on Claude" — runtime loop stays 100% Grok. As-built notes + 30-day spend revisit (scheduled 2026-05-26): Claude Code on Crusty.
  • Model routing (2026-04-27 — live): Two-tier shipped; Phases 1–4 complete; optional Phase 5 sub-tier still TBD after console.x.ai observation. → Model routing
  • Aye Robot → X: Gate unlocked 2026-04-27morning image → Telegram, draft + approval → xurl live post proven. Next: projects/ayerobot-comic#next-concrete-step--mentions-listen-path; post path ~1 week reliability. Steady state = xurl. Consumer-web path dead (policy). → projects/aye-robot-crusty-paused-x-automation
  • Scheduled mail compose: reports auth failures when invoked from a scheduled (non-interactive) context — defer until the Aye Robot loop is live; same class as the old outbound-Telegram confusion.
  • Ground-truth rule: every state-changing skill must re-read public state in a fresh check before claiming success. Agent self-report is not audit-trail; confabulation was the central failure mode in earlier debugging. This rule applies to Operator Agent skills too.