Mission-Control/PROJECT.md

17 KiB
Raw Blame History

🎯 Mission Control — Project Plan

Merge three OpenClaw dashboards into a single, unified Mission Control platform.


Source Repos

Repo Purpose Stack Key Assets
abhi1693/openclaw-mission-control Base platform — work orchestration, governance, gateway management Python/FastAPI + PostgreSQL + Redis + Next.js (React 19) + Clerk auth + Docker Compose Organizations, boards, tasks, tags, approvals, agents, gateways, webhooks, activity feed, skills marketplace
mudrii/openclaw-dashboard Tracking layer — real-time metrics, costs, crons, sessions, system health Go binary (zero deps) + embedded HTML/JS + SVG charts Cost cards, cron status, session tracking, sub-agent activity, AI chat, system metrics (CPU/RAM/disk), 6 themes, alerts, token usage
jaffer1979/openclaw-pixel-agents-dashboard Agent visualization — pixel-art agent sprites, real-time activity Node/Express + Vite + React 19 + Canvas/WebSocket + JSONL parsing Agent sprites with activity bubbles, conversation heat, spawn sub-agents, hardware monitor, service controls, day/night cycle

Architecture Decision: What to Merge Into What

Base: openclaw-mission-control — this becomes the foundation because:

  • It has the richest data model (organizations, boards, tasks, approvals, agents, gateways, webhooks)
  • It has proper auth (Clerk or local bearer token)
  • It has a full API layer (FastAPI with SQLModel/SQLAlchemy)
  • It has multi-tenancy built in
  • It has the most mature frontend (Next.js 16 + React 19 + TanStack Query + Recharts)

Merge FROM dashboard — extract the tracking/monitoring features:

  • Cost tracking, token usage, model breakdown
  • Cron job status, scheduling, last/next run
  • Session tracking, sub-agent activity
  • System health (CPU, RAM, disk, gateway status)
  • AI chat panel (ask questions about your data)
  • Alert system (high cost, failed crons, context usage)
  • 6 themes + glass morphism UI

Merge FROM pixel-agents — extract the agent visualization:

  • Pixel-art agent sprites in a shared office scene
  • Real-time activity bubbles, conversation heat
  • Sub-agent spawning from the UI
  • Hardware monitor (CPU/GPU/RAM/disk/network)
  • Service controls (start/stop/restart gateway)
  • Day/night cycle ambient lighting

Technical Analysis

Base Platform (openclaw-mission-control)

Backend:

  • Python 3.12+, FastAPI, SQLModel/SQLAlchemy, PostgreSQL, Redis
  • Alembic migrations, RQ worker for webhooks
  • Full OpenClaw gateway integration via WebSocket RPC (device pairing, control UI)
  • Gateway methods: 60+ RPC calls for sessions, agents, cron, config, exec approvals, etc.
  • Auth: Clerk JWT or local bearer token (≥50 chars)

Frontend:

  • Next.js 16.1.7, React 19.2, TanStack Query v5, TanStack Table v8
  • Radix UI primitives, Tailwind CSS, Recharts, React Markdown
  • 40+ page routes (dashboard, boards, agents, approvals, gateways, skills, tags, etc.)
  • Cypress E2E tests

Data Model (27 tables):

  • Organizations, users, boards, board_groups, tasks, tags, approvals
  • Agents, gateways, activity_events, board_webhooks, skills
  • Custom fields, task dependencies, task fingerprints
  • Board memory, board group memory, onboarding

What it LACKS that the others have:

  • No real-time cost/token tracking
  • No system health monitoring (CPU/RAM/disk)
  • No cron job visualization
  • No session/sub-agent activity monitoring
  • No AI chat for asking about your deployment
  • No pixel-art agent visualization
  • No hardware monitoring
  • No service controls (start/stop/restart gateway)

Dashboard (openclaw-dashboard) — What We Pull

Data Collection (Go):

  • refresh.go — main collector, reads OpenClaw filesystem + gateway API
  • refresh_sessions.go — session listing, model resolution
  • refresh_tokens.go — token usage tracking
  • cron_state — cron job parsing and status
  • system.go — CPU, RAM, swap, disk, gateway runtime probes

API Endpoints:

  • /api/refresh — stale-while-revalidate data.json
  • /api/chat — AI chat via OpenClaw gateway
  • /api/system — live host metrics
  • /api/logs — merged log tail
  • /api/errors — aggregated error feed

Frontend:

  • Pure HTML/CSS/JS (single index.html) — we'll rewrite as React components
  • State management: 7 plain objects (State, DataLayer, DirtyChecker, Renderer, Theme, Chat, App)
  • SVG chart rendering (cost trends, model breakdown, sub-agent activity)
  • 6 themes with 19 CSS color variables each

Integration Approach:

  • Port the Go data collection to Python services that hit the OpenClaw gateway API
  • Replace the embedded HTML frontend with React components in the Next.js app
  • Use the existing gateway RPC connection in Mission Control's backend
  • Add PostgreSQL models for tracking data (cost snapshots, cron states, session events)

Pixel Agents (openclaw-pixel-agents-dashboard) — What We Pull

Backend (Node/Express):

  • sessionWatcher.ts — tails JSONL session files, parses events
  • spawner.ts — spawns sub-agents via gateway API
  • services.ts — gateway service controls (start/stop/restart)
  • hardware.ts — hardware stats collection
  • openclawParser.ts — JSONL event parsing
  • WebSocket broadcasting to frontend

Frontend (React/Vite):

  • Pixel-art canvas renderer (OfficeCanvas.tsx, game loop, character sprites)
  • Activity bubbles, conversation heat overlays
  • Spawn chat panel, session info panel
  • Server rack (hardware monitor), breaker panel (service controls)
  • Ham radio (update checker), fire alarm (gateway restart)

Integration Approach:

  • Port JSONL session watcher to Python (watch OpenClaw session directory)
  • Move sub-agent spawning to use Mission Control's existing gateway RPC
  • Rebuild the pixel-art canvas as a React component within Next.js
  • Add WebSocket support to FastAPI for real-time agent events
  • Hardware stats collected via the gateway's health and status methods

Implementation Plan

Phase 1: Foundation Setup (Week 1)

1.1 — Fork and Stand Up Base

  • Fork abhi1693/openclaw-mission-control to our org
  • Stand up local dev environment (Docker Compose: Postgres + Redis + backend + frontend)
  • Verify all existing features work: auth, boards, tasks, agents, gateways, approvals
  • Document the data model and API surface

1.2 — Add Tracking Models (Backend)

  • Create new PostgreSQL models:
    • CostSnapshot — daily cost tracking per model/gateway
    • CronJobStatus — cron schedule, last/next run, duration, status
    • SessionEvent — session start/stop, model, tokens, context %
    • SubAgentRun — sub-agent spawn, cost, duration, status
    • SystemHealthMetric — CPU, RAM, disk, swap, gateway uptime
    • AlertRule — configurable alert thresholds
  • Create Alembic migration
  • Add CRUD API endpoints under /api/monitoring/

1.3 — Gateway Data Collection Service

  • Create app/services/monitoring/gateway_collector.py
  • Reuse existing gateway_rpc.py to poll:
    • usage.cost — cost data
    • usage.status — token counts
    • cron.list / cron.status — cron jobs
    • sessions.list / sessions.preview — sessions
    • agents.list — agents
    • health — gateway health
    • status — gateway runtime status
  • Run as background task (asyncio) with configurable intervals
  • Store collected data in the new models

Phase 2: Tracking Dashboard (Week 2)

2.1 — Monitoring Pages (Frontend)

  • New Next.js routes:
    • /monitoring — main dashboard (cost cards, system health, alerts)
    • /monitoring/costs — detailed cost breakdown with charts
    • /monitoring/sessions — active sessions, sub-agent activity
    • /monitoring/crons — cron job management
    • /monitoring/system — CPU/RAM/disk/gateway health

2.2 — Cost Tracking UI

  • Port dashboard's cost cards and donut chart to React/Recharts
  • Today's cost, all-time cost, projected monthly
  • Per-model cost breakdown (7d/30d/all-time tabs)
  • Cost trend line chart (SVG → Recharts)

2.3 — Session & Sub-Agent UI

  • Active sessions with model, type badges (DM/group/cron/subagent)
  • Context % bars, token counts
  • Sub-agent activity grid with cost/duration/status
  • Session detail panel with conversation preview

2.4 — Cron Job Management

  • Cron job list with schedule, status, last/next run
  • Run history with duration and status badges
  • Trigger manual run from UI
  • Add/edit/delete cron jobs (using existing gateway RPC)

2.5 — System Health

  • Gateway status card (uptime, PID, memory, compaction)
  • CPU/RAM/swap/disk gauge cards (configurable thresholds)
  • Alert banner for high cost, failed crons, gateway offline
  • Auto-refresh with countdown timer

2.6 — AI Chat Panel

  • Port dashboard's AI chat to React component
  • Uses OpenClaw gateway's /v1/chat/completions endpoint
  • Context-aware: feed live monitoring data into system prompt
  • Persistent chat history per user

Phase 3: Agent Visualization (Week 3)

3.1 — Pixel Agent Canvas

  • Port the pixel-art office scene to React (Canvas component)
  • Agent sprites with activity state (working, idle, talking)
  • Activity bubbles showing current task/conversation
  • Conversation heat glow based on recent activity
  • Day/night ambient cycle
  • Pan/zoom controls (touch + mouse)

3.2 — Real-Time Agent Events

  • Add FastAPI WebSocket endpoint (/ws/agents)
  • Port JSONL session watcher to Python:
    • Watch ~/.openclaw/agents/*/sessions/*.jsonl
    • Parse events (tool calls, responses, status changes)
    • Broadcast to connected WebSocket clients
  • Activity ticker component (recent agent actions scrolling by)

3.3 — Sub-Agent Spawner

  • Spawn panel integrated into the canvas view
  • Click agent → "Spawn sub-agent" button
  • Mini-chat for tasking the sub-agent
  • Session info panel for active sub-agents
  • Uses existing agents.create gateway RPC

3.4 — Hardware Monitor & Service Controls

  • Server rack component (CPU/GPU/RAM/disk/network gauges)
  • Breaker panel for gateway start/stop/restart
  • Ham radio component for OpenClaw update checking
  • All using existing gateway RPC methods (health, status, update.run)

Phase 4: Integration & Polish (Week 4)

4.1 — Navigation Integration

  • Add "Monitoring" and "Agents" sections to Mission Control sidebar
  • Dashboard home page shows summary cards (cost, health, agent count)
  • Deep links from monitoring → agents → pixel view

4.2 — Theme System

  • Port the 6 dashboard themes into Mission Control's Tailwind config
  • Theme picker in header (persists via localStorage)
  • Glass morphism effects where appropriate

4.3 — Alert System

  • Configurable alert rules (cost threshold, cron failure, context %, memory)
  • Alert banner on every page when active
  • Alert history in activity feed
  • Notification delivery via webhooks or in-app

4.4 — Data Sync Strategy

  • Primary: Gateway RPC polling (configurable intervals)
  • Secondary: JSONL file watching for real-time agent events
  • Tertiary: REST API for manual refresh
  • WebSocket push for live updates to connected browsers
  • Stale-while-revalidate caching pattern

File Structure (Additions to Mission Control)

backend/
├── app/
│   ├── models/
│   │   ├── monitoring.py          # CostSnapshot, CronJobStatus, SessionEvent, etc.
│   │   └── alert_rules.py         # AlertRule model
│   ├── api/
│   │   ├── monitoring.py          # Cost, session, cron endpoints
│   │   ├── monitoring_system.py   # System health endpoints
│   │   └── agent_events.py        # WebSocket endpoint for agent events
│   └── services/
│       ├── monitoring/
│       │   ├── gateway_collector.py   # Polls OpenClaw gateway for data
│       │   ├── jsonl_watcher.py       # Watches session JSONL files
│       │   ├── cost_tracker.py        # Cost aggregation and projection
│       │   └── alert_engine.py        # Alert rule evaluation
│       └── openclaw/
│           └── (existing — no changes needed)
├── migrations/
│   └── versions/
│       └── xxx_add_monitoring_models.py
frontend/
├── src/
│   ├── app/
│   │   ├── monitoring/
│   │   │   ├── page.tsx              # Main monitoring dashboard
│   │   │   ├── costs/page.tsx        # Cost detail page
│   │   │   ├── sessions/page.tsx     # Session detail page
│   │   │   ├── crons/page.tsx        # Cron management page
│   │   │   └── system/page.tsx       # System health page
│   │   └── agents/
│   │       └── pixel/page.tsx        # Pixel agent canvas page
│   ├── components/
│   │   ├── monitoring/
│   │   │   ├── CostCards.tsx
│   │   │   ├── CostTrendChart.tsx
│   │   │   ├── ModelBreakdownChart.tsx
│   │   │   ├── SessionTable.tsx
│   │   │   ├── SubAgentActivity.tsx
│   │   │   ├── CronJobList.tsx
│   │   │   ├── SystemHealthCards.tsx
│   │   │   ├── AlertBanner.tsx
│   │   │   └── AiChatPanel.tsx
│   │   ├── agents/
│   │   │   ├── PixelCanvas.tsx
│   │   │   ├── AgentSprite.tsx
│   │   │   ├── ActivityBubble.tsx
│   │   │   ├── ConversationHeat.tsx
│   │   │   ├── SpawnPanel.tsx
│   │   │   ├── ServerRack.tsx
│   │   │   └── BreakerPanel.tsx
│   │   └── (existing Mission Control components)
│   └── lib/
│       ├── monitoring-api.ts         # API client for monitoring endpoints
│       └── agent-events.ts           # WebSocket client for agent events

Key Integration Points

Gateway Communication

All three projects talk to the OpenClaw gateway. Mission Control already has the richest integration (gateway_rpc.py with 60+ methods). We reuse this for everything:

Feature Gateway Methods Used
Cost tracking usage.cost, usage.status
Session monitoring sessions.list, sessions.preview
Cron management cron.list, cron.status, cron.add, cron.update, cron.remove, cron.run
Agent management agents.list, agents.create, agents.update, agents.delete
System health health, status, logs.tail
Sub-agent spawning agents.create, sessions.patch
Service controls config.get, config.set, update.run

Real-Time Updates

  • Dashboard uses polling (60s auto-refresh)
  • Pixel agents uses WebSocket (real-time JSONL events)
  • Mission Control uses TanStack Query (polling + cache invalidation)

Our approach: WebSocket for agent events (real-time pixel animation), TanStack Query with 30s polling for monitoring data, SSE for alerts.

Auth

  • Mission Control supports Clerk JWT and local bearer token
  • Dashboard is auth-free (localhost only)
  • Pixel agents uses gateway token

Our approach: Inherit Mission Control's auth system. Local mode for self-hosted, Clerk for multi-tenant. Monitoring and agent data scoped to organization + gateway.


Dependency Summary

Layer Technology Source
Backend framework FastAPI + SQLModel Mission Control
Database PostgreSQL + Alembic Mission Control
Job queue Redis + RQ Mission Control
Frontend framework Next.js 16 + React 19 Mission Control
UI primitives Radix UI + Tailwind Mission Control
Charts Recharts (existing) Mission Control
Pixel canvas HTML5 Canvas (new) Pixel Agents → React port
WebSocket FastAPI WebSocket (new) Pixel Agents → Python port
Auth Clerk / local bearer token Mission Control
Gateway RPC websockets Python (existing) Mission Control

No new backend languages. Go and Node/Express are NOT added — their functionality ports to Python services within the existing FastAPI app.


Risk Assessment

Risk Impact Mitigation
Canvas rendering performance in React Medium Use useRef + requestAnimationFrame, not React state for animation
Go dashboard data collection rewritten in Python Medium Port logic faithfully; test against same OpenClaw data
JSONL file watching reliability Medium Use watchdog library + fallback polling
Theme system merge (6 themes × 2 systems) Low Map dashboard's 19 CSS vars to Tailwind config
Pixel assets licensing Low MIT licensed, attribution in ASSET-LICENSE.md
Gateway RPC version compatibility Low Already handled by protocol version negotiation in gateway_rpc.py

Success Metrics

  1. All monitoring features from dashboard available in Mission Control UI
  2. Pixel agent visualization showing real-time agent activity
  3. Single Docker Compose brings up the entire system
  4. Single auth system — no separate logins
  5. Single gateway connection — reused across all features
  6. No Go or Node backend — everything in Python/FastAPI
  7. All existing Mission Control features still work (boards, tasks, approvals, etc.)