# 🎯 Mission Control — Project Plan

> Merge three OpenClaw dashboards into a single, unified Mission Control platform.

---

## Source Repos

| Repo | Purpose | Stack | Key Assets |
|---|---|---|---|
| [abhi1693/openclaw-mission-control](https://github.com/abhi1693/openclaw-mission-control) | **Base platform** — work orchestration, governance, gateway management | Python/FastAPI + PostgreSQL + Redis + Next.js (React 19) + Clerk auth + Docker Compose | Organizations, boards, tasks, tags, approvals, agents, gateways, webhooks, activity feed, skills marketplace |
| [mudrii/openclaw-dashboard](https://github.com/mudrii/openclaw-dashboard) | **Tracking layer** — real-time metrics, costs, crons, sessions, system health | Go binary (zero deps) + embedded HTML/JS + SVG charts | Cost cards, cron status, session tracking, sub-agent activity, AI chat, system metrics (CPU/RAM/disk), 6 themes, alerts, token usage |
| [jaffer1979/openclaw-pixel-agents-dashboard](https://github.com/jaffer1979/openclaw-pixel-agents-dashboard) | **Agent visualization** — pixel-art agent sprites, real-time activity | Node/Express + Vite + React 19 + Canvas/WebSocket + JSONL parsing | Agent sprites with activity bubbles, conversation heat, spawn sub-agents, hardware monitor, service controls, day/night cycle |

---

## Architecture Decision: What to Merge Into What

**Base: openclaw-mission-control** — this becomes the foundation because:
- It has the richest data model (organizations, boards, tasks, approvals, agents, gateways, webhooks)
- It has proper auth (Clerk or local bearer token)
- It has a full API layer (FastAPI with SQLModel/SQLAlchemy)
- It has multi-tenancy built in
- It has the most mature frontend (Next.js 16 + React 19 + TanStack Query + Recharts)

**Merge FROM dashboard** — extract the tracking/monitoring features:
- Cost tracking, token usage, model breakdown
- Cron job status, scheduling, last/next run
- Session tracking, sub-agent activity
- System health (CPU, RAM, disk, gateway status)
- AI chat panel (ask questions about your data)
- Alert system (high cost, failed crons, context usage)
- 6 themes + glassmorphism UI

**Merge FROM pixel-agents** — extract the agent visualization:
- Pixel-art agent sprites in a shared office scene
- Real-time activity bubbles, conversation heat
- Sub-agent spawning from the UI
- Hardware monitor (CPU/GPU/RAM/disk/network)
- Service controls (start/stop/restart gateway)
- Day/night cycle ambient lighting

---

## Technical Analysis

### Base Platform (openclaw-mission-control)

**Backend:**
- Python 3.12+, FastAPI, SQLModel/SQLAlchemy, PostgreSQL, Redis
- Alembic migrations, RQ worker for webhooks
- Full OpenClaw gateway integration via WebSocket RPC (device pairing, control UI)
- Gateway methods: 60+ RPC calls for sessions, agents, cron, config, exec approvals, etc.
- Auth: Clerk JWT or local bearer token (≥50 chars)

**Frontend:**
- Next.js 16.1.7, React 19.2, TanStack Query v5, TanStack Table v8
- Radix UI primitives, Tailwind CSS, Recharts, React Markdown
- 40+ page routes (dashboard, boards, agents, approvals, gateways, skills, tags, etc.)
- Cypress E2E tests

**Data Model (27 tables):**
- Organizations, users, boards, board_groups, tasks, tags, approvals
- Agents, gateways, activity_events, board_webhooks, skills
- Custom fields, task dependencies, task fingerprints
- Board memory, board group memory, onboarding

**What it LACKS that the others have:**
- No real-time cost/token tracking
- No system health monitoring (CPU/RAM/disk)
- No cron job visualization
- No session/sub-agent activity monitoring
- No AI chat for asking about your deployment
- No pixel-art agent visualization
- No hardware monitoring
- No service controls (start/stop/restart gateway)

### Dashboard (openclaw-dashboard) — What We Pull

**Data Collection (Go):**
- `refresh.go` — main collector, reads OpenClaw filesystem + gateway API
- `refresh_sessions.go` — session listing, model resolution
- `refresh_tokens.go` — token usage tracking
- `cron_state` — cron job parsing and status
- `system.go` — CPU, RAM, swap, disk, gateway runtime probes

**API Endpoints:**
- `/api/refresh` — stale-while-revalidate data.json
- `/api/chat` — AI chat via OpenClaw gateway
- `/api/system` — live host metrics
- `/api/logs` — merged log tail
- `/api/errors` — aggregated error feed

**Frontend:**
- Pure HTML/CSS/JS (single `index.html`) — we'll rewrite as React components
- State management: 7 plain objects (State, DataLayer, DirtyChecker, Renderer, Theme, Chat, App)
- SVG chart rendering (cost trends, model breakdown, sub-agent activity)
- 6 themes with 19 CSS color variables each

**Integration Approach:**
- Port the Go data collection to Python services that hit the OpenClaw gateway API
- Replace the embedded HTML frontend with React components in the Next.js app
- Use the existing gateway RPC connection in Mission Control's backend
- Add PostgreSQL models for tracking data (cost snapshots, cron states, session events)

### Pixel Agents (openclaw-pixel-agents-dashboard) — What We Pull

**Backend (Node/Express):**
- `sessionWatcher.ts` — tails JSONL session files, parses events
- `spawner.ts` — spawns sub-agents via gateway API
- `services.ts` — gateway service controls (start/stop/restart)
- `hardware.ts` — hardware stats collection
- `openclawParser.ts` — JSONL event parsing
- WebSocket broadcasting to frontend

**Frontend (React/Vite):**
- Pixel-art canvas renderer (`OfficeCanvas.tsx`, game loop, character sprites)
- Activity bubbles, conversation heat overlays
- Spawn chat panel, session info panel
- Server rack (hardware monitor), breaker panel (service controls)
- Ham radio (update checker), fire alarm (gateway restart)

**Integration Approach:**
- Port JSONL session watcher to Python (watch OpenClaw session directory)
- Move sub-agent spawning to use Mission Control's existing gateway RPC
- Rebuild the pixel-art canvas as a React component within Next.js
- Add WebSocket support to FastAPI for real-time agent events
- Hardware stats collected via the gateway's `health` and `status` methods

---

## Implementation Plan

### Phase 1: Foundation Setup (Week 1)

**1.1 — Fork and Stand Up Base**
- Fork `abhi1693/openclaw-mission-control` to our org
- Stand up local dev environment (Docker Compose: Postgres + Redis + backend + frontend)
- Verify all existing features work: auth, boards, tasks, agents, gateways, approvals
- Document the data model and API surface

**1.2 — Add Tracking Models (Backend)**
- Create new PostgreSQL models:
  - `CostSnapshot` — daily cost tracking per model/gateway
  - `CronJobStatus` — cron schedule, last/next run, duration, status
  - `SessionEvent` — session start/stop, model, tokens, context %
  - `SubAgentRun` — sub-agent spawn, cost, duration, status
  - `SystemHealthMetric` — CPU, RAM, disk, swap, gateway uptime
  - `AlertRule` — configurable alert thresholds
- Create Alembic migration
- Add CRUD API endpoints under `/api/monitoring/`

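
As a rough sketch of the shape two of these models might take, the following uses plain dataclasses so that it runs without dependencies. In the real backend these would presumably be SQLModel tables with an Alembic migration; every field name below is an assumption and is not taken from any of the three repos.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CostSnapshot:
    """One day's cost for one model on one gateway (hypothetical fields)."""
    organization_id: UUID
    gateway_id: UUID
    model: str
    day: date
    cost_usd: Decimal
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass(frozen=True)
class SystemHealthMetric:
    """A point-in-time host reading (hypothetical fields)."""
    gateway_id: UUID
    cpu_percent: float
    ram_percent: float
    disk_percent: float
    swap_percent: float
    gateway_uptime_s: int
    recorded_at: datetime


def total_cost(snapshots: list[CostSnapshot], day: date) -> Decimal:
    """Sum cost across all models and gateways for a single day."""
    return sum((s.cost_usd for s in snapshots if s.day == day), Decimal("0"))
```

Storing cost as `Decimal` rather than `float` is a deliberate choice here: daily totals are summed many times, and binary floats would accumulate rounding error.
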
**1.3 — Gateway Data Collection Service**
- Create `app/services/monitoring/gateway_collector.py`
- Reuse existing `gateway_rpc.py` to poll:
  - `usage.cost` — cost data
  - `usage.status` — token counts
  - `cron.list` / `cron.status` — cron jobs
  - `sessions.list` / `sessions.preview` — sessions
  - `agents.list` — agents
  - `health` — gateway health
  - `status` — gateway runtime status
- Run as background task (asyncio) with configurable intervals
- Store collected data in the new models

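
The polling loop described above could look roughly like this. It is a minimal sketch: `RpcCall` stands in for whatever `gateway_rpc.py` actually exposes (its real signature is not shown in this plan), `Store` stands in for writing to the new models, and the interval values are illustrative.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Hypothetical signatures: the real gateway_rpc.py client may differ.
RpcCall = Callable[[str], Awaitable[Any]]
Store = Callable[[str, Any], None]

# Poll interval in seconds per gateway method (illustrative values only).
POLL_INTERVALS: dict[str, float] = {
    "usage.cost": 300.0,
    "cron.list": 60.0,
    "sessions.list": 30.0,
    "health": 15.0,
}


async def poll_method(call: RpcCall, store: Store, method: str,
                      interval: float, stop: asyncio.Event) -> None:
    """Call one gateway method repeatedly until `stop` is set."""
    while not stop.is_set():
        try:
            store(method, await call(method))
        except Exception as exc:  # one failed call must not end the loop
            store(method, {"error": str(exc)})
        try:
            # Sleep for `interval`, but wake immediately on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def run_collector(call: RpcCall, store: Store, stop: asyncio.Event,
                        intervals: Optional[dict[str, float]] = None) -> None:
    """Run one polling task per method and wait for all of them to finish."""
    chosen = POLL_INTERVALS if intervals is None else intervals
    await asyncio.gather(*(poll_method(call, store, method, interval, stop)
                           for method, interval in chosen.items()))
```

Giving each method its own task keeps a slow `usage.cost` call from delaying the much more frequent `health` probe.
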
### Phase 2: Tracking Dashboard (Week 2)

**2.1 — Monitoring Pages (Frontend)**
- New Next.js routes:
  - `/monitoring` — main dashboard (cost cards, system health, alerts)
  - `/monitoring/costs` — detailed cost breakdown with charts
  - `/monitoring/sessions` — active sessions, sub-agent activity
  - `/monitoring/crons` — cron job management
  - `/monitoring/system` — CPU/RAM/disk/gateway health

**2.2 — Cost Tracking UI**
- Port dashboard's cost cards and donut chart to React/Recharts
- Today's cost, all-time cost, projected monthly
- Per-model cost breakdown (7d/30d/all-time tabs)
- Cost trend line chart (SVG → Recharts)

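
The "projected monthly" card and the per-model donut both reduce to simple arithmetic that the backend can compute before the data reaches Recharts. The sketch below assumes a plain linear extrapolation of month-to-date spend; the Go dashboard may use a different formula, and both function names are hypothetical.

```python
import calendar
from datetime import date


def projected_monthly_cost(month_to_date_usd: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_average = month_to_date_usd / today.day
    return round(daily_average * days_in_month, 2)


def cost_share_by_model(cost_by_model: dict[str, float]) -> dict[str, float]:
    """Return each model's share of total cost as a percentage (donut chart input)."""
    total = sum(cost_by_model.values())
    if total == 0:
        return {model: 0.0 for model in cost_by_model}
    return {model: round(100 * cost / total, 1)
            for model, cost in cost_by_model.items()}
```

For example, $45.00 spent by 15 June averages $3.00 per day, so the projection for the 30-day month is $90.00.
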
**2.3 — Session & Sub-Agent UI**
- Active sessions with model, type badges (DM/group/cron/subagent)
- Context % bars, token counts
- Sub-agent activity grid with cost/duration/status
- Session detail panel with conversation preview

**2.4 — Cron Job Management**
- Cron job list with schedule, status, last/next run
- Run history with duration and status badges
- Trigger manual run from UI
- Add/edit/delete cron jobs (using existing gateway RPC)

**2.5 — System Health**
- Gateway status card (uptime, PID, memory, compaction)
- CPU/RAM/swap/disk gauge cards (configurable thresholds)
- Alert banner for high cost, failed crons, gateway offline
- Auto-refresh with countdown timer

**2.6 — AI Chat Panel**
- Port dashboard's AI chat to React component
- Uses OpenClaw gateway's `/v1/chat/completions` endpoint
- Context-aware: feed live monitoring data into system prompt
- Persistent chat history per user

### Phase 3: Agent Visualization (Week 3)

**3.1 — Pixel Agent Canvas**
- Port the pixel-art office scene to React (Canvas component)
- Agent sprites with activity state (working, idle, talking)
- Activity bubbles showing current task/conversation
- Conversation heat glow based on recent activity
- Day/night ambient cycle
- Pan/zoom controls (touch + mouse)

**3.2 — Real-Time Agent Events**
- Add FastAPI WebSocket endpoint (`/ws/agents`)
- Port JSONL session watcher to Python:
  - Watch `~/.openclaw/agents/*/sessions/*.jsonl`
  - Parse events (tool calls, responses, status changes)
  - Broadcast to connected WebSocket clients
- Activity ticker component (recent agent actions scrolling by)

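
The core of the ported session watcher is reading only what has been appended to each JSONL file since the last look. Below is a minimal polling-based sketch using only the standard library; it is not a port of `sessionWatcher.ts`, and the class and function names are hypothetical. A file-system notification library could trigger `poll()` instead of a timer.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def discover_sessions(openclaw_root: Path) -> list[Path]:
    """List session files under <root>/agents/*/sessions/*.jsonl."""
    return sorted(openclaw_root.glob("agents/*/sessions/*.jsonl"))


class JsonlTailer:
    """Yields only the JSON objects appended to a file since the last poll."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.offset = 0  # byte offset of the next unread line

    def poll(self) -> Iterator[dict[str, Any]]:
        if not self.path.exists():
            return
        if self.path.stat().st_size < self.offset:
            self.offset = 0  # file was truncated or replaced; start over
        with self.path.open("rb") as fh:
            fh.seek(self.offset)
            for raw in fh:
                if not raw.endswith(b"\n"):
                    break  # partial line still being written; retry next poll
                self.offset += len(raw)
                try:
                    event = json.loads(raw)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than stop the watcher
                if isinstance(event, dict):
                    yield event
```

Because `poll()` is a generator, the offset only advances as events are consumed, so callers should exhaust it (for example with `list(tailer.poll())`) before broadcasting.
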
**3.3 — Sub-Agent Spawner**
- Spawn panel integrated into the canvas view
- Click agent → "Spawn sub-agent" button
- Mini-chat for tasking the sub-agent
- Session info panel for active sub-agents
- Uses existing `agents.create` gateway RPC

**3.4 — Hardware Monitor & Service Controls**
- Server rack component (CPU/GPU/RAM/disk/network gauges)
- Breaker panel for gateway start/stop/restart
- Ham radio component for OpenClaw update checking
- All using existing gateway RPC methods (`health`, `status`, `update.run`)

### Phase 4: Integration & Polish (Week 4)

**4.1 — Navigation Integration**
- Add "Monitoring" and "Agents" sections to Mission Control sidebar
- Dashboard home page shows summary cards (cost, health, agent count)
- Deep links from monitoring → agents → pixel view

**4.2 — Theme System**
- Port the 6 dashboard themes into Mission Control's Tailwind config
- Theme picker in header (persists via localStorage)
- Glassmorphism effects where appropriate

**4.3 — Alert System**
- Configurable alert rules (cost threshold, cron failure, context %, memory)
- Alert banner on every page when active
- Alert history in activity feed
- Notification delivery via webhooks or in-app

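
One way to keep alert rules configurable is to store each rule as data (metric name, comparison, threshold) and evaluate all rules against the latest collected metrics. The sketch below illustrates that idea only; the `AlertRule` fields and the metric names are assumptions and do not reflect the dashboard's existing alert logic.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

# Comparison operators a rule may use, keyed by how they would be stored.
_OPS: dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AlertRule:
    """A threshold rule over one named metric (hypothetical fields)."""
    name: str
    metric: str
    op: str
    threshold: float


def evaluate(rules: Iterable[AlertRule], metrics: Mapping[str, float]) -> list[str]:
    """Return the names of all rules whose condition currently holds."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not collected yet; neither fire nor fail
        if _OPS[rule.op](value, rule.threshold):
            fired.append(rule.name)
    return fired
```

Keeping rules as plain rows means the alert banner, the activity feed, and webhook delivery can all consume the same list of fired rule names.
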
**4.4 — Data Sync Strategy**
- Primary: Gateway RPC polling (configurable intervals)
- Secondary: JSONL file watching for real-time agent events
- Tertiary: REST API for manual refresh
- WebSocket push for live updates to connected browsers
- Stale-while-revalidate caching pattern

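
The stale-while-revalidate pattern named in the last bullet can be sketched in a few lines of asyncio: callers always get the cached value straight away, and a stale entry triggers at most one background refresh. `SwrCache` is a hypothetical helper for illustration, not existing Mission Control code.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class SwrCache:
    """Serve the cached value immediately; refresh in the background when stale."""

    def __init__(self, fetch: Callable[[], Awaitable[Any]], max_age_s: float) -> None:
        self._fetch = fetch
        self._max_age_s = max_age_s
        self._value: Any = None
        self._fetched_at: Optional[float] = None
        self._refresh: Optional[asyncio.Task] = None

    async def _do_refresh(self) -> None:
        self._value = await self._fetch()
        self._fetched_at = time.monotonic()

    async def _background_refresh(self) -> None:
        try:
            await self._do_refresh()
        except Exception:
            pass  # keep serving the stale value; a later get() retries

    async def get(self) -> Any:
        if self._fetched_at is None:
            # Nothing cached yet: the first caller has to wait for real data.
            await self._do_refresh()
            return self._value
        stale = time.monotonic() - self._fetched_at > self._max_age_s
        if stale and (self._refresh is None or self._refresh.done()):
            # Start one background refresh without making this caller wait.
            self._refresh = asyncio.create_task(self._background_refresh())
        return self._value
```

In this sketch, concurrent first calls may each trigger a fetch; a lock around the initial load would remove that duplication if it matters.
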
---

## File Structure (Additions to Mission Control)

```
backend/
├── app/
│   ├── models/
│   │   ├── monitoring.py              # CostSnapshot, CronJobStatus, SessionEvent, etc.
│   │   └── alert_rules.py             # AlertRule model
│   ├── api/
│   │   ├── monitoring.py              # Cost, session, cron endpoints
│   │   ├── monitoring_system.py       # System health endpoints
│   │   └── agent_events.py            # WebSocket endpoint for agent events
│   └── services/
│       ├── monitoring/
│       │   ├── gateway_collector.py   # Polls OpenClaw gateway for data
│       │   ├── jsonl_watcher.py       # Watches session JSONL files
│       │   ├── cost_tracker.py        # Cost aggregation and projection
│       │   └── alert_engine.py        # Alert rule evaluation
│       └── openclaw/
│           └── (existing — no changes needed)
├── migrations/
│   └── versions/
│       └── xxx_add_monitoring_models.py
frontend/
├── src/
│   ├── app/
│   │   ├── monitoring/
│   │   │   ├── page.tsx               # Main monitoring dashboard
│   │   │   ├── costs/page.tsx         # Cost detail page
│   │   │   ├── sessions/page.tsx      # Session detail page
│   │   │   ├── crons/page.tsx         # Cron management page
│   │   │   └── system/page.tsx        # System health page
│   │   └── agents/
│   │       └── pixel/page.tsx         # Pixel agent canvas page
│   ├── components/
│   │   ├── monitoring/
│   │   │   ├── CostCards.tsx
│   │   │   ├── CostTrendChart.tsx
│   │   │   ├── ModelBreakdownChart.tsx
│   │   │   ├── SessionTable.tsx
│   │   │   ├── SubAgentActivity.tsx
│   │   │   ├── CronJobList.tsx
│   │   │   ├── SystemHealthCards.tsx
│   │   │   ├── AlertBanner.tsx
│   │   │   └── AiChatPanel.tsx
│   │   ├── agents/
│   │   │   ├── PixelCanvas.tsx
│   │   │   ├── AgentSprite.tsx
│   │   │   ├── ActivityBubble.tsx
│   │   │   ├── ConversationHeat.tsx
│   │   │   ├── SpawnPanel.tsx
│   │   │   ├── ServerRack.tsx
│   │   │   └── BreakerPanel.tsx
│   │   └── (existing Mission Control components)
│   └── lib/
│       ├── monitoring-api.ts          # API client for monitoring endpoints
│       └── agent-events.ts            # WebSocket client for agent events
```

---

## Key Integration Points

### Gateway Communication

All three projects talk to the OpenClaw gateway. Mission Control already has the richest integration (`gateway_rpc.py` with 60+ methods). We reuse this for everything:

| Feature | Gateway Methods Used |
|---|---|
| Cost tracking | `usage.cost`, `usage.status` |
| Session monitoring | `sessions.list`, `sessions.preview` |
| Cron management | `cron.list`, `cron.status`, `cron.add`, `cron.update`, `cron.remove`, `cron.run` |
| Agent management | `agents.list`, `agents.create`, `agents.update`, `agents.delete` |
| System health | `health`, `status`, `logs.tail` |
| Sub-agent spawning | `agents.create`, `sessions.patch` |
| Service controls | `config.get`, `config.set`, `update.run` |

### Real-Time Updates

- Dashboard uses polling (60s auto-refresh)
- Pixel agents uses WebSocket (real-time JSONL events)
- Mission Control uses TanStack Query (polling + cache invalidation)

**Our approach:** WebSocket for agent events (real-time pixel animation), TanStack Query with 30s polling for monitoring data, SSE for alerts.

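
For the SSE alert channel, each alert only needs to be serialized in the `text/event-stream` wire format before the backend streams it. The helper below is a small sketch of that framing; the `alert` event name and the payload fields are assumptions.

```python
import json
from typing import Any, Mapping, Optional


def format_sse(event: str, data: Mapping[str, Any],
               event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(dict(data), separators=(',', ':'))}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"
```

Sending an `id` with each alert lets a reconnecting browser report the last alert it saw through the `Last-Event-ID` header, so missed alerts can be replayed.
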
### Auth

- Mission Control supports Clerk JWT and local bearer token
- Dashboard is auth-free (localhost only)
- Pixel agents uses gateway token

**Our approach:** Inherit Mission Control's auth system. Local mode for self-hosted, Clerk for multi-tenant. Monitoring and agent data scoped to organization + gateway.

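
In local mode, the new monitoring and WebSocket endpoints would need the same bearer-token check as the rest of the API. The sketch below shows what such a check involves, assuming the 50-character minimum mentioned earlier; it is not Mission Control's actual implementation, and `check_bearer` is a hypothetical name.

```python
import hmac

MIN_TOKEN_LENGTH = 50  # from the plan: local bearer tokens are at least 50 chars


def check_bearer(authorization_header: str, expected_token: str) -> bool:
    """Return True if the header carries the expected local bearer token."""
    if len(expected_token) < MIN_TOKEN_LENGTH:
        raise ValueError("configured token is shorter than 50 characters")
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

In practice the new routes would reuse the existing auth dependency rather than reimplement this, which also keeps organization and gateway scoping in one place.
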
---

## Dependency Summary

| Layer | Technology | Source |
|---|---|---|
| Backend framework | FastAPI + SQLModel | Mission Control |
| Database | PostgreSQL + Alembic | Mission Control |
| Job queue | Redis + RQ | Mission Control |
| Frontend framework | Next.js 16 + React 19 | Mission Control |
| UI primitives | Radix UI + Tailwind | Mission Control |
| Charts | Recharts (existing) | Mission Control |
| Pixel canvas | HTML5 Canvas (new) | Pixel Agents → React port |
| WebSocket | FastAPI WebSocket (new) | Pixel Agents → Python port |
| Auth | Clerk / local bearer token | Mission Control |
| Gateway RPC | websockets Python (existing) | Mission Control |

**No new backend languages.** Go and Node/Express are NOT added — their functionality is ported to Python services within the existing FastAPI app.

---

## Risk Assessment

| Risk | Impact | Mitigation |
|---|---|---|
| Canvas rendering performance in React | Medium | Use `useRef` + `requestAnimationFrame`, not React state for animation |
| Go dashboard data collection rewritten in Python | Medium | Port logic faithfully; test against same OpenClaw data |
| JSONL file watching reliability | Medium | Use `watchdog` library + fallback polling |
| Theme system merge (6 themes × 2 systems) | Low | Map dashboard's 19 CSS vars to Tailwind config |
| Pixel assets licensing | Low | MIT-licensed, attribution in ASSET-LICENSE.md |
| Gateway RPC version compatibility | Low | Already handled by protocol version negotiation in `gateway_rpc.py` |

---

## Success Metrics

1. **All monitoring features** from dashboard available in Mission Control UI
2. **Pixel agent visualization** showing real-time agent activity
3. **Single Docker Compose** brings up the entire system
4. **Single auth system** — no separate logins
5. **Single gateway connection** — reused across all features
6. **No Go or Node backend** — everything in Python/FastAPI
7. **All existing Mission Control features** still work (boards, tasks, approvals, etc.)