docs: rewrite README to reflect actual current project state

2026-05-10 22:27:45 -05:00 · fd7d0aca42
parent 504d4e4eb5
commit fd7d0aca42
1 changed file with 151 additions and 132 deletions
--- a/README.md
+++ b/README.md
@@ -1,182 +1,201 @@
 # Mission Control

-A monitoring dashboard for [OpenClaw](https://github.com/openclaw/openclaw) AI agents — built with Python/FastAPI and Next.js.
+A monitoring dashboard for [OpenClaw](https://github.com/openclaw/openclaw) gateways — built with Python/FastAPI and Next.js.

-## Why This Exists
+## What It Does

-When you run OpenClaw seriously — multiple agents, dozens of cron jobs, sub-agents spawning sub-agents, several Telegram groups and WhatsApp, Slack, and Discord channels, 10+ models — information gets scattered fast.
+Mission Control connects to your OpenClaw gateways via their RPC endpoints and gives you a single place to see:

-**The problem:** there was no single place to answer the obvious questions:
+- **How much you're spending** — per-model cost breakdowns, cost trends, which model is burning your budget
+- **Whether your gateway is healthy** — online/offline status, CPU, RAM, disk, uptime
+- **What your cron jobs are doing** — schedules, last run, next fire, failures
+- **Which sessions are active** — model, type, context usage, token counts
+- **What your sub-agents are up to** — cost, duration, status at a glance
+- **How costs are trending** — daily charts, 7d/30d comparisons, acceleration signals

-- **Is my gateway actually running right now?**
-- **How much have I spent today, and which model is burning the most?**
-- **Which cron jobs ran, which failed, and when does the next one fire?**
-- **What sessions are active and how much context are they consuming?**
-- **Are my sub-agents doing useful work or spinning in circles?**
-- **What's the cost trend over the last 7 days — am I accelerating?**
+It polls your gateways in the background, stores everything in PostgreSQL, and serves it through REST endpoints ready for the frontend dashboard.

-**The solution:** a single dashboard that collects everything in one place — gateway health, costs, cron status, active sessions, sub-agent runs, model usage — refreshed automatically, org-scoped, no login required for local development. Open a browser tab, get the full picture in seconds.
+## Current Status

-It's not trying to replace the OpenClaw CLI or Telegram interface. It's the at-a-glance overview layer that tells you whether everything is healthy and where your money and compute are going — so you can make decisions without hunting for data.
+**Version:** 0.0.4 (dev branch)  
+**Phase:** Phase 2 — backend monitoring collection and API endpoints are live. Frontend dashboard panels are next.

-## Features
+### What's Working

-### 6 Core Monitoring Panels
+- **Backend API** — 97+ endpoints across boards, agents, gateways, tasks, organizations, approvals, and more (forked from base platform)
+- **Gateway data collection** — background service polls `usage.cost`, `cron.list`, `sessions.list`, `sessions.preview`, `health`, and `status` RPC endpoints and upserts into PostgreSQL
+- **7 monitoring models** — CostSnapshot, CronJobStatus, SessionEvent, SubAgentRun, SystemHealthMetric, AlertRule, AlertEvent
+- **10 CRUD monitoring endpoints** — paginated, org-scoped read endpoints for all monitoring models
+- **2 summary endpoints** — cost-summary and cost-breakdown (latest snapshot per gateway, per-model percentage breakdown)
+- **Data processing functions** — `ModelName()`, `BuildDailyChart()`, `BuildAlerts()`, `BuildCostBreakdown()`, `FmtTokens()` ported from Go
+- **Event parser** — `parse_session_event()` and `format_tool_status()` ported from TypeScript
+- **WebSocket** — `/ws/agents` with initial snapshot (last 50 events) + background polling
+- **Auth** — Clerk for production, local token auth for dev (`AUTH_MODE=local`)
+- **Next.js frontend** — boards, tasks, agents, gateways, organizations, approvals pages (base platform UI)
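
The per-model percentage breakdown mentioned in the summary-endpoints bullet can be illustrated with a minimal sketch. This is not the project's `BuildCostBreakdown()` port; the `ModelCost` row type, the `rank_model_costs` name, and the output keys are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCost:
    """Hypothetical input row: total spend for one model, in USD."""
    model: str
    cost_usd: float


def rank_model_costs(rows: list[ModelCost]) -> list[dict]:
    """Rank models by spend and attach each model's share of the total."""
    total = sum(r.cost_usd for r in rows)
    ranked = sorted(rows, key=lambda r: r.cost_usd, reverse=True)
    return [
        {
            "model": r.model,
            "cost_usd": round(r.cost_usd, 4),
            # Guard against division by zero when nothing has been spent yet.
            "percent": round(100 * r.cost_usd / total, 1) if total else 0.0,
        }
        for r in ranked
    ]
```

Sorting before computing shares keeps the most expensive model first, which is the ordering a "which model is burning your budget" view would likely want.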

-1. **💰 Cost Cards & Breakdown** — Today's cost, all-time cost, projected monthly, per-model cost breakdown with 7d/30d/all-time tabs. Know exactly which model is burning your budget.
-2. **💚 System Health** — Gateway status (online/offline), PID, uptime, memory, compaction mode, CPU/RAM/swap/disk gauges. See at a glance whether your gateway is healthy.
-3. **⏰ Cron Jobs** — All scheduled jobs with status, schedule, last/next run, duration, model. Spot failures instantly and see when the next fire is.
-4. **📡 Active Sessions** — Recent sessions with model, type badges (DM/group/cron/subagent), context % bars, token counts. See who's consuming what.
-5. **🤖 Sub-Agent Activity** — Sub-agent runs with cost, duration, status + token breakdown (7d/30d tabs). Know whether sub-agents are productive or spinning.
-6. **📈 Cost Trends** — Cost trend line over 7d/30d, model cost breakdown bars, acceleration indicators. Catch spending spikes before they hurt.
+### What's Not Yet Built

-### Architecture
+These are the remaining monitoring summary endpoints (tracked in FUTURE.md):

-- **Backend:** Python/FastAPI + PostgreSQL + Redis
-- **Frontend:** Next.js 16 + React 19 + Tailwind CSS + shadcn/ui
-- **Data collection:** Background gateway collector polling OpenClaw RPC endpoints
-- **Real-time:** WebSocket endpoint for live agent events
-- **Data processing:** Pure Python functions ported from Go dashboard logic (model name normalization, daily chart aggregation, alert computation, token formatting)
+- Health summary (`/api/v1/monitoring/health-summary`)
+- Cron summary (`/api/v1/monitoring/cron-summary`)
+- Sessions summary (`/api/v1/monitoring/sessions-summary`)
+- Sub-agents summary (`/api/v1/monitoring/sub-agents-summary`)
+- Cost trends (`/api/v1/monitoring/trends`)

-### API Endpoints
+The frontend dashboard panels that consume these endpoints are also not yet built.

-#### Monitoring Summary Endpoints
+## Architecture

-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/v1/monitoring/cost-summary` | GET | Today's cost, all-time cost, projected monthly |
-| `/api/v1/monitoring/cost-breakdown` | GET | Per-model cost breakdown (7d/30d/all) |
-| `/api/v1/monitoring/health-summary` | GET | Gateway status, system metrics, health gauges |
-| `/api/v1/monitoring/cron-summary` | GET | Cron job statuses, schedules, run history |
-| `/api/v1/monitoring/sessions-summary` | GET | Active sessions with model, context %, tokens |
-| `/api/v1/monitoring/sub-agents-summary` | GET | Sub-agent runs with cost, duration, status |
-| `/api/v1/monitoring/trends` | GET | Cost trends, model breakdown (7d/30d) |
+```
+OpenClaw Gateway(s)
+       │
+       │ RPC (usage.cost, cron.list, sessions.list, health, status)
+       ▼
+GatewayCollectorService          ← background asyncio task
+       │
+       │ upsert into PostgreSQL
+       ▼
+Monitoring Models                ← CostSnapshot, CronJobStatus, SessionEvent, SubAgentRun, SystemHealthMetric, AlertRule, AlertEvent
+       │
+       │ API endpoints + data_processing transforms
+       ▼
+REST API (/api/v1/monitoring/*) ← cost-summary, cost-breakdown, CRUD endpoints, more coming
+       │
+       │ WebSocket (/ws/agents)
+       ▼
+Next.js Frontend                 ← boards, tasks, agents + upcoming monitoring dashboard
+```
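
The first two stages of the diagram (poll over RPC, then store) could be sketched as one asyncio task per RPC method. This is a hedged illustration, not `GatewayCollectorService` itself: `poll_forever`, `run_collectors`, and the `fetch`/`store` callables are hypothetical stand-ins for the real RPC client and the PostgreSQL upsert.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical shapes: fetch performs one RPC call, store persists its result.
Fetch = Callable[[], Awaitable[dict]]
Store = Callable[[str, dict], None]


async def poll_forever(name: str, fetch: Fetch, store: Store, interval: float) -> None:
    """Call fetch() every `interval` seconds and hand the result to store()."""
    while True:
        try:
            store(name, await fetch())
        except Exception as exc:  # one failed poll should not stop the loop
            print(f"{name}: poll failed: {exc!r}")
        await asyncio.sleep(interval)


async def run_collectors(
    jobs: dict[str, tuple[Fetch, float]], store: Store, run_for: float
) -> None:
    """Run one polling task per RPC method, then cancel them after `run_for` seconds."""
    tasks = [
        asyncio.create_task(poll_forever(name, fetch, store, interval))
        for name, (fetch, interval) in jobs.items()
    ]
    await asyncio.sleep(run_for)
    for task in tasks:
        task.cancel()
    # Collect the cancellations so no task is left pending at shutdown.
    await asyncio.gather(*tasks, return_exceptions=True)
```

Giving each RPC method its own task lets cost, cron, session, and health data refresh on independent intervals, and a failing endpoint only affects its own loop.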

-#### Monitoring CRUD Endpoints
+**Stack:**
+- Backend: Python 3.12 / FastAPI / SQLModel / PostgreSQL / Redis
+- Frontend: Next.js 16 / React 19 / Tailwind CSS / shadcn/ui / TanStack React Query
+- Auth: Clerk (production) or local token (dev)
+- API client: orval-generated TypeScript hooks

-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/v1/monitoring/cost-snapshots` | GET | Paginated cost snapshot records |
-| `/api/v1/monitoring/cron-jobs` | GET | Paginated cron job status records |
-| `/api/v1/monitoring/sessions` | GET | Paginated session event records |
-| `/api/v1/monitoring/health` | GET | Paginated system health metrics |
-| `/api/v1/monitoring/sub-agents` | GET | Paginated sub-agent run records |
+**Source repos** (for reference, not imported):
+- `mudrii/openclaw-dashboard` (Go) — dashboard panels and data processing logic
+- `jaffer1979/openclaw-pixel-agents-dashboard` (Node/Express) — session watching and event parsing

-#### WebSocket
+All Go/Node logic is ported to Python; no new backend languages are introduced.
+
+## API Reference
+
+### Monitoring Endpoints
+
+| Endpoint | Method | Description | Status |
+|----------|--------|-------------|--------|
+| `/api/v1/monitoring/cost-summary` | GET | Cost overview per gateway (latest snapshot) | ✅ Live |
+| `/api/v1/monitoring/cost-breakdown` | GET | Per-model cost breakdown ranked by spend | ✅ Live |
+| `/api/v1/monitoring/cost-snapshots` | GET | Paginated cost snapshot records | ✅ Live |
+| `/api/v1/monitoring/cron-jobs` | GET | Paginated cron job status records | ✅ Live |
+| `/api/v1/monitoring/sessions` | GET | Paginated session event records | ✅ Live |
+| `/api/v1/monitoring/health` | GET | Paginated system health metrics | ✅ Live |
+| `/api/v1/monitoring/sub-agents` | GET | Paginated sub-agent run records | ✅ Live |
+| `/api/v1/monitoring/health-summary` | GET | Gateway health overview | 🔜 Pending |
+| `/api/v1/monitoring/cron-summary` | GET | Cron jobs overview | 🔜 Pending |
+| `/api/v1/monitoring/sessions-summary` | GET | Active sessions overview | 🔜 Pending |
+| `/api/v1/monitoring/sub-agents-summary` | GET | Sub-agent activity overview | 🔜 Pending |
+| `/api/v1/monitoring/trends` | GET | Cost trend charts (7d/30d) | 🔜 Pending |
+
+All endpoints support `?gateway_id=` filtering and are org-scoped via `require_org_member`.
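
As a rough client-side sketch of the `?gateway_id=` filter, the helper below builds a request without sending it. The `monitoring_request` name is hypothetical, and sending the local token as a `Bearer` header is an assumption; check the backend's auth code for the actual scheme.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def monitoring_request(
    base_url: str, endpoint: str, token: str, gateway_id: Optional[str] = None
) -> Request:
    """Build (but do not send) a GET request for a monitoring endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/monitoring/{endpoint}"
    if gateway_id is not None:
        # urlencode escapes the value so unusual gateway IDs stay valid in the URL.
        url += "?" + urlencode({"gateway_id": gateway_id})
    # Assumption: the local dev token is accepted as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a running backend.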
+
+### WebSocket

 | Endpoint | Description |
 |----------|-------------|
-| `/ws/agents` | Real-time agent events (initial snapshot + polling) |
+| `/ws/agents` | Real-time agent events (initial 50-event snapshot + 2s polling) |
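
The "initial snapshot, then polling" behavior can be pictured with a bounded buffer. This is only a sketch of the idea, not the `/ws/agents` handler: the `EventBuffer` class and the `seq` field are hypothetical.

```python
from collections import deque


class EventBuffer:
    """Keeps the most recent events so new subscribers can get a snapshot."""

    def __init__(self, max_events: int = 50) -> None:
        # A deque with maxlen silently drops the oldest event once it is full.
        self._events: deque[dict] = deque(maxlen=max_events)
        self._next_seq = 0

    def append(self, event: dict) -> None:
        # Tag each event with a sequence number so pollers can ask for newer ones.
        self._events.append({"seq": self._next_seq, **event})
        self._next_seq += 1

    def snapshot(self) -> list[dict]:
        """Events sent once when a client connects."""
        return list(self._events)

    def since(self, last_seq: int) -> list[dict]:
        """Events a polling tick would send to a client that has seen up to last_seq."""
        return [e for e in self._events if e["seq"] > last_seq]
```

A handler built on this would send `snapshot()` on connect and then call `since()` on each polling tick with the last sequence number it delivered.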

-#### Gateway RPC Integration
+### Platform Endpoints (97+)

-The collector service polls these OpenClaw gateway RPC endpoints:
-- `usage.cost` + `usage.status` → Cost snapshots
-- `cron.list` → Cron job status
-- `sessions.list` + `sessions.preview` → Session events
-- `health` + `status` → System health metrics
+The base platform provides full CRUD for: boards, tasks, agents, gateways, organizations, approvals, board groups, board memory, board webhooks, tags, skills marketplace, and more. See the OpenAPI docs at `/docs` when running.

 ## Quick Start

-### Docker Compose (Recommended)
-
 ```bash
-git clone https://forgejo/null/Mission-Control.git
+git clone ssh://forgejo/null/Mission-Control.git
 cd Mission-Control
-cp .env.example .env
+cp .env.example .env   # edit LOCAL_AUTH_TOKEN and other vars
 docker compose up -d
 ```

-The backend runs on port 8080, frontend on port 3037.
+Backend runs on port 8080, frontend on 3037, PostgreSQL on 5432, Redis on 6379.

 ### Environment Variables

 | Variable | Default | Description |
 |----------|---------|-------------|
-| `COLLECTION_INTERVAL_COST` | 300 | Seconds between cost collection |
-| `COLLECTION_INTERVAL_CRON` | 60 | Seconds between cron collection |
-| `COLLECTION_INTERVAL_SESSION` | 30 | Seconds between session collection |
-| `COLLECTION_INTERVAL_HEALTH` | 60 | Seconds between health collection |
-| `LOCAL_AUTH_TOKEN` | — | Token for local dev auth |
-| `POSTGRES_HOST` | db | PostgreSQL host |
-| `POSTGRES_PORT` | 5432 | PostgreSQL port |
-| `POSTGRES_DB` | mission_control | Database name |
-| `REDIS_URL` | redis://redis:6379/0 | Redis connection URL |
+| `AUTH_MODE` | `local` | `local` for token auth, `clerk` for production |
+| `LOCAL_AUTH_TOKEN` | — | Required when `AUTH_MODE=local` |
+| `BACKEND_PORT` | `8000` | Backend API port |
+| `FRONTEND_PORT` | `3037` | Frontend dev server port |
+| `POSTGRES_DB` | `mission_control` | Database name |
+| `POSTGRES_USER` | `postgres` | Database user |
+| `POSTGRES_PASSWORD` | `postgres` | Database password |
+| `POSTGRES_PORT` | `5432` | PostgreSQL port |
+| `DB_AUTO_MIGRATE` | `true` | Run Alembic migrations on startup |
+| `CORS_ORIGINS` | `http://localhost:3037` | Allowed CORS origins |
+| `COLLECTION_INTERVAL_COST` | `300` | Seconds between cost collection |
+| `COLLECTION_INTERVAL_CRON` | `60` | Seconds between cron collection |
+| `COLLECTION_INTERVAL_SESSION` | `30` | Seconds between session collection |
+| `COLLECTION_INTERVAL_HEALTH` | `60` | Seconds between health collection |
+| `RQ_REDIS_URL` | `redis://redis:6379/0` | Redis URL for webhook worker |
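
For illustration, the four `COLLECTION_INTERVAL_*` variables could be read with their documented defaults as sketched below. The `load_intervals` helper and its validation rule are hypothetical; the backend's own config module may handle this differently.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above.
DEFAULT_INTERVALS = {
    "COLLECTION_INTERVAL_COST": 300,
    "COLLECTION_INTERVAL_CRON": 60,
    "COLLECTION_INTERVAL_SESSION": 30,
    "COLLECTION_INTERVAL_HEALTH": 60,
}


def load_intervals(env: Optional[Mapping[str, str]] = None) -> dict[str, int]:
    """Return polling intervals in seconds, falling back to the documented defaults."""
    env = os.environ if env is None else env
    intervals = {}
    for name, default in DEFAULT_INTERVALS.items():
        raw = env.get(name)
        # Treat an unset or empty variable as "use the default".
        value = default if raw in (None, "") else int(raw)
        if value <= 0:
            raise ValueError(f"{name} must be a positive number of seconds, got {value}")
        intervals[name] = value
    return intervals
```

Accepting an explicit mapping makes the helper easy to test without touching the real process environment.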

 ## Project Structure

 ```
 Mission-Control/
-├── src/
-│   ├── backend/
-│   │   ├── app/
-│   │   │   ├── api/           # API routes (monitoring, ws, gateways, etc.)
-│   │   │   ├── models/        # SQLModel database models
-│   │   │   ├── schemas/       # Pydantic request/response schemas
-│   │   │   ├── services/
-│   │   │   │   └── monitoring/
-│   │   │   │       ├── gateway_collector.py  # Background RPC collector
-│   │   │   │       ├── data_processing.py    # Dashboard data transforms
+├── compose.yml                    # Docker Compose (db, redis, backend, frontend, webhook-worker)
+├── .env                           # Environment config
+├── PROJECT.md                      # 4-phase implementation plan
+├── STRUCTURE.md                    # Agent roles and project structure
+├── FUTURE.md                       # Prioritized backlog
+├── VERSION.md                      # Version history
+├── DEVELOPMENT_LOG.md              # Agent work tracking
+├── HISTORY.md                      # Changelog
+├── sources/                        # Reference repos (Go, Node) — not imported
+│   ├── dashboard-tracking/         # mudrii/openclaw-dashboard (Go)
+│   └── pixel-agents/               # jaffer1979/openclaw-pixel-agents-dashboard (Node)
+└── src/
+    ├── backend/
+    │   ├── app/
+    │   │   ├── api/                # API routes (monitoring, boards, agents, gateways, etc.)
+    │   │   ├── core/               # Config, auth, logging, rate limiting, security
+    │   │   ├── db/                 # Session, pagination, query manager
+    │   │   ├── models/             # SQLModel database models (30+ tables)
+    │   │   ├── schemas/            # Pydantic request/response schemas
+    │   │   ├── services/
+    │   │   │   ├── monitoring/      # Collector, data processing, event parser, RPC models
+    │   │   │   │   ├── gateway_collector.py   # Background RPC poller
+    │   │   │   │   ├── data_processing.py     # ModelName, BuildDailyChart, BuildCostBreakdown, etc.
    │   │   │   │   ├── event_parser.py         # Session event parser
    │   │   │   │   └── models.py               # Pydantic RPC response models
-│   │   │   └── main.py        # FastAPI app + lifespan
-│   │   ├── migrations/        # Alembic migrations
-│   │   └── tests/
-│   └── frontend/              # Next.js app
-│       └── src/
-│           ├── app/            # Next.js App Router pages
-│           ├── components/     # React components
-│           ├── api/            # Generated API clients
-│           └── lib/            # Utilities
-├── sources/                    # Reference repos (Go, Node)
-├── docker-compose.yml
-├── Dockerfile
-└── PROJECT.md                  # Full 4-phase implementation plan
+    │   │   │   └── openclaw/       # Gateway RPC, provisioning, lifecycle, coordination
+    │   │   └── main.py             # FastAPI app + lifespan (collector start/stop)
+    │   ├── migrations/              # Alembic migrations
+    │   ├── scripts/                 # Seed, export, sync scripts
+    │   └── tests/                   # pytest test suite
+    └── frontend/
+        └── src/
+            ├── app/                 # Next.js App Router pages
+            ├── components/          # React components (atoms/molecules/organisms/templates)
+            ├── api/                 # orval-generated TypeScript API client
+            ├── auth/                # Clerk + local auth
+            ├── hooks/               # Custom React hooks
+            ├── lib/                 # Utilities
+            └── proxy.ts            # Dev proxy config
 ```

-## Data Collection Flow
+## Git Workflow

-```
-OpenClaw Gateway
-       │
-       │ RPC (usage.cost, cron.list, sessions.list, health, status)
-       ▼
-GatewayCollectorService (background asyncio task)
-       │
-       │ Upsert into PostgreSQL
-       ▼
-Monitoring Models (CostSnapshot, CronJobStatus, SessionEvent, SubAgentRun, SystemHealthMetric)
-       │
-       │ API endpoints + data_processing transforms
-       ▼
-Dashboard Frontend (Next.js)
-       │
-       │ WebSocket for real-time events
-       ▼
-Live Agent Activity Panel
-```
-
-## Source Repos
-
-Mission Control ports functionality from two OpenClaw dashboard projects:
-
-- **[openclaw-dashboard](https://github.com/mudrii/openclaw-dashboard)** (Go) — Dashboard panels, data processing logic, alert computation
-- **[openclaw-pixel-agents-dashboard](https://github.com/jaffer1979/openclaw-pixel-agents-dashboard)** (Node/Express) — Pixel agent visualization, session watching, event parsing
-
-**Key decision:** No new backend languages. Go and Node functionality ports to Python/FastAPI within Mission Control's backend. We reuse the gateway RPC transport and data model shapes, but port all processing/aggregation logic as pure Python functions.
-
-## Development
+- **`main`** — stable/release branch
+- **`dev`** — working branch (all development happens here)

 ```bash
-# Backend
-cd src/backend
-pip install -r requirements.txt
-uvicorn app.main:app --reload --port 8080
-
-# Frontend
-cd src/frontend
-npm install
-npm run dev
+git checkout dev
+# ... make changes ...
+git add -A && git commit -m "type: description"
+git push origin dev
 ```

 ## License