diff --git a/README.md b/README.md
index 5aacbf4..5677d2d 100644
--- a/README.md
+++ b/README.md
@@ -1,182 +1,201 @@
 # Mission Control

-A monitoring dashboard for [OpenClaw](https://github.com/openclaw/openclaw) AI agents — built with Python/FastAPI and Next.js.
+A monitoring dashboard for [OpenClaw](https://github.com/openclaw/openclaw) gateways — built with Python/FastAPI and Next.js.

-## Why This Exists
+## What It Does

-When you run OpenClaw seriously — multiple agents, dozens of cron jobs, sub-agents spawning sub-agents, several Telegram groups and WhatsApp, Slack, and Discord channels, 10+ models — information gets scattered fast.
+Mission Control connects to your OpenClaw gateways via their RPC endpoints and gives you a single place to see:

-**The problem:** there was no single place to answer the obvious questions:
+- **How much you're spending** — per-model cost breakdowns, cost trends, which model is burning your budget
+- **Whether your gateway is healthy** — online/offline status, CPU, RAM, disk, uptime
+- **What your cron jobs are doing** — schedules, last run, next fire, failures
+- **Which sessions are active** — model, type, context usage, token counts
+- **What your sub-agents are up to** — cost, duration, status at a glance
+- **How costs are trending** — daily charts, 7d/30d comparisons, acceleration signals

-- **Is my gateway actually running right now?**
-- **How much have I spent today, and which model is burning the most?**
-- **Which cron jobs ran, which failed, and when does the next one fire?**
-- **What sessions are active and how much context are they consuming?**
-- **Are my sub-agents doing useful work or spinning in circles?**
-- **What's the cost trend over the last 7 days — am I accelerating?**
+It polls your gateways in the background, stores everything in PostgreSQL, and serves it through REST endpoints ready for the frontend dashboard.
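Editorial sketch (not part of the patch): the background polling described in the new "What It Does" text could look roughly like the loop below. This is a minimal illustration only; `poll_forever`, `run_collectors`, and the collector callables are hypothetical names, not taken from `gateway_collector.py`, and the database upsert is left to the callable.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Collector = Callable[[], Awaitable[None]]


async def poll_forever(
    name: str, interval_s: float, collect: Collector, stop: asyncio.Event
) -> None:
    """Call `collect` every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            await collect()  # hypothetical: one RPC call plus a DB upsert
        except Exception as exc:  # a failed poll should not end the loop
            print(f"{name} collection failed: {exc!r}")
        try:
            # Sleep for one interval, but wake early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


async def run_collectors(
    collectors: Dict[str, Tuple[float, Collector]], duration_s: float
) -> None:
    """Run every collector concurrently for `duration_s`, then stop them."""
    stop = asyncio.Event()
    tasks = [
        asyncio.create_task(poll_forever(name, interval, collect, stop))
        for name, (interval, collect) in collectors.items()
    ]
    await asyncio.sleep(duration_s)
    stop.set()
    await asyncio.gather(*tasks)
```

Giving each data type its own task keeps the intervals independent, which matches the separate `COLLECTION_INTERVAL_*` settings in the README; whether the real collector is structured this way is an assumption.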
-**The solution:** a single dashboard that collects everything in one place — gateway health, costs, cron status, active sessions, sub-agent runs, model usage — refreshed automatically, org-scoped, no login required for local development. Open a browser tab, get the full picture in seconds.
+## Current Status

-It's not trying to replace the OpenClaw CLI or Telegram interface. It's the at-a-glance overview layer that tells you whether everything is healthy and where your money and compute are going — so you can make decisions without hunting for data.
+**Version:** 0.0.4 (dev branch)
+**Phase:** Phase 2 — backend monitoring collection and API endpoints are live. Frontend dashboard panels are next.

-## Features
+### What's Working

-### 6 Core Monitoring Panels
+- **Backend API** — 97+ endpoints across boards, agents, gateways, tasks, organizations, approvals, and more (forked from base platform)
+- **Gateway data collection** — background service polls `usage.cost`, `cron.list`, `sessions.list`, `sessions.preview`, `health`, and `status` RPC endpoints and upserts into PostgreSQL
+- **7 monitoring models** — CostSnapshot, CronJobStatus, SessionEvent, SubAgentRun, SystemHealthMetric, AlertRule, AlertEvent
+- **10 CRUD monitoring endpoints** — paginated, org-scoped read endpoints for all monitoring models
+- **2 summary endpoints** — cost-summary and cost-breakdown (latest snapshot per gateway, per-model percentage breakdown)
+- **Data processing functions** — `ModelName()`, `BuildDailyChart()`, `BuildAlerts()`, `BuildCostBreakdown()`, `FmtTokens()` ported from Go
+- **Event parser** — `parse_session_event()` and `format_tool_status()` ported from TypeScript
+- **WebSocket** — `/ws/agents` with initial snapshot (last 50 events) + background polling
+- **Auth** — Clerk for production, local token auth for dev (`AUTH_MODE=local`)
+- **Next.js frontend** — boards, tasks, agents, gateways, organizations, approvals pages (base platform UI)

-1. **💰 Cost Cards & Breakdown** — Today's cost, all-time cost, projected monthly, per-model cost breakdown with 7d/30d/all-time tabs. Know exactly which model is burning your budget.
-2. **💚 System Health** — Gateway status (online/offline), PID, uptime, memory, compaction mode, CPU/RAM/swap/disk gauges. See at a glance whether your gateway is healthy.
-3. **⏰ Cron Jobs** — All scheduled jobs with status, schedule, last/next run, duration, model. Spot failures instantly and see when the next fire is.
-4. **📡 Active Sessions** — Recent sessions with model, type badges (DM/group/cron/subagent), context % bars, token counts. See who's consuming what.
-5. **🤖 Sub-Agent Activity** — Sub-agent runs with cost, duration, status + token breakdown (7d/30d tabs). Know whether sub-agents are productive or spinning.
-6. **📈 Cost Trends** — Cost trend line over 7d/30d, model cost breakdown bars, acceleration indicators. Catch spending spikes before they hurt.
+### What's Not Yet Built

-### Architecture
+These are the remaining monitoring summary endpoints (tracked in FUTURE.md):

-- **Backend:** Python/FastAPI + PostgreSQL + Redis
-- **Frontend:** Next.js 16 + React 19 + Tailwind CSS + shadcn/ui
-- **Data collection:** Background gateway collector polling OpenClaw RPC endpoints
-- **Real-time:** WebSocket endpoint for live agent events
-- **Data processing:** Pure Python functions ported from Go dashboard logic (model name normalization, daily chart aggregation, alert computation, token formatting)
+- Health summary (`/api/v1/monitoring/health-summary`)
+- Cron summary (`/api/v1/monitoring/cron-summary`)
+- Sessions summary (`/api/v1/monitoring/sessions-summary`)
+- Sub-agents summary (`/api/v1/monitoring/sub-agents-summary`)
+- Cost trends (`/api/v1/monitoring/trends`)

-### API Endpoints
+And the frontend dashboard panels that consume these endpoints.
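Editorial sketch (not part of the patch): the "per-model percentage breakdown" listed under What's Working, ranked by spend, can be illustrated as follows. This is an assumption-laden example; `ModelShare` and `cost_breakdown` are hypothetical names and are not the project's `BuildCostBreakdown()` port, whose exact behavior the README does not show.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelShare:
    model: str
    cost: float
    percent: float  # share of total spend, rounded to one decimal place


def cost_breakdown(costs_by_model: Dict[str, float]) -> List[ModelShare]:
    """Return per-model shares of total cost, highest spend first."""
    total = sum(costs_by_model.values())
    shares = [
        ModelShare(
            model=model,
            cost=cost,
            # Guard against division by zero when nothing has been spent yet.
            percent=round(100.0 * cost / total, 1) if total > 0 else 0.0,
        )
        for model, cost in costs_by_model.items()
    ]
    return sorted(shares, key=lambda share: share.cost, reverse=True)
```

For example, `cost_breakdown({"model-a": 1.0, "model-b": 3.0})` puts `model-b` first with a 75.0 percent share; the model names here are placeholders.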
-#### Monitoring Summary Endpoints
+## Architecture

-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/v1/monitoring/cost-summary` | GET | Today's cost, all-time cost, projected monthly |
-| `/api/v1/monitoring/cost-breakdown` | GET | Per-model cost breakdown (7d/30d/all) |
-| `/api/v1/monitoring/health-summary` | GET | Gateway status, system metrics, health gauges |
-| `/api/v1/monitoring/cron-summary` | GET | Cron job statuses, schedules, run history |
-| `/api/v1/monitoring/sessions-summary` | GET | Active sessions with model, context %, tokens |
-| `/api/v1/monitoring/sub-agents-summary` | GET | Sub-agent runs with cost, duration, status |
-| `/api/v1/monitoring/trends` | GET | Cost trends, model breakdown (7d/30d) |
+```
+OpenClaw Gateway(s)
+ │
+ │ RPC (usage.cost, cron.list, sessions.list, health, status)
+ ▼
+GatewayCollectorService ← background asyncio task
+ │
+ │ upsert into PostgreSQL
+ ▼
+Monitoring Models ← CostSnapshot, CronJobStatus, SessionEvent, SubAgentRun, SystemHealthMetric, AlertRule, AlertEvent
+ │
+ │ API endpoints + data_processing transforms
+ ▼
+REST API (/api/v1/monitoring/*) ← cost-summary, cost-breakdown, CRUD endpoints, more coming
+ │
+ │ WebSocket (/ws/agents)
+ ▼
+Next.js Frontend ← boards, tasks, agents + upcoming monitoring dashboard
+```

-#### Monitoring CRUD Endpoints
+**Stack:**
+- Backend: Python 3.12 / FastAPI / SQLModel / PostgreSQL / Redis
+- Frontend: Next.js 16 / React 19 / Tailwind CSS / shadcn/ui / TanStack React Query
+- Auth: Clerk (production) or local token (dev)
+- API client: orval-generated TypeScript hooks

-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/v1/monitoring/cost-snapshots` | GET | Paginated cost snapshot records |
-| `/api/v1/monitoring/cron-jobs` | GET | Paginated cron job status records |
-| `/api/v1/monitoring/sessions` | GET | Paginated session event records |
-| `/api/v1/monitoring/health` | GET | Paginated system health metrics |
-| `/api/v1/monitoring/sub-agents` | GET | Paginated sub-agent run records |
+**Source repos** (for reference, not imported):
+- `mudrii/openclaw-dashboard` (Go) — dashboard panels and data processing logic
+- `jaffer1979/openclaw-pixel-agents-dashboard` (Node/Express) — session watching and event parsing

-#### WebSocket
+All Go/Node logic is ported to Python. No new backend languages.
+
+## API Reference
+
+### Monitoring Endpoints
+
+| Endpoint | Method | Description | Status |
+|----------|--------|-------------|--------|
+| `/api/v1/monitoring/cost-summary` | GET | Cost overview per gateway (latest snapshot) | ✅ Live |
+| `/api/v1/monitoring/cost-breakdown` | GET | Per-model cost breakdown ranked by spend | ✅ Live |
+| `/api/v1/monitoring/cost-snapshots` | GET | Paginated cost snapshot records | ✅ Live |
+| `/api/v1/monitoring/cron-jobs` | GET | Paginated cron job status records | ✅ Live |
+| `/api/v1/monitoring/sessions` | GET | Paginated session event records | ✅ Live |
+| `/api/v1/monitoring/health` | GET | Paginated system health metrics | ✅ Live |
+| `/api/v1/monitoring/sub-agents` | GET | Paginated sub-agent run records | ✅ Live |
+| `/api/v1/monitoring/health-summary` | GET | Gateway health overview | 🔜 Pending |
+| `/api/v1/monitoring/cron-summary` | GET | Cron jobs overview | 🔜 Pending |
+| `/api/v1/monitoring/sessions-summary` | GET | Active sessions overview | 🔜 Pending |
+| `/api/v1/monitoring/sub-agents-summary` | GET | Sub-agent activity overview | 🔜 Pending |
+| `/api/v1/monitoring/trends` | GET | Cost trend charts (7d/30d) | 🔜 Pending |
+
+All endpoints support `?gateway_id=` filtering and are org-scoped via `require_org_member`.
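Editorial sketch (not part of the patch): a client call against one of the live monitoring endpoints with the `?gateway_id=` filter might be built as below. Several details are assumptions rather than facts from the README: the `Authorization: Bearer` scheme for `LOCAL_AUTH_TOKEN`, the base URL, and the helper name `monitoring_request`. The request is only constructed here, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def monitoring_request(
    base_url: str, path: str, token: str, gateway_id: Optional[str] = None
) -> Request:
    """Build a GET request for /api/v1/monitoring/<path>."""
    url = f"{base_url.rstrip('/')}/api/v1/monitoring/{path}"
    if gateway_id is not None:
        # gateway_id is the filter parameter named in the README.
        url += "?" + urlencode({"gateway_id": gateway_id})
    # Assumption: local token auth is sent as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the result to `urllib.request.urlopen` would perform the call. Note that the README mentions port 8080 in Quick Start and a `BACKEND_PORT` default of `8000`, so the base URL should be checked against the running deployment.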
+
+### WebSocket

 | Endpoint | Description |
 |----------|-------------|
-| `/ws/agents` | Real-time agent events (initial snapshot + polling) |
+| `/ws/agents` | Real-time agent events (initial 50-event snapshot + 2s polling) |

-#### Gateway RPC Integration
+### Platform Endpoints (97+)

-The collector service polls these OpenClaw gateway RPC endpoints:
-- `usage.cost` + `usage.status` → Cost snapshots
-- `cron.list` → Cron job status
-- `sessions.list` + `sessions.preview` → Session events
-- `health` + `status` → System health metrics
+The base platform provides full CRUD for: boards, tasks, agents, gateways, organizations, approvals, board groups, board memory, board webhooks, tags, skills marketplace, and more. See the OpenAPI docs at `/docs` when running.

 ## Quick Start

-### Docker Compose (Recommended)
-
 ```bash
-git clone https://forgejo/null/Mission-Control.git
+git clone ssh://forgejo/null/Mission-Control.git
 cd Mission-Control
-cp .env.example .env
+cp .env.example .env # edit LOCAL_AUTH_TOKEN and other vars
 docker compose up -d
 ```

-The backend runs on port 8080, frontend on port 3037.
+Backend runs on port 8080, frontend on 3037, PostgreSQL on 5432, Redis on 6379.
 ### Environment Variables

 | Variable | Default | Description |
 |----------|---------|-------------|
-| `COLLECTION_INTERVAL_COST` | 300 | Seconds between cost collection |
-| `COLLECTION_INTERVAL_CRON` | 60 | Seconds between cron collection |
-| `COLLECTION_INTERVAL_SESSION` | 30 | Seconds between session collection |
-| `COLLECTION_INTERVAL_HEALTH` | 60 | Seconds between health collection |
-| `LOCAL_AUTH_TOKEN` | — | Token for local dev auth |
-| `POSTGRES_HOST` | db | PostgreSQL host |
-| `POSTGRES_PORT` | 5432 | PostgreSQL port |
-| `POSTGRES_DB` | mission_control | Database name |
-| `REDIS_URL` | redis://redis:6379/0 | Redis connection URL |
+| `AUTH_MODE` | `local` | `local` for token auth, `clerk` for production |
+| `LOCAL_AUTH_TOKEN` | — | Required when `AUTH_MODE=local` |
+| `BACKEND_PORT` | `8000` | Backend API port |
+| `FRONTEND_PORT` | `3037` | Frontend dev server port |
+| `POSTGRES_DB` | `mission_control` | Database name |
+| `POSTGRES_USER` | `postgres` | Database user |
+| `POSTGRES_PASSWORD` | `postgres` | Database password |
+| `POSTGRES_PORT` | `5432` | PostgreSQL port |
+| `DB_AUTO_MIGRATE` | `true` | Run Alembic migrations on startup |
+| `CORS_ORIGINS` | `http://localhost:3037` | Allowed CORS origins |
+| `COLLECTION_INTERVAL_COST` | `300` | Seconds between cost collection |
+| `COLLECTION_INTERVAL_CRON` | `60` | Seconds between cron collection |
+| `COLLECTION_INTERVAL_SESSION` | `30` | Seconds between session collection |
+| `COLLECTION_INTERVAL_HEALTH` | `60` | Seconds between health collection |
+| `RQ_REDIS_URL` | `redis://redis:6379/0` | Redis URL for webhook worker |

 ## Project Structure

 ```
 Mission-Control/
-├── src/
-│   ├── backend/
-│   │   ├── app/
-│   │   │   ├── api/  # API routes (monitoring, ws, gateways, etc.)
-│   │   │   ├── models/  # SQLModel database models
-│   │   │   ├── schemas/  # Pydantic request/response schemas
-│   │   │   ├── services/
-│   │   │   │   └── monitoring/
-│   │   │   │       ├── gateway_collector.py  # Background RPC collector
-│   │   │   │       ├── data_processing.py  # Dashboard data transforms
-│   │   │   │       ├── event_parser.py  # Session event parser
-│   │   │   │       └── models.py  # Pydantic RPC response models
-│   │   │   └── main.py  # FastAPI app + lifespan
-│   │   ├── migrations/  # Alembic migrations
-│   │   └── tests/
-│   └── frontend/  # Next.js app
-│       └── src/
-│           ├── app/  # Next.js App Router pages
-│           ├── components/  # React components
-│           ├── api/  # Generated API clients
-│           └── lib/  # Utilities
-├── sources/  # Reference repos (Go, Node)
-├── docker-compose.yml
-├── Dockerfile
-└── PROJECT.md  # Full 4-phase implementation plan
+├── compose.yml  # Docker Compose (db, redis, backend, frontend, webhook-worker)
+├── .env  # Environment config
+├── PROJECT.md  # 4-phase implementation plan
+├── STRUCTURE.md  # Agent roles and project structure
+├── FUTURE.md  # Prioritized backlog
+├── VERSION.md  # Version history
+├── DEVELOPMENT_LOG.md  # Agent work tracking
+├── HISTORY.md  # Changelog
+├── sources/  # Reference repos (Go, Node) — not imported
+│   ├── dashboard-tracking/  # mudrii/openclaw-dashboard (Go)
+│   └── pixel-agents/  # jaffer1979/openclaw-pixel-agents-dashboard (Node)
+└── src/
+    ├── backend/
+    │   ├── app/
+    │   │   ├── api/  # API routes (monitoring, boards, agents, gateways, etc.)
+    │   │   ├── core/  # Config, auth, logging, rate limiting, security
+    │   │   ├── db/  # Session, pagination, query manager
+    │   │   ├── models/  # SQLModel database models (30+ tables)
+    │   │   ├── schemas/  # Pydantic request/response schemas
+    │   │   ├── services/
+    │   │   │   ├── monitoring/  # Collector, data processing, event parser, RPC models
+    │   │   │   │   ├── gateway_collector.py  # Background RPC poller
+    │   │   │   │   ├── data_processing.py  # ModelName, BuildDailyChart, BuildCostBreakdown, etc.
+    │   │   │   │   ├── event_parser.py  # Session event parser
+    │   │   │   │   └── models.py  # Pydantic RPC response models
+    │   │   │   └── openclaw/  # Gateway RPC, provisioning, lifecycle, coordination
+    │   │   └── main.py  # FastAPI app + lifespan (collector start/stop)
+    │   ├── migrations/  # Alembic migrations
+    │   ├── scripts/  # Seed, export, sync scripts
+    │   └── tests/  # pytest test suite
+    └── frontend/
+        └── src/
+            ├── app/  # Next.js App Router pages
+            ├── components/  # React components (atoms/molecules/organisms/templates)
+            ├── api/  # orval-generated TypeScript API client
+            ├── auth/  # Clerk + local auth
+            ├── hooks/  # Custom React hooks
+            ├── lib/  # Utilities
+            └── proxy.ts  # Dev proxy config
 ```

-## Data Collection Flow
+## Git Workflow

-```
-OpenClaw Gateway
- │
- │ RPC (usage.cost, cron.list, sessions.list, health, status)
- ▼
-GatewayCollectorService (background asyncio task)
- │
- │ Upsert into PostgreSQL
- ▼
-Monitoring Models (CostSnapshot, CronJobStatus, SessionEvent, SubAgentRun, SystemHealthMetric)
- │
- │ API endpoints + data_processing transforms
- ▼
-Dashboard Frontend (Next.js)
- │
- │ WebSocket for real-time events
- ▼
-Live Agent Activity Panel
-```
-
-## Source Repos
-
-Mission Control ports functionality from two OpenClaw dashboard projects:
-
-- **[openclaw-dashboard](https://github.com/mudrii/openclaw-dashboard)** (Go) — Dashboard panels, data processing logic, alert computation
-- **[openclaw-pixel-agents-dashboard](https://github.com/jaffer1979/openclaw-pixel-agents-dashboard)** (Node/Express) — Pixel agent visualization, session watching, event parsing

-**Key decision:** No new backend languages. Go and Node functionality ports to Python/FastAPI within Mission Control's backend. We reuse the gateway RPC transport and data model shapes, but port all processing/aggregation logic as pure Python functions.
-
-## Development
+- **`main`** — stable/release branch
+- **`dev`** — working branch (all development happens here)

 ```bash
-# Backend
-cd src/backend
-pip install -r requirements.txt
-uvicorn app.main:app --reload --port 8080
-
-# Frontend
-cd src/frontend
-npm install
-npm run dev
+git checkout dev
+# ... make changes ...
+git add -A && git commit -m "type: description"
+git push origin dev
 ```

 ## License