Commit Graph

16 Commits

Author SHA1 Message Date
Ripley d747c1ddb0 fix: add range validation to /trends endpoint (1-365 day limit)
Security fix from Private_Hudson audit. Prevents arbitrary range queries
that could cause expensive DB operations. Invalid ranges now return 400
with clear error message instead of being silently accepted.
2026-05-10 22:43:16 -05:00
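The 1-365 day bound described in this commit can be sketched in plain Python as below. This is a minimal illustration, not the endpoint's actual code: the names parse_range_days and RangeError are hypothetical, and the query value is assumed to be a bare day count. In the real service, a handler would translate the error into the 400 response.

```python
MIN_RANGE_DAYS = 1
MAX_RANGE_DAYS = 365


class RangeError(ValueError):
    """Hypothetical error type; a request handler would map it to HTTP 400."""


def parse_range_days(raw: str) -> int:
    """Parse a day-count query value and enforce the 1-365 bound."""
    try:
        days = int(raw)
    except (TypeError, ValueError):
        raise RangeError(f"range must be a whole number of days, got {raw!r}") from None
    if not MIN_RANGE_DAYS <= days <= MAX_RANGE_DAYS:
        raise RangeError(
            f"range must be between {MIN_RANGE_DAYS} and {MAX_RANGE_DAYS} days, got {days}"
        )
    return days
```

Rejecting the value before any query is built is what keeps an oversized range from ever reaching the database.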
Ripley e348deb299 feat: add 5 remaining monitoring summary endpoints
- health-summary: gateway status, PID, uptime, CPU/RAM/swap/disk, compaction
- cron-summary: cron jobs with schedule, status, failures, model
- sessions-summary: active sessions with model display names, context %, tokens
- sub-agents-summary: sub-agent runs with cost, duration, status, tokens
- trends: cost/token daily trends with 7d/30d range filter

All endpoints are org-scoped, support gateway_id filtering, and use
data_processing functions (ModelName, BuildDailyChart) where appropriate.
Syntax validated with py_compile.
2026-05-10 22:41:42 -05:00
Ripley 3719ab42b4 feat: add cost-summary and cost-breakdown monitoring endpoints
- CostSummaryRead schema + GET /monitoring/cost-summary (latest snapshot per gateway)
- CostBreakdownRead schema + GET /monitoring/cost-breakdown (models ranked by cost with %)
- Both endpoints support ?gateway_id= filtering and org-scoping
- Updated FUTURE.md: dashboard logic port and WebSocket marked done, remaining 5 summary endpoints queued
2026-05-10 22:38:41 -05:00
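As a rough sketch of "models ranked by cost with %", the function below sorts per-model costs in descending order and attaches each model's share of the total. The function name, input shape, and output keys are assumptions for illustration and are not taken from CostBreakdownRead.

```python
from typing import Dict, List


def rank_models_by_cost(costs_by_model: Dict[str, float]) -> List[dict]:
    """Return models sorted by cost (highest first) with their percentage share."""
    total = sum(costs_by_model.values())
    ranked = sorted(costs_by_model.items(), key=lambda item: item[1], reverse=True)
    return [
        {
            "model": model,
            "cost": round(cost, 2),
            # Guard against division by zero when nothing has been spent yet.
            "percent": round(100 * cost / total, 1) if total else 0.0,
        }
        for model, cost in ranked
    ]
```

For example, rank_models_by_cost({"a": 1.0, "b": 3.0}) lists "b" first at 75.0 percent and "a" second at 25.0 percent.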
Ripley fd7d0aca42 docs: rewrite README to reflect actual current project state 2026-05-10 22:27:45 -05:00
Ripley 504d4e4eb5 docs: add project README with features, architecture, and API reference 2026-05-10 22:19:40 -05:00
Ripley 461ccbcb88 feat: add dashboard data processing + WebSocket agent events (Phase 2 complete)
Dashboard Data Processing (data_processing.py):
- ModelName() — maps raw provider/model strings to display names
- BuildDailyChart() — aggregates cost/token/call data into daily chart buckets
- BuildAlerts() — evaluates cost, cron, context, gateway, memory alert conditions
- BuildCostBreakdown() — ranks models by cost descending
- FmtTokens() — formats token counts (1.2M, 1.5K, etc.)
- round2(), sum_bucket_costs(), TitleCase() — utility functions
- All pure Python, no I/O or RPC — transforms data from DB

WebSocket Agent Events (ws.py + event_parser.py):
- WebSocket endpoint at /ws/agents for real-time agent event broadcasting
- On connect: sends last 50 session events as initial state
- Background task polls SessionEvent table every 2s for new events
- Parses events into dashboard format (agentStatus, agentToolStart, etc.)
- broadcast_event() for other services to push events in real-time
- Lifespan integration: start/stop broadcast task with app startup/shutdown

Fixed in Ripley review:
- Removed duplicate method definitions in WebSocketConnectionManager
- Fixed broken import in main.py (dangling tuple)
- Removed inline import re in event_parser.py (already at module level)
- Fixed duplicate memory_search/memory_get entries in format_tool_status
- Used asyncio.get_running_loop() instead of deprecated get_event_loop()
- Cleaned up broadcast_to_all with proper broken connection cleanup
2026-05-10 21:01:38 -05:00
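The "broken connection cleanup" fix in broadcast_to_all can be pictured with the sketch below. It is an assumption-laden illustration rather than the code in ws.py: StubSocket and ConnectionManager are hypothetical stand-ins, and the only behaviour shown is collecting failed sends and dropping those connections after the loop.

```python
import asyncio
from typing import List, Set


class StubSocket:
    """Hypothetical stand-in for a WebSocket connection."""

    def __init__(self, broken: bool = False) -> None:
        self.broken = broken
        self.sent: List[str] = []

    async def send_text(self, message: str) -> None:
        if self.broken:
            raise ConnectionError("peer went away")
        self.sent.append(message)


class ConnectionManager:
    """Hypothetical manager that tracks active connections in a set."""

    def __init__(self) -> None:
        self.active: Set[StubSocket] = set()

    async def broadcast_to_all(self, message: str) -> int:
        """Send to every connection, then drop the ones whose send failed."""
        dead = []
        # Iterate over a copy so the set is never mutated mid-iteration.
        for socket in list(self.active):
            try:
                await socket.send_text(message)
            except Exception:
                dead.append(socket)
        for socket in dead:
            self.active.discard(socket)
        return len(self.active)
```

Deferring removal until after the loop means one failed client cannot interrupt delivery to the others.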
Ripley 22bc6bc36e docs: update FUTURE.md and MEMORY.md for monitoring API endpoints 2026-05-10 20:45:23 -05:00
Ripley 638bcd2d91 feat: add monitoring API endpoints (10 read-only endpoints)
- GET /monitoring/cost-snapshots + /{id} — list/detail with gateway_id, date range filters
- GET /monitoring/cron-jobs + /{id} — list/detail with gateway_id, enabled, job_name filters
- GET /monitoring/sessions + /{id} — list/detail with gateway_id, session_key, model, event_type filters
- GET /monitoring/health + /{id} — list/detail with gateway_id, date range filters
- GET /monitoring/sub-agents + /{id} — list/detail with gateway_id, status, agent filters
- All endpoints org-scoped (require_org_member), paginated (DefaultLimitOffsetPage)
- Pydantic Read schemas for all 5 monitoring models
- Router registered in main.py
- Removed unused imports (OkResponse, utcnow)
2026-05-10 20:44:44 -05:00
Ripley 85e805c388 docs: update DEVELOPMENT_LOG for v0.0.4 collector work 2026-05-10 20:16:42 -05:00
Ripley 3140dac7cd docs: update VERSION, HISTORY, FUTURE for v0.0.4 — gateway collector complete 2026-05-10 20:15:31 -05:00
Ripley d09822a821 feat: add gateway data collection service + fix model FK definitions
- Created src/backend/app/services/monitoring/ package:
  - gateway_collector.py: Background asyncio task that polls gateway RPC endpoints
    (usage.cost, usage.status, cron.list, sessions.list/preview, health, status)
    and stores results in monitoring models using upsert pattern
  - models.py: Pydantic schemas for parsing gateway RPC responses
  - __init__.py: Package init, exports GatewayCollectorService

- Added collector startup/shutdown in main.py lifespan:
  - Launches collector as background task when gateways exist
  - Clean shutdown on app termination

- Fixed model FK definitions in monitoring.py and alert_rules.py:
  - Replaced Column(UUID, ForeignKey(...)) with Field(foreign_key=...)
    to match codebase pattern (UUID is Python class, not SQLAlchemy type)
  - Added missing gateway_id field to AlertRule model
  - Removed OpenClawDBService inheritance from GatewayCollectorService
    (uses session factory pattern instead of injected session)
  - Cleaned up duplicate/conflicting imports

- Configurable collection intervals via env vars:
  COLLECTION_INTERVAL_COST (300s), COLLECTION_INTERVAL_CRON (60s),
  COLLECTION_INTERVAL_SESSION (30s), COLLECTION_INTERVAL_HEALTH (60s)
2026-05-10 20:13:16 -05:00
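One way the four interval variables above might be read is sketched here. The variable names and defaults come from the commit message. The load_intervals helper and its fallback to the default on a missing, non-numeric, or non-positive value are assumptions, not confirmed behaviour of the collector.

```python
import os
from typing import Dict, Mapping

# Defaults in seconds, as listed in the commit message.
DEFAULT_INTERVALS: Dict[str, int] = {
    "COLLECTION_INTERVAL_COST": 300,
    "COLLECTION_INTERVAL_CRON": 60,
    "COLLECTION_INTERVAL_SESSION": 30,
    "COLLECTION_INTERVAL_HEALTH": 60,
}


def load_intervals(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read each interval from env, falling back to its default (assumed policy)."""
    intervals: Dict[str, int] = {}
    for name, default in DEFAULT_INTERVALS.items():
        raw = env.get(name)
        try:
            value = int(raw) if raw is not None else default
        except ValueError:
            value = default
        # A zero or negative interval would make a polling loop spin; use the default.
        intervals[name] = value if value > 0 else default
    return intervals
```

Passing the environment in as a mapping keeps the helper easy to exercise without touching the process environment.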
Ripley 81794c4a5e docs: bump to v0.0.3 — Phase 2 monitoring models 2026-05-10 19:41:22 -05:00
Ripley f4b7e992ad feat: Phase 2 monitoring models — 7 new tables with CASCADE and composite indexes
- Add monitoring.py: CostSnapshot, CronJobStatus, SessionEvent, SubAgentRun, SystemHealthMetric
- Add alert_rules.py: AlertRule, AlertEvent
- Register all 7 models in __init__.py
- Add Alembic migration 7a8b9c0d1e2f for 7 new monitoring tables
- Add Alembic migration 8f9a0b1c2d3e for CASCADE FK rules, composite indexes, acknowledged_by FK
- Update env.py for transaction-per-migration to avoid failure chaining
- Security: ondelete CASCADE on all org/gateway FKs, SET NULL on acknowledged_by
- Performance: composite indexes on (org_id, created_at) and (org_id, gateway_id) for all monitoring tables
2026-05-10 19:40:25 -05:00
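The CASCADE rule and the (org_id, created_at) composite index can be demonstrated with the standard-library sqlite3 module, as sketched below. SQLite is only a stand-in here, since the commit does not say which database the project runs on. The table, column, and index names are hypothetical, and the real schema is defined by the Alembic migrations.

```python
import sqlite3


def cascade_demo() -> int:
    """Delete an organization and return how many of its snapshots remain."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE organizations (id INTEGER PRIMARY KEY);
        CREATE TABLE cost_snapshots (
            id INTEGER PRIMARY KEY,
            org_id INTEGER NOT NULL
                REFERENCES organizations(id) ON DELETE CASCADE,
            created_at TEXT NOT NULL
        );
        -- Composite index supporting "rows for one org, ordered by time" queries.
        CREATE INDEX ix_cost_snapshots_org_created
            ON cost_snapshots (org_id, created_at);
        """
    )
    conn.execute("INSERT INTO organizations (id) VALUES (1)")
    conn.execute(
        "INSERT INTO cost_snapshots (org_id, created_at) VALUES (1, '2026-05-10')"
    )
    conn.execute("DELETE FROM organizations WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM cost_snapshots").fetchone()[0]
    conn.close()
    return remaining
```

With the cascade in place, removing the organization leaves no orphaned snapshot rows, so the function returns 0.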
Ripley d1719ab394 docs: update STRUCTURE.md with full agent structure, pipeline, dispatch protocol
- Added complete agent role definitions (Prime, Ripley, Neo, Scarlett, Bishop, Private_Hudson)
- Added development pipeline: Neo → Bishop → Private_Hudson → Ripley
- Added agent dispatch protocol with context block template
- Added Docker setup documentation (ports, auth mode, test procedure)
- Added universal mandate and tech stack compliance checklist
- Updated FUTURE.md: marked Phase 1 CRITICAL as done (was already complete)
- Added Engineering Reference Manual as LOW priority with explicit user initiation note
- Updated DEVELOPMENT_LOG.md with completed work entries
2026-05-10 18:14:26 -05:00
Ripley a32a38f082 docs: update VERSION.md and HISTORY.md for v0.0.2 - base platform running 2026-05-10 11:15:30 -05:00
Ripley 9aee2e41e8 feat: initial project setup - base platform forked and running
- Copied base platform (Python/FastAPI backend + Next.js frontend)
- Adapted Dockerfile for src/ layout, fixed scripts paths for worker
- Created compose.yml with local dev configuration
- Auth mode: local (token-based)
- Ports: backend 8080, frontend 3080 (avoiding conflicts)
- All 5 services running: db, redis, backend, frontend, webhook-worker
- 97 API endpoints verified operational
- Database migrations auto-applied
- Git repo initialized on main branch
2026-05-10 11:14:55 -05:00