
Monitoring

Effective monitoring ensures Rack Gateway remains available and helps identify issues before they impact users.

Rack Gateway exposes health endpoints for monitoring and orchestration:

curl https://gateway.example.com/api/v1/health

Response:

{
  "status": "ok"
}

Use for:

  • Container/pod liveness probes
  • Basic uptime monitoring
  • Load balancer health checks
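
For basic uptime monitoring outside an orchestrator, a cron-friendly check can be as small as the sketch below. The `GATEWAY_URL` value and the 200-only success rule are assumptions for illustration, not part of the gateway's contract:

```shell
#!/bin/sh
# Minimal uptime check (sketch). GATEWAY_URL is a hypothetical deployment value.
GATEWAY_URL="${GATEWAY_URL:-https://gateway.example.com}"

# Interpret an HTTP status code; only 200 counts as healthy here.
check_health() {
  if [ "$1" = "200" ]; then echo "healthy"; else echo "unhealthy (HTTP $1)"; fi
}

# curl prints the status code (000 if the request never completed).
code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$GATEWAY_URL/api/v1/health")
check_health "$code"
```

Run it from cron or a monitoring agent and alert on any "unhealthy" output.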

Configure health checks in your Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rack-gateway
spec:
  template:
    spec:
      containers:
        - name: gateway
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 8443
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /api/v1/health
              port: 8443
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3

Configure in convox.yml:

services:
  gateway:
    health:
      path: /api/v1/health
      interval: 15
      timeout: 5

Rack Gateway integrates with Sentry for error tracking and performance monitoring.

Configure it via environment variables:
# Backend error tracking
SENTRY_DSN=https://abc123@sentry.io/123456
# Frontend error tracking
SENTRY_JS_DSN=https://def456@sentry.io/789012
# Environment tag
SENTRY_ENVIRONMENT=production
Sentry captures the following event types:

| Event Type | Details |
| --- | --- |
| Panics | Unhandled errors with stack traces |
| HTTP Errors | 5xx responses with request context |
| Database Errors | Connection failures, query timeouts |
| External Failures | OAuth, Convox API errors |

Sentry automatically filters sensitive fields. Additional scrubbing is configured for:

  • Session tokens
  • API tokens
  • OAuth tokens
  • Environment variable values

Rack Gateway emits two kinds of logs:

  • Application logs use standard text output (Go log format)
  • Audit logs are structured JSON written to stdout for CloudWatch ingestion

Example application log:

2025/01/29 12:34:56 WebAuthn enabled: rpid=gateway.example.com origin=https://gateway.example.com

Example audit log (JSON):

{
  "ts": "2024-01-15T10:30:00Z",
  "user_email": "alice@example.com",
  "action_type": "convox",
  "action": "deploy.create",
  "resource": "myapp",
  "resource_type": "app",
  "status": "success",
  "rbac_decision": "allow",
  "http_status": 200,
  "latency_ms": 1250,
  "ip_address": "192.168.1.100",
  "user_agent": "rack-gateway-cli/1.0.0",
  "event_count": 1
}
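
Because audit logs are one JSON object per line, failures can be surfaced with a plain text filter. The sketch below runs against two fabricated sample lines mirroring the fields above; in practice you would pipe the live stream (for example `aws logs tail /ecs/rack-gateway --follow`) into the same filter:

```shell
#!/bin/sh
# Sketch: filter an audit-log stream for failed actions. The sample lines
# are fabricated; real input would come from stdout or CloudWatch.
printf '%s\n' \
  '{"ts":"2024-01-15T10:30:00Z","action":"deploy.create","status":"success"}' \
  '{"ts":"2024-01-15T10:31:00Z","action":"env.update","status":"failure"}' |
  grep '"status":"failure"'
```

Only the second line is printed. A JSON-aware tool such as jq gives more precise matching, but a literal grep on `"status":"failure"` is often enough for a quick triage.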
Log levels:

| Level | When Used |
| --- | --- |
| error | Failures requiring attention |
| warn | Potential issues, degraded functionality |
| info | Normal operations, request logging |
| debug | Detailed diagnostic information |

Configure with:

LOG_LEVEL=info # Options: debug, info, warn, error

For AWS deployments, logs are automatically available in CloudWatch:

# View logs in CloudWatch
aws logs tail /ecs/rack-gateway --follow

Monitor these metrics for Rack Gateway health:

Performance:

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| Request latency | p50, p95, p99 response times | p99 > 2s |
| Error rate | 5xx responses / total requests | > 1% |
| Request volume | Requests per second | Baseline deviation |

Resources:

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| CPU usage | Container CPU utilization | > 80% sustained |
| Memory usage | Container memory utilization | > 80% |
| Database connections | Active connection count | Near max_connections |

Security and usage:

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| Active sessions | Concurrent authenticated users | Capacity planning |
| API token usage | Requests per token | Anomaly detection |
| Failed authentications | OAuth/MFA failures | Spike detection |
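
The "> 1%" error-rate threshold is simply 5xx responses divided by total requests. A quick sanity check with hypothetical counts:

```shell
#!/bin/sh
# Sketch: error rate = 5xx responses / total requests, as a percentage.
# The counts are hypothetical values for a 5-minute window.
total=20000
errors=150
awk -v e="$errors" -v t="$total" 'BEGIN { printf "error rate: %.2f%%\n", 100 * e / t }'
```

With these counts the result is 0.75%, below the 1% alert threshold but worth watching if it trends upward.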
Page immediately (critical) when:

  • Health endpoint returns non-200
  • Error rate exceeds 5%
  • Database connection failures occur
  • Authentication system is unavailable

Raise a warning when:

  • Error rate exceeds 1%
  • Latency p99 exceeds SLA
  • CPU/memory approaches limits
  • Unusual access patterns are detected

Example Prometheus alerting rules:
groups:
  - name: rack-gateway
    rules:
      - alert: GatewayHighErrorRate
        expr: |
          sum(rate(http_requests_total{job="rack-gateway",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{job="rack-gateway"}[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on Rack Gateway"
      - alert: GatewayDown
        expr: up{job="rack-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Rack Gateway is down"
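
Rule files like the one above are easy to break with a stray indent. A sketch for persisting a minimal copy and confirming it landed (the file path is arbitrary; `promtool check rules` is the proper validator when Prometheus tooling is installed):

```shell
#!/bin/sh
# Sketch: write a minimal copy of the GatewayDown rule and confirm it landed.
# Run `promtool check rules "$rules"` for real syntax validation.
rules=/tmp/rack-gateway-rules.yml
cat > "$rules" <<'EOF'
groups:
  - name: rack-gateway
    rules:
      - alert: GatewayDown
        expr: up{job="rack-gateway"} == 0
        for: 1m
        labels:
          severity: critical
EOF
grep -q 'alert: GatewayDown' "$rules" && echo "rules file written"
```

Validating before reloading Prometheus avoids silently dropping the whole rule group on a syntax error.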

Key panels for a monitoring dashboard:

| Category | Metric | Example Value |
| --- | --- | --- |
| Traffic | Request Rate | 100 req/s |
| Traffic | Error Rate | 0.5% |
| Traffic | Latency (p99) | 150ms |
| Users | Active Users | 50 |
| Users | API Token Usage | 200 req/min |
| Users | MFA Events | 10 |
| Resources | Database Connections | 15/50 |
| Resources | CPU Usage | 45% |
| Resources | Memory | 60% |
  1. Set up health check monitoring first - Basic availability is critical
  2. Configure error tracking early - Catch issues before users report them
  3. Create runbooks for alerts - Document response procedures
  4. Review logs regularly - Don’t just alert, understand patterns
  5. Test alerting - Verify alerts reach the right people