# KMS Deployment Guide

## Table of Contents

1. [Deployment Overview](#deployment-overview)
2. [Container Architecture](#container-architecture)
3. [Environment Configuration](#environment-configuration)
4. [Docker Compose Setup](#docker-compose-setup)
5. [Production Deployment](#production-deployment)
6. [Monitoring and Health Checks](#monitoring-and-health-checks)
7. [Security Configuration](#security-configuration)
8. [Backup and Recovery](#backup-and-recovery)
9. [Troubleshooting](#troubleshooting)

---

## Deployment Overview

The KMS is designed as a containerized application using Docker and Docker Compose for orchestration. The system consists of multiple services working together to provide secure API key management capabilities.

### Service Architecture

- **kms-nginx**: Reverse proxy and load balancer
- **kms-api**: Go backend API service
- **kms-frontend**: React TypeScript frontend
- **kms-postgres**: PostgreSQL database
- **kms-redis**: Redis cache (optional)

### Deployment Modes

- **Development**: Docker Compose with hot reload
- **Testing**: Isolated container environment
- **Staging**: Production-like with monitoring
- **Production**: High availability with load balancing

---

## Container Architecture

```mermaid
graph TB
    subgraph "External Network"
        Internet[Internet]
        CDN[Content Delivery Network<br/>Static Assets]
        DNS[DNS<br/>Load Balancing]
    end

    subgraph "Load Balancer Tier"
        LB1[Load Balancer 1<br/>HAProxy/Nginx]
        LB2[Load Balancer 2<br/>HAProxy/Nginx]
        VIP[Virtual IP<br/>Failover]
    end

    subgraph "Container Orchestration - Docker Compose"
        subgraph "Reverse Proxy"
            Nginx1[Nginx Container<br/>kms-nginx<br/>Port 8081:80]
        end
        subgraph "API Tier"
            API1[API Service Container<br/>kms-api-service<br/>Port 8080:8080]
            Metrics1[Metrics Endpoint<br/>Port 9090:9090<br/>Prometheus]
        end
        subgraph "Frontend Tier"
            Frontend1[React SPA Container<br/>kms-frontend<br/>Port 3000:80]
            StaticAssets[Static Assets<br/>Nginx served]
        end
        subgraph "Network"
            InternalNet[kms-network<br/>Bridge Network<br/>Internal Communication]
        end
    end

    subgraph "Data Tier"
        subgraph "Primary Database"
            PostgreSQL1[PostgreSQL 15<br/>kms-postgres<br/>Port 5432:5432]
            DBData[Persistent Volume<br/>postgres_data]
        end
        subgraph "Cache Layer"
            Redis1[Redis Cache<br/>Optional<br/>Session Store]
        end
        subgraph "Migration System"
            Migrations[Database Migrations<br/>Auto-run on startup<br/>Volume mounted]
        end
    end

    subgraph "Monitoring & Observability"
        subgraph "Metrics Collection"
            Prometheus[Prometheus<br/>Metrics Scraping]
            Grafana[Grafana<br/>Dashboards]
        end
        subgraph "Logging"
            LogAggregator[Log Aggregation<br/>ELK Stack / Loki]
            StructuredLogs[Structured Logging<br/>JSON format]
        end
        subgraph "Health Monitoring"
            HealthCheck[Health Checks<br/>/health endpoint]
            Alerting[Alerting<br/>PagerDuty/Slack]
        end
    end

    subgraph "Security & Compliance"
        subgraph "Certificate Management"
            TLS[TLS Certificates<br/>Let's Encrypt/Manual]
            CertRotation[Certificate Rotation<br/>Automated]
        end
        subgraph "Secrets Management"
            EnvVars[Environment Variables<br/>Container secrets]
            VaultIntegration[Vault Integration<br/>Secret rotation]
        end
        subgraph "Backup & Recovery"
            DBBackup[Database Backups<br/>Automated daily]
            DisasterRecovery[Disaster Recovery<br/>Multi-region]
        end
    end

    subgraph "Development Environment"
        LocalDev[Local Development<br/>docker-compose.yml]
        TestEnv[Test Environment<br/>Isolated containers]
        CICDPipeline[CI/CD Pipeline<br/>GitHub Actions]
    end

    %% External Connections
    Internet --> DNS
    DNS --> VIP
    CDN --> Frontend1

    %% Load Balancer Configuration
    VIP --> LB1
    VIP --> LB2
    LB1 --> Nginx1
    LB2 --> Nginx1

    %% Container Communication
    Nginx1 --> API1
    Nginx1 --> Frontend1
    API1 --> PostgreSQL1
    API1 --> Redis1
    API1 --> Metrics1

    %% Data Flow
    PostgreSQL1 --> DBData
    API1 --> Migrations
    Migrations --> PostgreSQL1

    %% Network Isolation
    InternalNet --> API1
    InternalNet --> Frontend1
    InternalNet --> PostgreSQL1
    InternalNet --> Redis1
    InternalNet --> Nginx1

    %% Monitoring Connections
    API1 --> Prometheus
    Prometheus --> Grafana
    API1 --> LogAggregator
    LogAggregator --> StructuredLogs
    HealthCheck --> API1
    HealthCheck --> PostgreSQL1
    HealthCheck --> Alerting

    %% Security Connections
    TLS --> Nginx1
    CertRotation --> TLS
    EnvVars --> API1
    VaultIntegration --> EnvVars
    DBBackup --> PostgreSQL1
    DisasterRecovery --> DBBackup

    %% Development Flow
    CICDPipeline --> LocalDev
    LocalDev --> TestEnv
    TestEnv --> API1

    %% Styling
    classDef external fill:#ffecb3,stroke:#f57c00,stroke-width:2px
    classDef loadbalancer fill:#e1bee7,stroke:#7b1fa2,stroke-width:2px
    classDef container fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    classDef data fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef monitoring fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    classDef security fill:#ffcdd2,stroke:#d32f2f,stroke-width:2px
    classDef development fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px

    class Internet,CDN,DNS external
    class LB1,LB2,VIP loadbalancer
    class Nginx1,API1,Frontend1,Metrics1,InternalNet,StaticAssets container
    class PostgreSQL1,Redis1,DBData,Migrations data
    class Prometheus,Grafana,LogAggregator,StructuredLogs,HealthCheck,Alerting monitoring
    class TLS,CertRotation,EnvVars,VaultIntegration,DBBackup,DisasterRecovery security
    class LocalDev,TestEnv,CICDPipeline development
```

---

## Environment Configuration

### Environment Variables

#### **Database Configuration**

```bash
# PostgreSQL Database
DB_HOST=postgres
DB_PORT=5432
DB_NAME=kms
DB_USER=postgres
DB_PASSWORD=secure_password_here
DB_MAX_OPEN_CONNECTIONS=25
DB_MAX_IDLE_CONNECTIONS=5
DB_CONNECTION_MAX_LIFETIME=300s
DB_CONNECTION_MAX_IDLE_TIME=60s
```

#### **Server Configuration**

```bash
# API Server
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
SERVER_READ_TIMEOUT=30s
SERVER_WRITE_TIMEOUT=30s
SERVER_IDLE_TIMEOUT=120s

# Environment
ENVIRONMENT=production
LOG_LEVEL=info
DEBUG_MODE=false
```

#### **Authentication Configuration**

```bash
# Authentication Provider
AUTH_PROVIDER=header
AUTH_HEADER_USER_EMAIL=X-User-Email
AUTH_SIGNING_KEY=your_hmac_signing_key_here

# JWT Configuration
JWT_SECRET=your_jwt_secret_here
JWT_ISSUER=kms-api-service
JWT_EXPIRY=1h

# OAuth2 Configuration (optional)
OAUTH2_CLIENT_ID=your_oauth2_client_id
OAUTH2_CLIENT_SECRET=your_oauth2_client_secret
OAUTH2_REDIRECT_URL=https://your-domain.com/api/oauth2/callback
OAUTH2_PROVIDER_URL=https://oauth-provider.com

# SAML Configuration (optional)
SAML_IDP_URL=https://saml-provider.com/sso
SAML_CERT_PATH=/etc/ssl/certs/saml.crt
SAML_KEY_PATH=/etc/ssl/private/saml.key
```

#### **Security Configuration**

```bash
# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_RPS=100
RATE_LIMIT_BURST=200
AUTH_RATE_LIMIT_RPS=5
AUTH_RATE_LIMIT_BURST=10

# Security Settings
CSRF_TOKEN_MAX_AGE=1h
MAX_AUTH_FAILURES=5
IP_BLOCK_DURATION=1h
AUTH_FAILURE_WINDOW=15m
REQUEST_MAX_AGE=5m

# CORS Settings
CORS_ALLOWED_ORIGINS=https://your-frontend-domain.com,https://admin.your-domain.com
CORS_ALLOWED_METHODS=GET,POST,PUT,DELETE,OPTIONS
CORS_ALLOWED_HEADERS=Origin,Content-Type,Accept,Authorization,X-User-Email,X-CSRF-Token
```

#### **Monitoring Configuration**

```bash
# Metrics
METRICS_ENABLED=true
METRICS_PORT=9090
METRICS_PATH=/metrics

# Health Checks
HEALTH_CHECK_ENABLED=true
HEALTH_CHECK_INTERVAL=30s
```

### Configuration Management

#### **Environment Files**

```bash
# Development environment
.env.development

# Testing environment
.env.test

# Staging environment
.env.staging

# Production environment
.env.production
```
#### **Docker Environment**

```yaml
# docker-compose.override.yml for local development
version: '3.8'

services:
  kms-api:
    environment:
      - LOG_LEVEL=debug
      - DEBUG_MODE=true
    volumes:
      - ./:/app
    # Hot reload with air. This needs an image that contains the Go toolchain
    # and air; the `production` stage of the Dockerfile below has neither.
    command: air -c .air.toml
```

---

## Docker Compose Setup

### Development Configuration

#### **docker-compose.yml**

```yaml
version: '3.8'

services:
  # PostgreSQL Database
  kms-postgres:
    image: postgres:15-alpine
    container_name: kms-postgres
    environment:
      POSTGRES_DB: ${DB_NAME:-kms}
      POSTGRES_USER: ${DB_USER:-postgres}
      POSTGRES_PASSWORD: ${DB_PASSWORD:-postgres}
      POSTGRES_INITDB_ARGS: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      # Scripts in docker-entrypoint-initdb.d run only when the data volume is
      # first initialized, not on every container start.
      - ./migrations:/docker-entrypoint-initdb.d:ro
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER:-postgres}"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    networks:
      - kms-network
    restart: unless-stopped

  # Redis Cache (Optional)
  kms-redis:
    image: redis:7-alpine
    container_name: kms-redis
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD:-redis_password}
    volumes:
      - redis_data:/data
    ports:
      - "6379:6379"
    healthcheck:
      # With requirepass set, an unauthenticated PING is rejected, so
      # authenticate and check for PONG explicitly.
      test: ["CMD-SHELL", "redis-cli -a '${REDIS_PASSWORD:-redis_password}' --no-auth-warning ping | grep -q PONG"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - kms-network
    restart: unless-stopped

  # API Service
  kms-api:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    container_name: kms-api
    environment:
      - DB_HOST=kms-postgres
      - DB_PORT=5432
      - DB_NAME=${DB_NAME:-kms}
      - DB_USER=${DB_USER:-postgres}
      - DB_PASSWORD=${DB_PASSWORD:-postgres}
      - REDIS_HOST=kms-redis
      - REDIS_PORT=6379
      - REDIS_PASSWORD=${REDIS_PASSWORD:-redis_password}
      - SERVER_PORT=8080
      - JWT_SECRET=${JWT_SECRET}
      - AUTH_SIGNING_KEY=${AUTH_SIGNING_KEY}
      - LOG_LEVEL=${LOG_LEVEL:-info}
      - RATE_LIMIT_ENABLED=true
    ports:
      - "8080:8080"
      - "9090:9090"  # Metrics port
    depends_on:
      kms-postgres:
        condition: service_healthy
      kms-redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    networks:
      - kms-network
    restart: unless-stopped
    volumes:
      - ./logs:/app/logs

  # Frontend Service
  kms-frontend:
    build:
      context: ./kms-frontend
      dockerfile: Dockerfile
      target: production
    container_name: kms-frontend
    environment:
      # Create React App-style builds inline REACT_APP_* variables at build time;
      # setting one on an already-built image has no effect unless the image
      # injects it at startup.
      - REACT_APP_API_BASE_URL=http://localhost:8081/api
    ports:
      - "3000:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - kms-network
    restart: unless-stopped

  # Nginx Reverse Proxy
  kms-nginx:
    image: nginx:alpine
    container_name: kms-nginx
    ports:
      - "8081:80"
      - "8443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro
      - ./ssl:/etc/ssl/certs:ro
    depends_on:
      - kms-api
      - kms-frontend
    healthcheck:
      # `nginx -t` validates the configuration only; it does not confirm that
      # nginx is serving requests.
      test: ["CMD", "nginx", "-t"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - kms-network
    restart: unless-stopped

networks:
  kms-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local
```

#### **Dockerfile (Multi-stage)**

```dockerfile
# Build stage
FROM golang:1.21-alpine AS builder

WORKDIR /app

# Install dependencies
RUN apk add --no-cache git ca-certificates tzdata

# Copy go mod files
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build the binary
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main ./cmd/server

# Production stage
FROM alpine:latest AS production

RUN apk --no-cache add ca-certificates curl

# Create non-root user
RUN addgroup -g 1001 appgroup && \
    adduser -D -u 1001 -G appgroup appuser

# Use a directory the non-root user can access; /root is mode 700 on Alpine,
# so appuser could not execute a binary placed there.
WORKDIR /app

# Copy the binary and migrations from the builder stage
COPY --from=builder --chown=appuser:appgroup /app/main .
COPY --from=builder --chown=appuser:appgroup /app/migrations ./migrations

USER appuser

EXPOSE 8080 9090

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

CMD ["./main"]
```

### Production Configuration

#### **docker-compose.prod.yml**

```yaml
version: '3.8'

services:
  kms-postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres/postgresql.conf:/etc/postgresql/postgresql.conf:ro
      - ./postgres/pg_hba.conf:/etc/postgresql/pg_hba.conf:ro
    # The official image has no POSTGRES_MAX_CONNECTIONS variable, so server
    # settings are passed with -c. hba_file points the server at the mounted pg_hba.conf.
    command: postgres -c config_file=/etc/postgresql/postgresql.conf -c hba_file=/etc/postgresql/pg_hba.conf -c max_connections=100
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '0.5'
        reservations:
          memory: 512M
          cpus: '0.25'
    restart: always
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  kms-api:
    image: kms-api:latest
    environment:
      - DB_HOST=kms-postgres
      - DB_NAME=${DB_NAME}
      - DB_USER=${DB_USER}
      - DB_PASSWORD=${DB_PASSWORD}
      - JWT_SECRET=${JWT_SECRET}
      - AUTH_SIGNING_KEY=${AUTH_SIGNING_KEY}
      - LOG_LEVEL=info
      - ENVIRONMENT=production
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
        reservations:
          memory: 256M
          cpus: '0.25'
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"

  kms-nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.prod.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/ssl/certs:ro
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: '0.25'
    restart: always

# Named volumes must be declared when this file is used on its own
volumes:
  postgres_data:
```

---

## Production Deployment

### Load Balancer Configuration

#### **HAProxy Configuration**

```haproxy
# /etc/haproxy/haproxy.cfg
global
    daemon
    maxconn 4096
    log stdout local0

defaults
    mode http
    log global
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option httplog

frontend kms_frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/kms.pem
    redirect scheme https if !{ ssl_fc }

    # Rate limiting: reject clients that send more than 20 requests in 10 seconds
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request reject if { sc_http_req_rate(0) gt 20 }

    default_backend kms_api_servers

backend kms_api_servers
    balance roundrobin
    option httpchk GET /health
    server api1 kms-api-1:8080 check
    server api2 kms-api-2:8080 check
    server api3 kms-api-3:8080 check
```

#### **Nginx Load Balancer**

```nginx
# Directives for the http {} context of /etc/nginx/nginx.conf
upstream kms_api_backend {
    least_conn;
    server kms-api-1:8080 weight=3 max_fails=3 fail_timeout=30s;
    server kms-api-2:8080 weight=3 max_fails=3 fail_timeout=30s;
    server kms-api-3:8080 weight=3 max_fails=3 fail_timeout=30s;
}

# Metrics listeners on the API containers (METRICS_PORT=9090). An upstream
# name cannot take a port suffix, so metrics need their own upstream.
upstream kms_api_metrics {
    server kms-api-1:9090;
    server kms-api-2:9090;
    server kms-api-3:9090;
}

upstream kms_frontend_backend {
    server kms-frontend-1:80;
    server kms-frontend-2:80;
}

# Rate limiting zones (limit_req_zone is only valid in the http context)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=auth:10m rate=5r/s;

server {
    listen 80;
    server_name kms.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name kms.yourdomain.com;

    # SSL Configuration
    ssl_certificate /etc/ssl/certs/kms.crt;
    ssl_certificate_key /etc/ssl/private/kms.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    # Security Headers
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Referrer-Policy "strict-origin-when-cross-origin";

    # API Routes
    location /api/ {
        limit_req zone=api burst=20 nodelay;
        proxy_pass http://kms_api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Authentication Routes (stricter rate limiting)
    location /api/login {
        limit_req zone=auth burst=10 nodelay;
        proxy_pass http://kms_api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Frontend Routes
    location / {
        proxy_pass http://kms_frontend_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # SPA routing support (try_files $uri $uri/ /index.html) belongs in the
        # frontend container's nginx, which holds the files; here it would check
        # this proxy's own filesystem instead of proxying.
    }

    # Health Check
    location /health {
        access_log off;
        proxy_pass http://kms_api_backend;
    }

    # Metrics (restricted to private networks)
    location /metrics {
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
        proxy_pass http://kms_api_metrics;
    }
}
```

### Database High Availability

#### **PostgreSQL Primary-Replica Setup**

```yaml
# docker-compose.ha.yml
version: '3.8'

services:
  postgres-primary:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: kms
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      # POSTGRES_REPLICATION_* are not variables of the official image;
      # the mounted setup-replication.sh script must handle them.
      POSTGRES_REPLICATION_USER: replicator
      POSTGRES_REPLICATION_PASSWORD: ${REPLICATION_PASSWORD}
    volumes:
      - postgres_primary_data:/var/lib/postgresql/data
      - ./postgres/primary.conf:/etc/postgresql/postgresql.conf
      - ./postgres/setup-replication.sh:/docker-entrypoint-initdb.d/setup-replication.sh
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
    ports:
      - "5432:5432"

  postgres-replica:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      # POSTGRES_PRIMARY_HOST and POSTGRES_REPLICATION_* are likewise handled
      # by the mounted setup-replica.sh script, not by the image.
      POSTGRES_PRIMARY_HOST: postgres-primary
      POSTGRES_REPLICATION_USER: replicator
      POSTGRES_REPLICATION_PASSWORD: ${REPLICATION_PASSWORD}
    volumes:
      - postgres_replica_data:/var/lib/postgresql/data
      - ./postgres/replica.conf:/etc/postgresql/postgresql.conf
      - ./postgres/setup-replica.sh:/docker-entrypoint-initdb.d/setup-replica.sh
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
    ports:
      - "5433:5432"
    depends_on:
      - postgres-primary

volumes:
  postgres_primary_data:
  postgres_replica_data:
```

---

## Monitoring and Health Checks

### Health Check Configuration

#### **Application Health Checks**

```go
// File: internal/handlers/health.go
package handlers

import (
	"context"
	"database/sql"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/redis/go-redis/v9"
)

// HealthHandler's field types are assumptions: a *sql.DB pool and a
// go-redis v9 client (nil when the optional Redis cache is disabled).
type HealthHandler struct {
	db      *sql.DB
	redis   *redis.Client
	version string
}

type HealthResponse struct {
	Status    string            `json:"status"`
	Timestamp time.Time         `json:"timestamp"`
	Services  map[string]string `json:"services"`
	Version   string            `json:"version"`
}

func (h *HealthHandler) GetHealth(c *gin.Context) {
	// Bound the dependency checks so a hung connection cannot stall the probe.
	ctx, cancel := context.WithTimeout(c.Request.Context(), 2*time.Second)
	defer cancel()

	response := &HealthResponse{
		Status:    "healthy",
		Timestamp: time.Now(),
		Services:  make(map[string]string),
		Version:   h.version,
	}

	// The database is required: a failed ping marks the whole service unhealthy.
	if err := h.db.PingContext(ctx); err != nil {
		response.Status = "unhealthy"
		response.Services["database"] = "unhealthy"
	} else {
		response.Services["database"] = "healthy"
	}

	// Redis is optional: report its state without failing the overall check.
	if h.redis != nil {
		if err := h.redis.Ping(ctx).Err(); err != nil {
			response.Services["cache"] = "unhealthy"
		} else {
			response.Services["cache"] = "healthy"
		}
	}

	statusCode := http.StatusOK
	if response.Status == "unhealthy" {
		statusCode = http.StatusServiceUnavailable
	}
	c.JSON(statusCode, response)
}
```

#### **Docker Health Checks**

```dockerfile
# Each HEALTHCHECK below belongs in its own service's Dockerfile; if one
# Dockerfile lists several, only the last takes effect.

# API Service Health Check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# Database Health Check
HEALTHCHECK --interval=30s --timeout=10s --retries=5 \
  CMD pg_isready -U postgres || exit 1

# Frontend Health Check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD curl -f http://localhost:80 || exit 1
```

### Prometheus Metrics

#### **Metrics Configuration**

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'kms-api'
    static_configs:
      - targets: ['kms-api:9090']
    metrics_path: /metrics
    scrape_interval: 15s

  # The jobs below scrape exporters, not the services themselves:
  # nginx-prometheus-exporter (9113), postgres_exporter (9187) and
  # redis_exporter (9121) must be deployed separately; none is defined
  # in the Compose files above.
  - job_name: 'nginx'
    static_configs:
      - targets: ['kms-nginx:9113']

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']
```

#### **Grafana Dashboards**

```json
{
  "dashboard": {
    "title": "KMS System Metrics",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{status}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Database Connections",
        "type": "graph",
        "targets": [
          {
            "expr": "pg_stat_database_numbackends",
            "legendFormat": "Active connections"
          }
        ]
      },
      {
        "title": "Authentication Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(auth_attempts_success_total[5m]) / rate(auth_attempts_total[5m])",
            "legendFormat": "Success rate"
          }
        ]
      }
    ]
  }
}
```

---

## Security Configuration

### SSL/TLS Configuration

#### **Certificate Management**

```bash
# Let's Encrypt certificate generation.
# --standalone runs a temporary web server on port 80, so publish that port
# and make sure nothing else (for example kms-nginx) is bound to it.
docker run --rm -it \
  -p 80:80 \
  -v /etc/letsencrypt:/etc/letsencrypt \
  -v /var/lib/letsencrypt:/var/lib/letsencrypt \
  certbot/certbot certonly \
  --standalone \
  --email admin@yourdomain.com \
  --agree-tos \
  --no-eff-email \
  -d kms.yourdomain.com

# Certificate renewal cron job. A crontab entry must fit on one line, and cron
# has no TTY, so -it is omitted. Renewal reuses the standalone method, so port
# 80 must be free when it runs.
0 2 * * * docker run --rm -p 80:80 -v /etc/letsencrypt:/etc/letsencrypt -v /var/lib/letsencrypt:/var/lib/letsencrypt certbot/certbot renew --quiet
```

#### **SSL Configuration**

```nginx
# Strong SSL configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-CHACHA20-POLY1305;
ssl_prefer_server_ciphers off;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1d;
ssl_session_tickets off;
ssl_stapling on;
ssl_stapling_verify on;

# HSTS
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
```

### Secrets Management

#### **Docker Secrets**

```yaml
version: '3.8'

secrets:
  db_password:
    file: ./secrets/db_password.txt
  jwt_secret:
    file: ./secrets/jwt_secret.txt
  auth_signing_key:
    file: ./secrets/auth_signing_key.txt

services:
  kms-api:
    image: kms-api:latest
    secrets:
      - db_password
      - jwt_secret
      - auth_signing_key
    # Docker mounts each secret as a file at /run/secrets/<name>; the API has
    # to read these *_FILE paths itself.
    environment:
      - DB_PASSWORD_FILE=/run/secrets/db_password
      - JWT_SECRET_FILE=/run/secrets/jwt_secret
      - AUTH_SIGNING_KEY_FILE=/run/secrets/auth_signing_key
```

#### **HashiCorp Vault Integration**

```go
// Vault secret retrieval
package secrets

import (
	"fmt"
	"os"

	vault "github.com/hashicorp/vault/api"
)

// getSecretFromVault returns the "value" field of the secret at path.
// With a KV version 2 engine the path includes "data/" (for example
// "secret/data/kms/jwt") and the fields are nested under secret.Data["data"].
func getSecretFromVault(path string) (string, error) {
	config := vault.DefaultConfig() // reads VAULT_ADDR and other VAULT_* variables
	if config.Error != nil {
		return "", config.Error
	}

	client, err := vault.NewClient(config)
	if err != nil {
		return "", err
	}
	client.SetToken(os.Getenv("VAULT_TOKEN"))

	secret, err := client.Logical().Read(path)
	if err != nil {
		return "", fmt.Errorf("reading %s: %w", path, err)
	}
	if secret == nil || secret.Data == nil {
		return "", fmt.Errorf("secret not found at %s", path)
	}

	value, ok := secret.Data["value"].(string)
	if !ok {
		return "", fmt.Errorf("secret at %s has no string field %q", path, "value")
	}
	return value, nil
}
```

---

## Backup and Recovery

### Database Backup Strategy

#### **Automated Backup Script**

```bash
#!/bin/bash
# backup.sh
set -e

# Configuration
BACKUP_DIR="/backups"
POSTGRES_CONTAINER="kms-postgres"
S3_BUCKET="kms-backups"
RETENTION_DAYS=30

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Generate backup filename with timestamp
BACKUP_NAME="kms_backup_$(date +%Y%m%d_%H%M%S).sql"
BACKUP_PATH="$BACKUP_DIR/$BACKUP_NAME"

# Create database dump
echo "Creating database backup..."
docker exec "$POSTGRES_CONTAINER" pg_dump -U postgres kms > "$BACKUP_PATH"

# Compress backup
gzip "$BACKUP_PATH"

# Upload to S3
echo "Uploading backup to S3..."
aws s3 cp "$BACKUP_PATH.gz" "s3://$S3_BUCKET/$(date +%Y/%m/%d)/$BACKUP_NAME.gz"

# Clean up local files older than retention period
find "$BACKUP_DIR" -name "*.gz" -mtime +"$RETENTION_DAYS" -delete

# Clean up S3 files older than retention period (requires GNU date for -d).
# An S3 lifecycle expiration rule on the bucket can do this without a script.
olderThan=$(date -d "$RETENTION_DAYS days ago" +%s)
aws s3 ls "s3://$S3_BUCKET/" --recursive | while read -r line; do
  createDate=$(echo "$line" | awk '{print $1" "$2}')
  createDate=$(date -d "$createDate" +%s)
  if [[ $createDate -lt $olderThan ]]; then
    fileName=$(echo "$line" | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//')
    aws s3 rm "s3://$S3_BUCKET/$fileName"
  fi
done

echo "Backup completed successfully"
```

#### **Backup Cron Job**

```bash
# Daily backup at 2 AM
0 2 * * * /opt/kms/scripts/backup.sh >> /var/log/kms-backup.log 2>&1

# Weekly full backup at 1 AM on Sundays
0 1 * * 0 /opt/kms/scripts/full-backup.sh >> /var/log/kms-backup.log 2>&1
```

### Disaster Recovery

#### **Recovery Procedure**

```bash
#!/bin/bash
# restore.sh
set -e

BACKUP_FILE=${1:-}
POSTGRES_CONTAINER="kms-postgres"

if [ -z "$BACKUP_FILE" ]; then
  echo "Usage: $0 <backup_file | s3://bucket/key>"
  exit 1
fi

# Download backup from S3 if needed
if [[ $BACKUP_FILE == s3://* ]]; then
  LOCAL_FILE="/tmp/$(basename "$BACKUP_FILE")"
  aws s3 cp "$BACKUP_FILE" "$LOCAL_FILE"
  BACKUP_FILE=$LOCAL_FILE
fi

# Extract if compressed
if [[ $BACKUP_FILE == *.gz ]]; then
  gunzip -c "$BACKUP_FILE" > "/tmp/backup.sql"
  BACKUP_FILE="/tmp/backup.sql"
fi

# Stop the application services that use the database
echo "Stopping API and frontend services..."
docker-compose stop kms-api kms-frontend

# Drop and recreate database
echo "Recreating database..."
docker exec "$POSTGRES_CONTAINER" psql -U postgres -c "DROP DATABASE IF EXISTS kms;"
docker exec "$POSTGRES_CONTAINER" psql -U postgres -c "CREATE DATABASE kms;"

# Restore database, stopping at the first SQL error instead of continuing
echo "Restoring database..."
docker exec -i "$POSTGRES_CONTAINER" psql -U postgres -v ON_ERROR_STOP=1 kms < "$BACKUP_FILE"

# Start services
echo "Starting services..."
docker-compose up -d

# Verify restoration
echo "Verifying restoration..."
sleep 30
curl -f http://localhost:8081/health || {
  echo "Health check failed after restoration"
  exit 1
}

echo "Database restoration completed successfully"
```

---

## Troubleshooting

### Common Issues

#### **Database Connection Issues**

```bash
# Check database container logs
docker logs kms-postgres

# Test database connectivity
docker exec kms-postgres pg_isready -U postgres

# Check the database status reported by the API health endpoint
curl -s http://localhost:8080/health | jq '.services.database'
```

#### **Authentication Problems**

```bash
# Check that the JWT secret is set (without printing its value)
docker exec kms-api sh -c '[ -n "$JWT_SECRET" ] && echo "JWT_SECRET is set" || echo "JWT_SECRET is missing"'

# Check that the HMAC signing key is set (without printing its value)
docker exec kms-api sh -c '[ -n "$AUTH_SIGNING_KEY" ] && echo "AUTH_SIGNING_KEY is set" || echo "AUTH_SIGNING_KEY is missing"'

# Test authentication endpoint
curl -X POST http://localhost:8081/api/login \
  -H "Content-Type: application/json" \
  -d '{"app_id": "test-app", "user_id": "test@example.com"}'
```

#### **Performance Issues**

```bash
# Check API response times using curl's built-in timing variables
curl -s -o /dev/null \
  -w "connect: %{time_connect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n" \
  http://localhost:8081/api/health

# Monitor database connections
docker exec kms-postgres psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"

# Check memory usage
docker stats kms-api
```

#### **SSL/TLS Issues**

```bash
# Test SSL certificate (</dev/null closes the connection after the handshake)
openssl s_client -connect kms.yourdomain.com:443 -servername kms.yourdomain.com </dev/null

# Check certificate expiration
curl -vI https://kms.yourdomain.com 2>&1 | grep -i expire

# Verify certificate chain (curl fails if the chain cannot be verified)
curl -I https://kms.yourdomain.com
```

### Debugging Commands

#### **Container Debugging**

```bash
# View container logs
docker logs -f kms-api

# Execute shell in container
docker exec -it kms-api /bin/sh

# Inspect container configuration
docker inspect kms-api

# Check resource usage
docker stats
```

#### **Network Debugging**

```bash
# Test inter-container connectivity
docker exec kms-api ping kms-postgres

# Check port binding
netstat -tlnp | grep :8080

# Inspect Docker network (Compose prefixes the network name with the project
# name, which defaults to the directory name)
docker network inspect kms_kms-network
```

This deployment guide provides comprehensive instructions for deploying, configuring, and maintaining the KMS in various environments, from development to production-scale deployments.