skybridge/kms/docs/ARCHITECTURE.md
2025-08-26 19:16:41 -04:00


# API Key Management Service (KMS) - System Architecture
## Table of Contents
1. [System Overview](#system-overview)
2. [Architecture Principles](#architecture-principles)
3. [Component Architecture](#component-architecture)
4. [System Architecture Diagram](#system-architecture-diagram)
5. [Request Flow Pipeline](#request-flow-pipeline)
6. [Authentication Flow](#authentication-flow)
7. [API Design](#api-design)
8. [Technology Stack](#technology-stack)
9. [Deployment Considerations](#deployment-considerations)
---
## System Overview
The API Key Management Service (KMS) is a secure, scalable platform for managing API authentication tokens across applications. Built with a Go backend and a React TypeScript frontend, it provides centralized token lifecycle management with enterprise-grade security features.
### Key Capabilities
- **Multi-Provider Authentication**: Header, JWT, OAuth2, SAML support
- **Dual Token System**: Static HMAC tokens and renewable JWT user tokens
- **Hierarchical Permissions**: Role-based access control with inheritance
- **Enterprise Security**: Rate limiting, brute force protection, audit logging
- **High Availability**: Containerized deployment with load balancing
### Core Features
- **Token Lifecycle Management**: Create, verify, renew, and revoke tokens
- **Application Management**: Multi-tenant application configuration
- **User Session Tracking**: Comprehensive session management
- **Audit Logging**: Complete audit trail of all operations
- **Health Monitoring**: Built-in health checks and metrics
---
## Architecture Principles
### Clean Architecture
The system follows clean architecture principles with clear separation of concerns:
```
┌─────────────────┐
│    Handlers     │ ← HTTP request handling
├─────────────────┤
│    Services     │ ← Business logic
├─────────────────┤
│  Repositories   │ ← Data access
├─────────────────┤
│    Database     │ ← Data persistence
└─────────────────┘
```
### Design Principles
- **Dependency Injection**: All services receive dependencies through constructors
- **Interface Segregation**: Repository interfaces enable testing and flexibility
- **Single Responsibility**: Each component has one clear purpose
- **Fail-Safe Defaults**: Security-first configuration with safe fallbacks
- **Immutable Operations**: State changes run inside database transactions and are recorded in the audit log
- **Defense in Depth**: Multiple security layers throughout the stack
---
## Component Architecture
### Backend Components (`internal/`)
#### **Handlers Layer** (`internal/handlers/`)
HTTP request processors implementing REST API endpoints:
- **`application.go`**: Application CRUD operations
  - Create, read, update, delete applications
  - HMAC key management
  - Ownership validation
- **`auth.go`**: Authentication workflows
  - User login and logout
  - Token renewal and validation
  - Multi-provider authentication
- **`token.go`**: Token operations
  - Static token creation
  - Token verification
  - Token revocation
- **`health.go`**: System health checks
  - Database connectivity
  - Cache availability
  - Service status
- **`oauth2.go`, `saml.go`**: External authentication providers
  - OAuth2 authorization code flow
  - SAML assertion validation
  - Provider callback handling
#### **Services Layer** (`internal/services/`)
Business logic implementation with transaction management:
- **`auth_service.go`**: Authentication provider orchestration
  - Multi-provider authentication
  - Session management
  - User context creation
- **`token_service.go`**: Token lifecycle management
  - Static token generation and validation
  - JWT user token management
  - Permission assignment
- **`application_service.go`**: Application configuration management
  - Application CRUD operations
  - Configuration validation
  - HMAC key rotation
- **`session_service.go`**: User session tracking
  - Session creation and validation
  - Session timeout handling
  - Cross-provider session management
#### **Repository Layer** (`internal/repository/postgres/`)
Data access with ACID transaction support:
- **`application_repository.go`**: Application persistence
  - Secure dynamic query building
  - Parameterized queries
  - Ownership validation
- **`token_repository.go`**: Static token management
  - BCrypt token hashing
  - Token lookup and validation
  - Permission relationship management
- **`permission_repository.go`**: Permission catalog
  - Hierarchical permission structure
  - Permission validation
  - Bulk permission operations
- **`session_repository.go`**: User session storage
  - Session persistence
  - Expiration management
  - Provider metadata storage
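
One common way to combine secure dynamic query building with parameterized queries is to whitelist column names and pass every value as a PostgreSQL `$n` placeholder argument. The sketch below assumes hypothetical column names (`name`, `description`, `callback_url`) and a table name supplied by code, never by the client; it is not the repository's actual builder.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// updatableColumns whitelists the columns an update may touch (hypothetical names).
// Only these identifiers ever reach the SQL text.
var updatableColumns = map[string]bool{
	"name":         true,
	"description":  true,
	"callback_url": true,
}

// BuildUpdate returns an UPDATE statement with $n placeholders plus its arguments.
// table must be a trusted constant from code; user values travel only as arguments.
func BuildUpdate(table, id string, fields map[string]any) (string, []any, error) {
	if len(fields) == 0 {
		return "", nil, errors.New("no fields to update")
	}
	cols := make([]string, 0, len(fields))
	for col := range fields {
		if !updatableColumns[col] {
			return "", nil, fmt.Errorf("column %q is not updatable", col)
		}
		cols = append(cols, col)
	}
	sort.Strings(cols) // deterministic statement text for the same set of fields

	sets := make([]string, len(cols))
	args := make([]any, 0, len(cols)+1)
	for i, col := range cols {
		sets[i] = fmt.Sprintf("%s = $%d", col, i+1)
		args = append(args, fields[col])
	}
	args = append(args, id)
	query := fmt.Sprintf("UPDATE %s SET %s WHERE id = $%d", table, strings.Join(sets, ", "), len(args))
	return query, args, nil
}
```

Sorting the columns keeps the generated SQL identical for a given set of fields, which makes statements easier to test and log.
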
#### **Authentication Providers** (`internal/auth/`)
Pluggable authentication system:
- **`header_validator.go`**: HMAC signature validation
  - Timestamp-based replay protection
  - Constant-time signature comparison
  - Email format validation
- **`jwt.go`**: JWT token management
  - Token generation with secure JTI
  - Signature validation
  - Revocation list management
- **`oauth2.go`**: OAuth2 authorization code flow
  - State management
  - Token exchange
  - Provider integration
- **`saml.go`**: SAML assertion validation
  - XML signature validation
  - Attribute extraction
  - Provider configuration
- **`permissions.go`**: Hierarchical permission evaluation
  - Role-based access control
  - Permission inheritance
  - Bulk permission evaluation
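
Hierarchical permission evaluation can be sketched with dot-separated permission paths, as in the `app.read` and `token.create` examples in this document. The inheritance rule here is an assumption: a parent grant such as `app` covers every child such as `app.read`, `*` covers everything, and matching respects dot boundaries. `Implies`, `HasPermission`, and `HasAll` are hypothetical names.

```go
package main

import "strings"

// Implies reports whether a granted permission covers the requested one
// under the assumed "parent covers children" convention.
func Implies(granted, requested string) bool {
	if granted == "*" || granted == requested {
		return true
	}
	// The trailing dot prevents "app" from matching "application.read".
	return strings.HasPrefix(requested, granted+".")
}

// HasPermission checks one requested permission against a user's grants.
func HasPermission(grants []string, requested string) bool {
	for _, g := range grants {
		if Implies(g, requested) {
			return true
		}
	}
	return false
}

// HasAll performs bulk evaluation: every requested permission must be covered.
func HasAll(grants []string, requested []string) bool {
	for _, r := range requested {
		if !HasPermission(grants, r) {
			return false
		}
	}
	return true
}
```
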
---
## System Architecture Diagram
```mermaid
graph TB
%% External Components
Client[Client Applications]
Browser[Web Browser]
AuthProvider[OAuth2/SAML Provider]
%% Load Balancer & Proxy
subgraph "Load Balancer Layer"
Nginx[Nginx Proxy<br/>:80, :8081]
end
%% Frontend Layer
subgraph "Frontend Layer"
React[React TypeScript SPA<br/>Ant Design UI<br/>:3000]
AuthContext[Authentication Context]
APIService[API Service Client]
end
%% API Gateway & Middleware
subgraph "API Layer"
API[Go API Server<br/>:8080]
subgraph "Middleware Chain"
Logger[Request Logger]
Security[Security Headers<br/>CORS, CSRF]
RateLimit[Rate Limiter<br/>100 RPS]
Auth[Authentication<br/>Header/JWT/OAuth2/SAML]
Validation[Request Validator]
end
end
%% Business Logic Layer
subgraph "Service Layer"
AuthService[Authentication Service]
TokenService[Token Service]
AppService[Application Service]
SessionService[Session Service]
PermService[Permission Service]
end
%% Data Access Layer
subgraph "Repository Layer"
AppRepo[Application Repository]
TokenRepo[Token Repository]
PermRepo[Permission Repository]
SessionRepo[Session Repository]
end
%% Infrastructure Layer
subgraph "Infrastructure"
PostgreSQL[(PostgreSQL 15<br/>:5432)]
Redis[(Redis Cache<br/>Optional)]
Metrics[Prometheus Metrics<br/>:9090]
end
%% External Security
subgraph "Security & Crypto"
HMAC[HMAC Signature<br/>Validation]
BCrypt[BCrypt Hashing<br/>Cost 14]
JWT[JWT Token<br/>Generation]
end
%% Flow Connections
Client -->|API Requests| Nginx
Browser -->|HTTPS| Nginx
AuthProvider -->|OAuth2/SAML| API
Nginx -->|Proxy| React
Nginx -->|API Proxy| API
React --> AuthContext
React --> APIService
APIService -->|REST API| API
API --> Logger
Logger --> Security
Security --> RateLimit
RateLimit --> Auth
Auth --> Validation
Validation --> AuthService
Validation --> TokenService
Validation --> AppService
Validation --> SessionService
Validation --> PermService
AuthService --> AppRepo
TokenService --> TokenRepo
AppService --> AppRepo
SessionService --> SessionRepo
PermService --> PermRepo
AppRepo --> PostgreSQL
TokenRepo --> PostgreSQL
PermRepo --> PostgreSQL
SessionRepo --> PostgreSQL
TokenService --> HMAC
TokenService --> BCrypt
TokenService --> JWT
AuthService --> Redis
API --> Metrics
%% Styling
classDef frontend fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
classDef api fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef service fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
classDef data fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
classDef security fill:#ffebee,stroke:#c62828,stroke-width:2px
class React,AuthContext,APIService frontend
class API,Logger,Security,RateLimit,Auth,Validation api
class AuthService,TokenService,AppService,SessionService,PermService service
class PostgreSQL,Redis,Metrics,AppRepo,TokenRepo,PermRepo,SessionRepo data
class HMAC,BCrypt,JWT security
```
---
## Request Flow Pipeline
```mermaid
flowchart TD
Start([HTTP Request]) --> Nginx{Nginx Proxy<br/>Load Balancer}
Nginx -->|Static Assets| Frontend[React SPA<br/>Port 3000]
Nginx -->|API Routes| API[Go API Server<br/>Port 8080]
API --> Logger[Request Logger<br/>Structured Logging]
Logger --> Security[Security Middleware<br/>Headers, CORS, CSRF]
Security --> RateLimit{Rate Limiter<br/>100 RPS, 200 Burst}
RateLimit -->|Exceeded| RateResponse[429 Too Many Requests]
RateLimit -->|Within Limits| Auth[Authentication<br/>Middleware]
Auth --> AuthHeader{Auth Provider}
AuthHeader -->|header| HeaderAuth[Header Validator<br/>X-User-Email]
AuthHeader -->|jwt| JWTAuth[JWT Validator<br/>Signature + Claims]
AuthHeader -->|oauth2| OAuth2Auth[OAuth2 Flow<br/>Authorization Code]
AuthHeader -->|saml| SAMLAuth[SAML Assertion<br/>XML Validation]
HeaderAuth --> AuthCache{Check Cache<br/>Redis 5min TTL}
JWTAuth --> JWTValidation[Signature Validation<br/>Expiry Check]
OAuth2Auth --> OAuth2Exchange[Token Exchange<br/>User Info Retrieval]
SAMLAuth --> SAMLValidation[Assertion Validation<br/>Signature Check]
AuthCache -->|Hit| AuthContext[Create AuthContext]
AuthCache -->|Miss| DBAuth[Database Lookup<br/>User Permissions]
JWTValidation --> RevocationCheck[Check Revocation List<br/>Redis Cache]
OAuth2Exchange --> SessionStore[Store User Session<br/>PostgreSQL]
SAMLValidation --> SessionStore
DBAuth --> CacheStore[Store in Cache<br/>5min TTL]
RevocationCheck --> AuthContext
SessionStore --> AuthContext
CacheStore --> AuthContext
AuthContext --> Validation[Request Validator<br/>JSON Schema]
Validation -->|Invalid| ValidationError[400 Bad Request]
Validation -->|Valid| Router{Route Handler}
Router -->|/health| HealthHandler[Health Check<br/>DB + Cache Status]
Router -->|/api/applications| AppHandler[Application CRUD<br/>HMAC Key Management]
Router -->|/api/tokens| TokenHandler[Token Operations<br/>Create, Verify, Revoke]
Router -->|/api/login| AuthHandler[Authentication<br/>Login, Renewal]
Router -->|/api/oauth2| OAuth2Handler[OAuth2 Callbacks<br/>State Management]
Router -->|/api/saml| SAMLHandler[SAML Callbacks<br/>Assertion Processing]
HealthHandler --> Service[Service Layer]
AppHandler --> Service
TokenHandler --> Service
AuthHandler --> Service
OAuth2Handler --> Service
SAMLHandler --> Service
Service --> Repository[Repository Layer<br/>Database Operations]
Repository --> PostgreSQL[(PostgreSQL<br/>ACID Transactions)]
Service --> CryptoOps[Cryptographic Operations]
CryptoOps --> HMAC[HMAC Signature<br/>Timestamp Validation]
CryptoOps --> BCrypt[BCrypt Hashing<br/>Cost 14]
CryptoOps --> JWT[JWT Generation<br/>RS256 Signing]
Repository --> AuditLog[Audit Logging<br/>All Operations]
AuditLog --> AuditTable[(audit_logs table)]
Service --> Response[HTTP Response]
Response --> Metrics[Prometheus Metrics<br/>Port 9090]
Response --> End([Response Sent])
%% Error Paths
RateResponse --> End
ValidationError --> End
%% Styling
classDef middleware fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef auth fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef handler fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
classDef data fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
classDef crypto fill:#ffebee,stroke:#c62828,stroke-width:2px
classDef error fill:#fce4ec,stroke:#ad1457,stroke-width:2px
class Logger,Security,RateLimit,Validation middleware
class Auth,HeaderAuth,JWTAuth,OAuth2Auth,SAMLAuth,AuthContext auth
class HealthHandler,AppHandler,TokenHandler,AuthHandler,OAuth2Handler,SAMLHandler,Service handler
class Repository,PostgreSQL,AuditTable,AuthCache data
class CryptoOps,HMAC,BCrypt,JWT crypto
class RateResponse,ValidationError error
```
### Request Processing Pipeline
1. **Load Balancer**: Nginx receives incoming requests and routes them by path
2. **Static Assets**: Nginx proxies frontend requests to the React SPA (port 3000)
3. **API Gateway**: Nginx proxies API routes to the Go API server (port 8080)
4. **Middleware Chain**: Request logging, security headers (CORS, CSRF), rate limiting, authentication, and request validation, in that order
5. **Route Handler**: Dispatches the request to the matching handler
6. **Service Layer**: Business logic, transaction management, and orchestration
7. **Repository Layer**: Database operations with audit logging
8. **Response**: JSON response, with metrics recorded for Prometheus
---
## Authentication Flow
```mermaid
sequenceDiagram
participant Client as Client App
participant API as API Gateway
participant Auth as Auth Service
participant DB as PostgreSQL
participant Provider as OAuth2/SAML
participant Cache as Redis Cache
%% Header-based Authentication
rect rgb(240, 248, 255)
Note over Client, Cache: Header-based Authentication Flow
Client->>API: Request with X-User-Email header
API->>Auth: Validate header auth
Auth->>DB: Check user permissions
DB-->>Auth: Return user context
Auth->>Cache: Cache auth result (5min TTL)
Auth-->>API: AuthContext{UserID, Permissions}
API-->>Client: Authenticated response
end
%% JWT Authentication Flow
rect rgb(245, 255, 245)
Note over Client, Cache: JWT Authentication Flow
Client->>API: Login request {app_id, permissions}
API->>Auth: Generate JWT token
Auth->>DB: Validate app_id and permissions
DB-->>Auth: Application config
Auth->>Auth: Create JWT with claims<br/>{user_id, permissions, exp, iat}
Auth-->>API: JWT token + expires_at
API-->>Client: LoginResponse{token, expires_at}
Note over Client, API: Subsequent requests with JWT
Client->>API: Request with Bearer JWT
API->>Auth: Verify JWT signature
Auth->>Auth: Check expiration & claims
Auth->>Cache: Check revocation list
Cache-->>Auth: Token status
Auth-->>API: Valid AuthContext
API-->>Client: Authorized response
end
%% OAuth2/SAML Flow
rect rgb(255, 248, 240)
Note over Client, Provider: OAuth2/SAML Authentication Flow
Client->>API: POST /api/login {app_id, redirect_uri}
API->>Auth: Generate OAuth2 state
Auth->>DB: Store state + app context
Auth-->>API: Redirect URL + state
API-->>Client: {redirect_url, state}
Client->>Provider: Redirect to OAuth2 provider
Provider-->>Client: Authorization code + state
Client->>API: GET /api/oauth2/callback?code=xxx&state=yyy
API->>Auth: Validate state and exchange code
Auth->>Provider: Exchange code for tokens
Provider-->>Auth: Access token + ID token
Auth->>Provider: Get user info
Provider-->>Auth: User profile
Auth->>DB: Create/update user session
Auth->>Auth: Generate internal JWT
Auth-->>API: JWT token + user context
API-->>Client: Set-Cookie with JWT + redirect
end
%% Token Renewal Flow
rect rgb(248, 245, 255)
Note over Client, DB: Token Renewal Flow
Client->>API: POST /api/renew {app_id, user_id, token}
API->>Auth: Validate current token
Auth->>Auth: Check token expiration<br/>and max_valid_at
Auth->>DB: Get application config
DB-->>Auth: TokenRenewalDuration, MaxTokenDuration
Auth->>Auth: Generate new JWT<br/>with extended expiry
Auth->>Cache: Invalidate old token
Auth-->>API: New token + expires_at
API-->>Client: RenewResponse{token, expires_at}
end
```
### Authentication Methods
#### **Header-based Authentication**
- **Use Case**: Service-to-service authentication
- **Security**: HMAC-SHA256 signatures with timestamp validation
- **Replay Protection**: 5-minute timestamp window
- **Caching**: 5-minute Redis cache for performance
#### **JWT Authentication**
- **Use Case**: User authentication with session management
- **Security**: RSA signatures with revocation checking
- **Token Lifecycle**: Configurable expiration with renewal
- **Claims**: User ID, permissions, application scope
#### **OAuth2/SAML Authentication**
- **Use Case**: External identity provider integration
- **Security**: Authorization code flow with state validation
- **Session Management**: Database-backed session storage
- **Provider Support**: Configurable provider endpoints
---
## API Design
### RESTful Endpoints
#### **Authentication Endpoints**
```
POST /api/login - Authenticate user, issue JWT
POST /api/renew - Renew JWT token
POST /api/logout - Revoke JWT token
GET /api/verify - Verify token and permissions
```
#### **Application Management**
```
GET /api/applications - List applications (paginated)
POST /api/applications - Create application
GET /api/applications/:id - Get application details
PUT /api/applications/:id - Update application
DELETE /api/applications/:id - Delete application
```
#### **Token Management**
```
GET /api/applications/:id/tokens - List application tokens
POST /api/applications/:id/tokens - Create new token
DELETE /api/tokens/:id - Revoke token
```
#### **OAuth2/SAML Integration**
```
POST /api/oauth2/login - Initiate OAuth2 flow
GET /api/oauth2/callback - OAuth2 callback handler
POST /api/saml/login - Initiate SAML flow
POST /api/saml/callback - SAML assertion handler
```
#### **System Endpoints**
```
GET /health - System health check
GET /ready - Readiness probe
GET /metrics - Prometheus metrics
```
### Request/Response Patterns
#### **Authentication Headers**
```http
X-User-Email: user@example.com
X-Auth-Timestamp: 2024-01-15T10:30:00Z
X-Auth-Signature: sha256=abc123...
Authorization: Bearer eyJhbGciOiJSUzI1NiIs...
Content-Type: application/json
```
#### **Error Response Format**
```json
{
  "error": "validation_failed",
  "message": "Request validation failed",
  "details": [
    {
      "field": "permissions",
      "message": "Invalid permission format",
      "value": "invalid.perm"
    }
  ]
}
```
#### **Success Response Format**
```json
{
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "token": "ABC123_xyz789...",
    "permissions": ["app.read", "token.create"],
    "created_at": "2024-01-15T10:30:00Z"
  }
}
```
---
## Technology Stack
### Backend Technologies
- **Language**: Go 1.21+ with modules
- **Web Framework**: Gin HTTP framework
- **Database**: PostgreSQL 15 with connection pooling
- **Authentication**: JWT-Go library with RSA signing
- **Cryptography**: Go standard crypto libraries
- **Caching**: Redis for session and revocation storage
- **Logging**: Zap structured logging
- **Metrics**: Prometheus metrics collection
### Frontend Technologies
- **Framework**: React 18 with TypeScript
- **UI Library**: Ant Design components
- **State Management**: React Context API
- **HTTP Client**: Axios with interceptors
- **Routing**: React Router with protected routes
- **Build Tool**: Create React App with TypeScript
### Infrastructure
- **Containerization**: Docker with multi-stage builds
- **Orchestration**: Docker Compose for local development
- **Reverse Proxy**: Nginx with load balancing
- **Database Migrations**: Custom Go migration system
- **Health Monitoring**: Built-in health check endpoints
### Security Stack
- **TLS**: TLS 1.3 for all communications
- **Hashing**: BCrypt with cost 14 for production
- **Signatures**: HMAC-SHA256 and RSA signatures
- **Rate Limiting**: Token bucket algorithm
- **CSRF**: Double-submit cookie pattern
- **Headers**: Comprehensive security headers
---
## Deployment Considerations
### Container Configuration
```yaml
services:
  kms-api:
    image: kms-api:latest
    ports:
      - "8080:8080"
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - JWT_SECRET=${JWT_SECRET}
      - HMAC_KEY=${HMAC_KEY}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```
### Environment Variables
```bash
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=kms
DB_USER=postgres
DB_PASSWORD=postgres
# Server Configuration
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
# Authentication
AUTH_PROVIDER=header
JWT_SECRET=your-jwt-secret
HMAC_KEY=your-hmac-key
# Security
RATE_LIMIT_ENABLED=true
RATE_LIMIT_RPS=100
RATE_LIMIT_BURST=200
```
### Monitoring Setup
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'kms-api'
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ['kms-api:9090']
```
This architecture documentation provides a comprehensive technical overview of the KMS system, suitable for development teams, system architects, and operations personnel who need to understand, deploy, or maintain the system.