This commit is contained in:
2025-08-26 14:36:08 -04:00
parent 7fa0c5dbfc
commit 7ca61eb712

@ -0,0 +1,879 @@
# KMS System Implementation Guide
This document provides detailed implementation guidance for the KMS system, covering areas not extensively documented in other files. It serves as a comprehensive reference for developers working on system components.
## Table of Contents
1. [Documentation Consistency Analysis](#documentation-consistency-analysis)
2. [Audit System Implementation](#audit-system-implementation)
3. [Multi-Tenancy Support](#multi-tenancy-support)
4. [Cache Implementation Details](#cache-implementation-details)
5. [Error Handling Framework](#error-handling-framework)
6. [Validation System](#validation-system)
7. [Metrics and Monitoring](#metrics-and-monitoring)
8. [Database Migration System](#database-migration-system)
9. [Frontend Architecture](#frontend-architecture)
10. [Configuration Management](#configuration-management)
11. [Implementation Best Practices](#implementation-best-practices)
---
## Documentation Consistency Analysis
### Current State Assessment
The existing documentation is comprehensive but has some minor inconsistencies with the actual codebase:
#### ✅ Accurate Documentation Areas:
- **API endpoints** match the implementation in handlers
- **Database schema** aligns with migrations (especially the new audit_events table)
- **Authentication flows** are correctly documented
- **Docker compose setup** matches actual configuration
- **Security architecture** accurately reflects implementation
- **Permission system** documentation is consistent with code
#### ⚠️ Minor Inconsistencies Found:
1. **Port references**: Some docs mention port 80, but nginx actually listens on 8081
2. **Container names**: Documentation uses generic names, while the compose file uses specific names such as `kms-postgres`
3. **Rate limiting values**: The documented limits differ from those in the middleware implementation
4. **Frontend build process**: Docs state React 18, but package.json specifies 19+
#### ✨ Recently Added Features (Not in Original Docs):
- **Audit system** with comprehensive event logging
- **Multi-tenancy support** in database schema
- **Advanced caching layer** with Redis integration
- **SAML authentication** implementation
- **Advanced security middleware** with brute force protection
---
## Audit System Implementation
### Overview
The KMS implements a comprehensive audit logging system that tracks all system events, user actions, and security-related activities.
### Core Components
#### Audit Event Structure
```go
// File: internal/audit/audit.go
type AuditEvent struct {
ID uuid.UUID `json:"id"`
Type EventType `json:"type"`
Severity Severity `json:"severity"`
Status Status `json:"status"`
Timestamp time.Time `json:"timestamp"`
// Actor information
ActorID string `json:"actor_id"`
ActorType ActorType `json:"actor_type"`
ActorIP string `json:"actor_ip"`
UserAgent string `json:"user_agent"`
// Multi-tenancy support
TenantID *uuid.UUID `json:"tenant_id,omitempty"`
// Resource information
ResourceID string `json:"resource_id"`
ResourceType string `json:"resource_type"`
// Event details
Action string `json:"action"`
Description string `json:"description"`
Details map[string]interface{} `json:"details"`
// Request context
RequestID string `json:"request_id"`
SessionID string `json:"session_id"`
// Metadata
Tags []string `json:"tags"`
Metadata map[string]interface{} `json:"metadata"`
}
```
#### Event Types Taxonomy
```
auth.* - Authentication events
├── auth.login - Successful user login
├── auth.login_failed - Failed login attempt
├── auth.logout - User logout
├── auth.token_created - Token generation
├── auth.token_revoked - Token revocation
└── auth.token_validated - Token validation
session.* - Session management
├── session.created - New session created
├── session.revoked - Session terminated
└── session.expired - Session timeout
app.* - Application management
├── app.created - Application created
├── app.updated - Application modified
└── app.deleted - Application removed
permission.* - Permission operations
├── permission.granted - Permission assigned
├── permission.revoked - Permission removed
└── permission.denied - Access denied
tenant.* - Multi-tenant operations
├── tenant.created - New tenant
├── tenant.updated - Tenant modified
├── tenant.suspended - Tenant suspended
└── tenant.activated - Tenant reactivated
```
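Because every event type follows a `category.action` naming scheme, filters such as `auth.*` can be resolved with plain prefix matching. The sketch below illustrates this under the assumption that `EventType` is a string type (the guide names the type but does not show its definition); the constants shown and the `Category` and `Matches` helpers are hypothetical and not taken from `internal/audit`.

```go
package main

import (
	"fmt"
	"strings"
)

// EventType is assumed to be a string type; the guide does not show its definition.
type EventType string

// A few event types from the taxonomy, declared here for illustration only.
const (
	EventAuthLogin       EventType = "auth.login"
	EventAuthLoginFailed EventType = "auth.login_failed"
	EventTenantSuspended EventType = "tenant.suspended"
)

// Category returns the part before the first dot, e.g. "auth" for "auth.login".
func (t EventType) Category() string {
	if i := strings.Index(string(t), "."); i >= 0 {
		return string(t)[:i]
	}
	return string(t)
}

// Matches reports whether t matches pattern, where pattern is either an exact
// event type ("auth.login") or a category wildcard ("auth.*").
func (t EventType) Matches(pattern string) bool {
	if strings.HasSuffix(pattern, ".*") {
		return strings.HasPrefix(string(t), strings.TrimSuffix(pattern, "*"))
	}
	return string(t) == pattern
}

func main() {
	fmt.Println(EventAuthLoginFailed.Category())        // auth
	fmt.Println(EventAuthLoginFailed.Matches("auth.*")) // true
	fmt.Println(EventTenantSuspended.Matches("auth.*")) // false
}
```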
#### Database Schema
```sql
-- File: migrations/004_add_audit_events.up.sql
CREATE TABLE audit_events (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
type VARCHAR(50) NOT NULL,
severity VARCHAR(20) NOT NULL CHECK (severity IN ('info', 'warning', 'error', 'critical')),
status VARCHAR(20) NOT NULL CHECK (status IN ('success', 'failure', 'pending')),
timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
-- Actor information
actor_id VARCHAR(255),
actor_type VARCHAR(50) CHECK (actor_type IN ('user', 'system', 'service')),
actor_ip INET,
user_agent TEXT,
-- Multi-tenancy
tenant_id UUID,
-- Resource tracking
resource_id VARCHAR(255),
resource_type VARCHAR(100),
action VARCHAR(100) NOT NULL,
description TEXT NOT NULL,
details JSONB DEFAULT '{}',
-- Request context
request_id VARCHAR(100),
session_id VARCHAR(255),
-- Metadata
tags TEXT[],
metadata JSONB DEFAULT '{}'
);
```
#### Frontend Integration
```typescript
// File: kms-frontend/src/components/Audit.tsx
interface AuditEvent {
id: string;
type: string;
severity: 'info' | 'warning' | 'error' | 'critical';
status: 'success' | 'failure' | 'pending';
timestamp: string;
actor_id: string;
actor_type: string;
resource_type: string;
action: string;
description: string;
}
const Audit: React.FC = () => {
// Real-time audit log viewing with filtering
// Timeline view for event sequences
// Statistics dashboard for audit metrics
return null; // placeholder: rendering is omitted in this excerpt
};
```
### Implementation Guidelines
#### Logging Best Practices
1. **Log all security-relevant events**
2. **Include sufficient context** for forensic analysis
3. **Use structured logging** with consistent fields
4. **Implement log retention policies**
5. **Ensure tamper-evident logging**
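One common way to make a log tamper-evident is a hash chain, where each record stores a hash covering the previous record's hash and its own payload. The following is a minimal sketch of that idea, not the KMS implementation: `ChainedRecord`, `Append`, and `Verify` are hypothetical names, and the payload is assumed to be an already-serialized audit event. A chain like this only detects edits if the most recent hash is also stored somewhere an attacker cannot rewrite.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChainedRecord is a hypothetical wrapper pairing an audit payload with chain hashes.
type ChainedRecord struct {
	Payload  string // serialized audit event
	PrevHash string // hash of the previous record ("" for the first record)
	Hash     string // SHA-256 over PrevHash and Payload
}

// recordHash derives a record's hash from its predecessor's hash and its payload.
func recordHash(prevHash, payload string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + payload))
	return hex.EncodeToString(sum[:])
}

// Append adds payload to the chain, linking it to the last record.
func Append(chain []ChainedRecord, payload string) []ChainedRecord {
	prev := ""
	if len(chain) > 0 {
		prev = chain[len(chain)-1].Hash
	}
	return append(chain, ChainedRecord{
		Payload:  payload,
		PrevHash: prev,
		Hash:     recordHash(prev, payload),
	})
}

// Verify returns the index of the first record that breaks the chain, or -1
// if every record is consistent with its predecessor.
func Verify(chain []ChainedRecord) int {
	prev := ""
	for i, rec := range chain {
		if rec.PrevHash != prev || rec.Hash != recordHash(prev, rec.Payload) {
			return i
		}
		prev = rec.Hash
	}
	return -1
}

func main() {
	var chain []ChainedRecord
	chain = Append(chain, `{"type":"auth.login"}`)
	chain = Append(chain, `{"type":"auth.logout"}`)
	fmt.Println(Verify(chain)) // -1: chain is intact

	chain[0].Payload = `{"type":"auth.login_failed"}` // simulate tampering
	fmt.Println(Verify(chain))                        // 0: first record no longer matches its hash
}
```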
#### Performance Considerations
1. **Asynchronous logging** to avoid blocking operations
2. **Batch inserts** for high-volume events
3. **Proper indexing** on commonly queried fields
4. **Archival strategy** for historical data
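The first two points can be combined: request handlers enqueue events on a buffered channel, and a background worker flushes them in batches. The sketch below shows one way this could look, using only the standard library. `AsyncLogger`, `Event`, and the `flush` callback are hypothetical; in a real deployment the callback would issue a single multi-row `INSERT` into `audit_events`.

```go
package main

import (
	"fmt"
	"time"
)

// Event stands in for the AuditEvent struct; one field is enough for the sketch.
type Event struct{ Action string }

// AsyncLogger buffers events on a channel and flushes them in batches.
type AsyncLogger struct {
	events chan Event
	done   chan struct{}
}

// NewAsyncLogger starts a worker that calls flush whenever batchSize events
// have accumulated or interval has elapsed. Both values must be positive.
func NewAsyncLogger(batchSize int, interval time.Duration, flush func([]Event)) *AsyncLogger {
	l := &AsyncLogger{events: make(chan Event, 1024), done: make(chan struct{})}
	go func() {
		defer close(l.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		batch := make([]Event, 0, batchSize)
		emit := func() {
			if len(batch) > 0 {
				flush(batch)
				batch = make([]Event, 0, batchSize) // fresh slice so flush may keep the old one
			}
		}
		for {
			select {
			case ev, ok := <-l.events:
				if !ok { // channel closed: flush the remainder and stop
					emit()
					return
				}
				batch = append(batch, ev)
				if len(batch) >= batchSize {
					emit()
				}
			case <-ticker.C:
				emit()
			}
		}
	}()
	return l
}

// Log enqueues an event without blocking; it reports false if the buffer is full.
// It must not be called after Close.
func (l *AsyncLogger) Log(ev Event) bool {
	select {
	case l.events <- ev:
		return true
	default:
		return false
	}
}

// Close stops accepting events and waits for the final flush.
func (l *AsyncLogger) Close() {
	close(l.events)
	<-l.done
}

func main() {
	total := 0
	logger := NewAsyncLogger(3, 50*time.Millisecond, func(batch []Event) {
		total += len(batch) // only the worker goroutine runs this callback
	})
	for i := 0; i < 7; i++ {
		logger.Log(Event{Action: fmt.Sprintf("action-%d", i)})
	}
	logger.Close() // waits for the worker, so reading total afterwards is safe
	fmt.Println("flushed events:", total)
}
```

Dropping events when the buffer is full keeps request latency bounded, but for security-critical events a caller may prefer to block or fall back to a synchronous write when `Log` returns false.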
---
## Multi-Tenancy Support
### Architecture
The KMS implements a multi-tenant architecture where each tenant has isolated data and permissions while sharing the same application instance.
### Database Design
#### Tenant Model
```go
// File: internal/domain/tenant.go
type Tenant struct {
ID uuid.UUID `json:"id" db:"id"`
Name string `json:"name" db:"name"`
Slug string `json:"slug" db:"slug"`
Status TenantStatus `json:"status" db:"status"`
Settings TenantSettings `json:"settings" db:"settings"`
Metadata map[string]interface{} `json:"metadata" db:"metadata"`
CreatedAt time.Time `json:"created_at" db:"created_at"`
UpdatedAt time.Time `json:"updated_at" db:"updated_at"`
}
type TenantStatus string
const (
TenantStatusActive TenantStatus = "active"
TenantStatusSuspended TenantStatus = "suspended"
TenantStatusPending TenantStatus = "pending"
)
```
#### Data Isolation Strategy
```sql
-- All tenant-specific tables include tenant_id
ALTER TABLE applications ADD COLUMN tenant_id UUID REFERENCES tenants(id);
ALTER TABLE static_tokens ADD COLUMN tenant_id UUID REFERENCES tenants(id);
ALTER TABLE user_sessions ADD COLUMN tenant_id UUID REFERENCES tenants(id);
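-- audit_events already defines tenant_id in migration 004; IF NOT EXISTS keeps this statement safe to run
ALTER TABLE audit_events ADD COLUMN IF NOT EXISTS tenant_id UUID;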
-- Row-level security policies
ALTER TABLE applications ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON applications
FOR ALL TO kms_user
USING (tenant_id = current_setting('app.current_tenant')::UUID);
```
### Implementation Pattern
#### Tenant Context Middleware
```go
// File: internal/middleware/tenant.go
func TenantMiddleware() gin.HandlerFunc {
return func(c *gin.Context) {
tenantID := extractTenantID(c)
if tenantID == "" {
c.AbortWithStatusJSON(400, gin.H{"error": "tenant_required"})
return
}
// Set tenant context
c.Set("tenant_id", tenantID)
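// Set database session variable.
// NOTE: set_config(..., true) is transaction-local, and *sql.DB is a connection
// pool, so this setting is discarded as soon as the statement finishes. For the
// row-level security policy to see it, run set_config inside the same
// transaction (*sql.Tx) that executes the request's queries.
db := c.MustGet("db").(*sql.DB)
_, err := db.Exec("SELECT set_config('app.current_tenant', $1, true)", tenantID)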
if err != nil {
c.AbortWithStatusJSON(500, gin.H{"error": "tenant_setup_failed"})
return
}
c.Next()
}
}
```
### Usage Guidelines
1. **Always include tenant_id** in database queries
2. **Validate tenant access** in middleware
3. **Implement tenant-aware caching**
4. **Audit cross-tenant operations**
5. **Test tenant isolation thoroughly**
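For the tenant-aware caching guideline, the simplest safeguard is to make the tenant ID part of every cache key so that one tenant's entries can never be served to another. The sketch below assumes a `tenant:<id>:` key layout, which is an illustration rather than the layout KMS uses; `TenantCacheKey` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// TenantCacheKey builds a key of the form "tenant:<tenantID>:<prefix><suffix>".
// It refuses to build a key without a tenant, so a missing tenant context
// fails loudly instead of silently sharing cache entries across tenants.
func TenantCacheKey(tenantID, prefix, suffix string) (string, error) {
	if tenantID == "" {
		return "", errors.New("tenant ID is required for tenant-scoped cache keys")
	}
	if strings.Contains(tenantID, ":") {
		// A separator inside the ID could make two different tenants collide.
		return "", fmt.Errorf("tenant ID %q must not contain ':'", tenantID)
	}
	return "tenant:" + tenantID + ":" + prefix + suffix, nil
}

func main() {
	key, err := TenantCacheKey("7d9f2c1e", "app:", "com.example.app")
	fmt.Println(key, err)

	_, err = TenantCacheKey("", "app:", "com.example.app")
	fmt.Println(err)
}
```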
---
## Cache Implementation Details
### Architecture
The KMS implements a layered caching system with multiple providers and configurable TTL policies.
### Cache Interface
```go
// File: internal/cache/cache.go
type CacheManager interface {
Get(ctx context.Context, key string) ([]byte, error)
Set(ctx context.Context, key string, value []byte, ttl time.Duration) error
GetJSON(ctx context.Context, key string, dest interface{}) error
SetJSON(ctx context.Context, key string, value interface{}, ttl time.Duration) error
Delete(ctx context.Context, key string) error
Clear(ctx context.Context) error
Exists(ctx context.Context, key string) (bool, error)
}
```
### Redis Implementation
```go
// File: internal/cache/redis.go
type RedisCacheManager struct {
client *redis.Client // redis.NewClient returns a pointer; share it rather than copying the struct
keyPrefix string
serializer JSONSerializer
logger *zap.Logger
}
func (r *RedisCacheManager) GetJSON(ctx context.Context, key string, dest interface{}) error {
prefixedKey := r.keyPrefix + key
data, err := r.client.Get(ctx, prefixedKey).Bytes()
if err != nil {
if errors.Is(err, redis.Nil) { // redis.Nil signals a missing key, not a failure
return ErrCacheMiss
}
return fmt.Errorf("failed to get key %s: %w", prefixedKey, err)
}
return r.serializer.Deserialize(data, dest)
}
```
### Cache Key Management
```go
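// Key prefixes for each class of cached entity.
// (No CacheKey type is declared: it would clash with the CacheKey function below.)
const (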
KeyPrefixAuth = "auth:"
KeyPrefixToken = "token:"
KeyPrefixPermission = "perm:"
KeyPrefixSession = "sess:"
KeyPrefixApp = "app:"
)
func CacheKey(prefix, suffix string) string {
return fmt.Sprintf("%s%s", prefix, suffix)
}
```
### Usage Patterns
#### Authentication Caching
```go
// Cache authentication results for 5 minutes
cacheKey := cache.CacheKey(cache.KeyPrefixAuth, fmt.Sprintf("%s:%s", userID, appID))
err := cacheManager.SetJSON(ctx, cacheKey, authResult, 5*time.Minute)
```
#### Token Revocation List
```go
// Cache revoked tokens until their expiration
revokedKey := cache.CacheKey(cache.KeyPrefixToken, "revoked:"+tokenID)
err := cacheManager.Set(ctx, revokedKey, []byte("1"), time.Until(tokenExpiry))
```
### Configuration
```bash
# Cache configuration
CACHE_ENABLED=true
CACHE_PROVIDER=redis # or memory
REDIS_ADDR=localhost:6379
REDIS_PASSWORD=
REDIS_DB=0
CACHE_DEFAULT_TTL=5m
```
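The `memory` provider is not shown in this guide. As a rough illustration of what an in-process provider with per-key TTLs could look like, here is a sketch covering `Get`, `Set`, and `Delete` from the `CacheManager` interface. `MemoryCache` is a hypothetical name, expiry is checked lazily on read, and a TTL of zero is assumed to mean "no expiry".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrCacheMiss mirrors the sentinel error used by the Redis implementation.
var ErrCacheMiss = errors.New("cache miss")

type entry struct {
	value     []byte
	expiresAt time.Time // zero value means the entry never expires
}

// MemoryCache is a hypothetical in-process cache guarded by a mutex.
type MemoryCache struct {
	mu    sync.Mutex
	items map[string]entry
}

func NewMemoryCache() *MemoryCache {
	return &MemoryCache{items: make(map[string]entry)}
}

// Set stores a copy of value; ttl <= 0 is treated as "no expiry" in this sketch.
func (m *MemoryCache) Set(_ context.Context, key string, value []byte, ttl time.Duration) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	e := entry{value: append([]byte(nil), value...)}
	if ttl > 0 {
		e.expiresAt = time.Now().Add(ttl)
	}
	m.items[key] = e
	return nil
}

// Get returns a copy of the cached value, or ErrCacheMiss if the key is
// absent or expired. Expired entries are evicted lazily here.
func (m *MemoryCache) Get(_ context.Context, key string) ([]byte, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.items[key]
	if !ok {
		return nil, ErrCacheMiss
	}
	if !e.expiresAt.IsZero() && !time.Now().Before(e.expiresAt) {
		delete(m.items, key)
		return nil, ErrCacheMiss
	}
	return append([]byte(nil), e.value...), nil
}

// Delete removes key; deleting a missing key is not an error.
func (m *MemoryCache) Delete(_ context.Context, key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.items, key)
	return nil
}

func main() {
	ctx := context.Background()
	cache := NewMemoryCache()
	_ = cache.Set(ctx, "auth:user1:app1", []byte("ok"), 5*time.Minute)

	value, err := cache.Get(ctx, "auth:user1:app1")
	fmt.Println(string(value), err)

	_, err = cache.Get(ctx, "auth:unknown")
	fmt.Println(errors.Is(err, ErrCacheMiss))
}
```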
---
## Error Handling Framework
### Error Type Hierarchy
```go
// File: internal/errors/errors.go
type ErrorCode string
const (
ErrorCodeValidation ErrorCode = "validation_error"
ErrorCodeAuthentication ErrorCode = "authentication_error"
ErrorCodeAuthorization ErrorCode = "authorization_error"
ErrorCodeNotFound ErrorCode = "not_found"
ErrorCodeConflict ErrorCode = "conflict"
ErrorCodeInternal ErrorCode = "internal_error"
ErrorCodeRateLimit ErrorCode = "rate_limit_exceeded"
ErrorCodeBadRequest ErrorCode = "bad_request"
)
type APIError struct {
Code ErrorCode `json:"code"`
Message string `json:"message"`
Details interface{} `json:"details,omitempty"`
HTTPStatus int `json:"-"`
Cause error `json:"-"`
}
```
### Error Factory Functions
```go
func NewValidationError(message string, details interface{}) *APIError {
return &APIError{
Code: ErrorCodeValidation,
Message: message,
Details: details,
HTTPStatus: http.StatusBadRequest,
}
}
func NewAuthenticationError(message string) *APIError {
return &APIError{
Code: ErrorCodeAuthentication,
Message: message,
HTTPStatus: http.StatusUnauthorized,
}
}
```
### Error Handler Middleware
```go
// File: internal/errors/secure_responses.go
func (e *ErrorHandler) HandleError(c *gin.Context, err error) {
var apiErr *APIError
if errors.As(err, &apiErr) {
// Log error with context
e.logger.Error("API error",
zap.String("error_code", string(apiErr.Code)),
zap.String("message", apiErr.Message),
zap.Int("http_status", apiErr.HTTPStatus),
zap.Error(apiErr.Cause))
c.JSON(apiErr.HTTPStatus, gin.H{
"error": apiErr.Code,
"message": apiErr.Message,
"details": apiErr.Details,
})
return
}
// Handle unexpected errors
e.logger.Error("Unexpected error", zap.Error(err))
c.JSON(http.StatusInternalServerError, gin.H{
"error": ErrorCodeInternal,
"message": "An internal error occurred",
})
}
```
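For `errors.As` in the handler to find an `*APIError`, the type has to satisfy Go's `error` interface, and an `Unwrap` method lets `errors.Is` and `errors.As` see through to `Cause`. Neither method appears in the excerpt above, so the sketch below shows one plausible shape for them. The trimmed `APIError` struct, `findApp`, `errRowMissing`, and `statusOf` are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

type ErrorCode string

const ErrorCodeNotFound ErrorCode = "not_found"

// APIError is a trimmed copy of the struct from the guide.
type APIError struct {
	Code       ErrorCode
	Message    string
	HTTPStatus int
	Cause      error
}

// Error makes *APIError satisfy the error interface.
func (e *APIError) Error() string {
	if e.Cause != nil {
		return fmt.Sprintf("%s: %s: %v", e.Code, e.Message, e.Cause)
	}
	return fmt.Sprintf("%s: %s", e.Code, e.Message)
}

// Unwrap exposes Cause to errors.Is and errors.As.
func (e *APIError) Unwrap() error { return e.Cause }

// errRowMissing is a hypothetical low-level error from the storage layer.
var errRowMissing = errors.New("row missing")

// findApp simulates a lookup that fails and wraps the APIError once more.
func findApp(id string) error {
	return fmt.Errorf("loading app %s: %w", id, &APIError{
		Code:       ErrorCodeNotFound,
		Message:    "application not found",
		HTTPStatus: http.StatusNotFound,
		Cause:      errRowMissing,
	})
}

// statusOf mirrors the handler's decision: use the APIError's status if one
// is anywhere in the chain, otherwise fall back to 500.
func statusOf(err error) int {
	var apiErr *APIError
	if errors.As(err, &apiErr) {
		return apiErr.HTTPStatus
	}
	return http.StatusInternalServerError
}

func main() {
	err := findApp("com.example.app")
	fmt.Println(statusOf(err))                 // 404
	fmt.Println(errors.Is(err, errRowMissing)) // true, via Unwrap
	fmt.Println(statusOf(errors.New("boom")))  // 500
}
```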
---
## Validation System
### Validator Implementation
```go
// File: internal/validation/validator.go
type Validator struct {
validator *validator.Validate
logger *zap.Logger
}
func NewValidator(logger *zap.Logger) *Validator {
v := validator.New()
// Register custom validators
v.RegisterValidation("app_id", validateAppID)
v.RegisterValidation("token_type", validateTokenType)
v.RegisterValidation("permission_scope", validatePermissionScope)
return &Validator{
validator: v,
logger: logger,
}
}
```
### Custom Validation Rules
```go
func validateAppID(fl validator.FieldLevel) bool {
appID := fl.Field().String()
// App ID format: domain.app (e.g., com.example.app)
pattern := `^[a-z0-9]+(\.[a-z0-9]+)*\.[a-z0-9]+$`
match, _ := regexp.MatchString(pattern, appID)
return match && len(appID) >= 3 && len(appID) <= 100
}
func validatePermissionScope(fl validator.FieldLevel) bool {
scope := fl.Field().String()
// Permission format: domain.action (e.g., app.read)
pattern := `^[a-z_]+(\.[a-z_]+)*$`
match, _ := regexp.MatchString(pattern, scope)
return match && len(scope) >= 1 && len(scope) <= 50
}
```
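To see what these two patterns accept and reject, the sketch below runs them against a few sample inputs using only the standard `regexp` package. The patterns and length limits are copied from the validators above; the helper names `isValidAppID` and `isValidScope` are hypothetical. The patterns are compiled once at package level, which avoids recompiling them on every call as `regexp.MatchString` does.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from validateAppID and validatePermissionScope.
var (
	appIDPattern = regexp.MustCompile(`^[a-z0-9]+(\.[a-z0-9]+)*\.[a-z0-9]+$`)
	scopePattern = regexp.MustCompile(`^[a-z_]+(\.[a-z_]+)*$`)
)

// isValidAppID applies the same length bounds (3 to 100) as validateAppID.
func isValidAppID(s string) bool {
	return len(s) >= 3 && len(s) <= 100 && appIDPattern.MatchString(s)
}

// isValidScope applies the same length bounds (1 to 50) as validatePermissionScope.
func isValidScope(s string) bool {
	return len(s) >= 1 && len(s) <= 50 && scopePattern.MatchString(s)
}

func main() {
	for _, id := range []string{"com.example.app", "example", "Com.Example.App", "com..app"} {
		fmt.Printf("app_id %q valid=%v\n", id, isValidAppID(id))
	}
	for _, scope := range []string{"app.read", "app", "app.read2", ""} {
		fmt.Printf("scope %q valid=%v\n", scope, isValidScope(scope))
	}
}
```

Only `com.example.app`, `app.read`, and `app` pass: an app ID needs at least one dot and lowercase alphanumerics, while a scope allows a single segment but no digits.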
### Middleware Integration
```go
// File: internal/middleware/validation.go
func (v *ValidationMiddleware) ValidateJSON(schema interface{}) gin.HandlerFunc {
return gin.HandlerFunc(func(c *gin.Context) {
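// NOTE: schema is shared by every request that passes through this middleware,
// so binding into it concurrently is a data race. Production code should
// allocate a fresh value of the schema's type per request.
if err := c.ShouldBindJSON(schema); err != nil {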
var validationErrors []ValidationError
if errs, ok := err.(validator.ValidationErrors); ok {
for _, e := range errs {
validationErrors = append(validationErrors, ValidationError{
Field: e.Field(),
Message: e.Tag(),
Value: e.Value(),
})
}
}
apiErr := errors.NewValidationError("Request validation failed", validationErrors)
v.errorHandler.HandleError(c, apiErr)
c.Abort() // without Abort, gin still runs the remaining handlers after this one returns
return
}
c.Next()
})
}
```
---
## Metrics and Monitoring
### Prometheus Integration
```go
// File: internal/metrics/metrics.go
type Metrics struct {
// HTTP metrics
httpRequestsTotal *prometheus.CounterVec
httpRequestDuration *prometheus.HistogramVec
httpRequestsInFlight prometheus.Gauge
// Auth metrics
authAttemptsTotal *prometheus.CounterVec
authSuccessTotal *prometheus.CounterVec
authFailuresTotal *prometheus.CounterVec
// Token metrics
tokensIssuedTotal *prometheus.CounterVec
tokenValidationsTotal *prometheus.CounterVec
// Business metrics
applicationsTotal prometheus.Gauge
activeSessionsTotal prometheus.Gauge
}
```
### Metrics Collection
```go
func (m *Metrics) RecordHTTPRequest(method, path string, statusCode int, duration time.Duration) {
m.httpRequestsTotal.WithLabelValues(method, path, strconv.Itoa(statusCode)).Inc()
m.httpRequestDuration.WithLabelValues(method, path).Observe(duration.Seconds())
}
func (m *Metrics) RecordAuthAttempt(provider, result string) {
m.authAttemptsTotal.WithLabelValues(provider, result).Inc()
if result == "success" {
m.authSuccessTotal.WithLabelValues(provider).Inc()
} else {
m.authFailuresTotal.WithLabelValues(provider).Inc()
}
}
```
### Dashboard Configuration
```yaml
# Grafana dashboard config
panels:
  - title: "Request Rate"
    type: "graph"
    targets:
      # Aggregate away the status label so the series match the legend
      - expr: "sum by (method, path) (rate(http_requests_total[5m]))"
        legendFormat: "{{method}} {{path}}"
  - title: "Authentication Success Rate"
    type: "stat"
    targets:
      # The two counters carry different label sets, so sum each side before dividing
      - expr: "sum(rate(auth_success_total[5m])) / sum(rate(auth_attempts_total[5m])) * 100"
        legendFormat: "Success Rate %"
  - title: "Active Applications"
    type: "stat"
    targets:
      - expr: "applications_total"
        legendFormat: "Applications"
```
---
## Database Migration System
### Migration Structure
```
migrations/
├── 001_initial_schema.up.sql
├── 001_initial_schema.down.sql
├── 002_user_sessions.up.sql
├── 002_user_sessions.down.sql
├── 003_add_token_prefix.up.sql
├── 003_add_token_prefix.down.sql
├── 004_add_audit_events.up.sql
└── 004_add_audit_events.down.sql
```
### Migration Runner
```go
// File: internal/database/postgres.go
func RunMigrations(db *sql.DB, migrationPath string) error {
driver, err := postgres.WithInstance(db, &postgres.Config{})
if err != nil {
return fmt.Errorf("failed to create migration driver: %w", err)
}
m, err := migrate.NewWithDatabaseInstance(
fmt.Sprintf("file://%s", migrationPath),
"postgres", driver)
if err != nil {
return fmt.Errorf("failed to create migration instance: %w", err)
}
// ErrNoChange means the schema is already up to date, which is not a failure
if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
return fmt.Errorf("failed to run migrations: %w", err)
}
return nil
}
```
### Migration Best Practices
1. **Always create both up and down migrations**
2. **Test migrations on copy of production data**
3. **Make migrations idempotent**
4. **Add proper indexes for performance**
5. **Include rollback procedures**
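The first rule is easy to enforce automatically, for example in CI. The sketch below takes a list of migration file names and reports any migration that lacks its `.up.sql` or `.down.sql` counterpart. `MissingPairs` is a hypothetical helper, and reading the names from the `migrations/` directory (for instance with `os.ReadDir`) is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MissingPairs returns, in sorted order, the migration names that have only
// an .up.sql file or only a .down.sql file. Other files are ignored.
func MissingPairs(files []string) []string {
	const (
		hasUp = 1 << iota
		hasDown
	)
	seen := map[string]int{}
	for _, f := range files {
		switch {
		case strings.HasSuffix(f, ".up.sql"):
			seen[strings.TrimSuffix(f, ".up.sql")] |= hasUp
		case strings.HasSuffix(f, ".down.sql"):
			seen[strings.TrimSuffix(f, ".down.sql")] |= hasDown
		}
	}
	var missing []string
	for name, flags := range seen {
		if flags != hasUp|hasDown {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	files := []string{
		"001_initial_schema.up.sql", "001_initial_schema.down.sql",
		"004_add_audit_events.up.sql", "004_add_audit_events.down.sql",
		"005_add_oauth_providers.up.sql", // down migration not written yet
	}
	fmt.Println(MissingPairs(files)) // [005_add_oauth_providers]
}
```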
### Example Migration
```sql
-- 005_add_oauth_providers.up.sql
CREATE TABLE IF NOT EXISTS oauth_providers (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name VARCHAR(100) NOT NULL UNIQUE,
client_id VARCHAR(255) NOT NULL,
client_secret_encrypted TEXT NOT NULL,
authorization_url TEXT NOT NULL,
token_url TEXT NOT NULL,
user_info_url TEXT NOT NULL,
scopes TEXT[] DEFAULT ARRAY['openid', 'profile', 'email'],
enabled BOOLEAN DEFAULT true,
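created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- IF NOT EXISTS keeps the index creation idempotent, matching the table statement
CREATE INDEX IF NOT EXISTS idx_oauth_providers_name ON oauth_providers(name);
CREATE INDEX IF NOT EXISTS idx_oauth_providers_enabled ON oauth_providers(enabled) WHERE enabled = true;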
```
---
## Frontend Architecture
### Component Structure
```
src/
├── components/
│ ├── Applications.tsx # Application management
│ ├── Tokens.tsx # Token operations
│ ├── Users.tsx # User management
│ ├── Audit.tsx # Audit log viewer
│ ├── Dashboard.tsx # Main dashboard
│ ├── Login.tsx # Authentication
│ ├── TokenTester.tsx # Token testing utility
│ └── TokenTesterCallback.tsx
├── contexts/
│ └── AuthContext.tsx # Authentication state
├── services/
│ └── apiService.ts # API client
├── App.tsx # Main application
└── index.tsx # Entry point
```
### API Service Implementation
```typescript
// File: kms-frontend/src/services/apiService.ts
class APIService {
private baseURL: string;
private token: string | null = null;
constructor(baseURL: string) {
this.baseURL = baseURL;
}
async request<T>(endpoint: string, options: RequestInit = {}): Promise<T> {
const url = `${this.baseURL}${endpoint}`;
const headers = {
'Content-Type': 'application/json',
'X-User-Email': this.getUserEmail(),
...options.headers,
};
const response = await fetch(url, {
...options,
headers,
});
if (!response.ok) {
const error = await response.json().catch(() => ({}));
throw new APIError(error.message || 'Request failed', response.status);
}
return response.json();
}
// Application management
async getApplications(): Promise<Application[]> {
return this.request<Application[]>('/api/applications');
}
// Audit log access
async getAuditEvents(params: AuditQueryParams): Promise<AuditEvent[]> {
const queryString = new URLSearchParams(params).toString();
return this.request<AuditEvent[]>(`/api/audit/events?${queryString}`);
}
}
```
### Authentication Context
```typescript
// File: kms-frontend/src/contexts/AuthContext.tsx
interface AuthContextType {
user: User | null;
login: (email: string) => Promise<void>;
logout: () => void;
isAuthenticated: boolean;
isLoading: boolean;
}
export const AuthContext = React.createContext<AuthContextType | null>(null);
export const AuthProvider: React.FC<{ children: React.ReactNode }> = ({ children }) => {
const [user, setUser] = useState<User | null>(null);
const [isLoading, setIsLoading] = useState(true);
const login = async (email: string) => {
setIsLoading(true);
try {
const response = await apiService.login(email);
setUser({ email, token: response.token });
// Only the email is persisted; the token stays in memory
localStorage.setItem('kms_user', JSON.stringify({ email }));
} finally {
// Errors propagate to the caller; the loading flag is reset either way
setIsLoading(false);
}
};
// ... rest of implementation
};
```
---
## Configuration Management
### Configuration Interface
```go
// File: internal/config/config.go
type ConfigProvider interface {
GetString(key string) string
GetInt(key string) int
GetBool(key string) bool
GetDuration(key string) time.Duration
GetStringSlice(key string) []string
IsSet(key string) bool
Validate() error
GetDatabaseDSN() string
GetServerAddress() string
IsDevelopment() bool
IsProduction() bool
}
```
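The typed getters can be backed by any key-value source. As an illustration, the sketch below implements a subset of the interface on top of a plain map, which is convenient for unit tests. `MapConfig` is a hypothetical type, and two behaviours are assumptions of this sketch: an empty value counts as unset, and unparsable values fall back to the zero value.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// MapConfig is a hypothetical map-backed source implementing part of ConfigProvider.
type MapConfig map[string]string

// IsSet treats an empty value (e.g. "REDIS_PASSWORD=") as unset.
func (c MapConfig) IsSet(key string) bool { return c[key] != "" }

func (c MapConfig) GetString(key string) string { return c[key] }

// GetInt returns 0 when the value is missing or not an integer.
func (c MapConfig) GetInt(key string) int {
	n, _ := strconv.Atoi(c[key])
	return n
}

// GetBool accepts the forms understood by strconv.ParseBool ("true", "1", "false", ...).
func (c MapConfig) GetBool(key string) bool {
	b, _ := strconv.ParseBool(c[key])
	return b
}

// GetDuration parses Go duration strings such as "5m" or "30s".
func (c MapConfig) GetDuration(key string) time.Duration {
	d, _ := time.ParseDuration(c[key])
	return d
}

func main() {
	cfg := MapConfig{
		"CACHE_ENABLED":     "true",
		"CACHE_DEFAULT_TTL": "5m",
		"REDIS_DB":          "0",
		"REDIS_PASSWORD":    "",
	}
	fmt.Println(cfg.GetBool("CACHE_ENABLED"))         // true
	fmt.Println(cfg.GetDuration("CACHE_DEFAULT_TTL")) // 5m0s
	fmt.Println(cfg.GetInt("REDIS_DB"))               // 0
	fmt.Println(cfg.IsSet("REDIS_PASSWORD"))          // false
}
```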
### Configuration Validation
```go
func (c *Config) Validate() error {
var errors []string
// Required configuration
required := []string{
"INTERNAL_HMAC_KEY",
"JWT_SECRET",
"AUTH_SIGNING_KEY",
"DB_HOST",
"DB_NAME",
}
for _, key := range required {
if !c.IsSet(key) {
errors = append(errors, fmt.Sprintf("required configuration %s is not set", key))
}
}
// Validate key lengths
if len(c.GetString("INTERNAL_HMAC_KEY")) < 32 {
errors = append(errors, "INTERNAL_HMAC_KEY must be at least 32 characters")
}
if len(errors) > 0 {
return fmt.Errorf("configuration validation failed: %s", strings.Join(errors, ", "))
}
return nil
}
```
### Environment Configuration
```bash
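# Security Configuration
# Example values only: generate fresh secrets per environment (e.g. `openssl rand -hex 32`) and never commit real keys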
INTERNAL_HMAC_KEY=3924f352b7ea63b27db02bf4b0014f2961a5d2f7c27643853a4581bb3a5457cb
JWT_SECRET=7f5e11d55e957988b00ce002418680af384219ef98c50d08cbbbdd541978450c
AUTH_SIGNING_KEY=484f921b39c383e6b3e0cc5a7cef3c2cec3d7c8d474ab5102891dc4c2bf63a68
# Database Configuration
DB_HOST=postgres
DB_PORT=5432
DB_NAME=kms
DB_USER=postgres
DB_PASSWORD=postgres
# Feature Flags
RATE_LIMIT_ENABLED=true
CACHE_ENABLED=false
METRICS_ENABLED=true
SAML_ENABLED=false
```
---
## Implementation Best Practices
### Code Organization
1. **Follow clean architecture principles**
2. **Use dependency injection throughout**
3. **Implement comprehensive error handling**
4. **Add structured logging to all components**
5. **Write unit tests for business logic**
### Security Guidelines
1. **Always validate input at API boundaries**
2. **Use parameterized database queries**
3. **Implement proper authentication and authorization**
4. **Log all security-relevant events**
5. **Follow principle of least privilege**
### Performance Considerations
1. **Implement caching for frequently accessed data**
2. **Use database indexes appropriately**
3. **Monitor and optimize slow queries**
4. **Implement proper connection pooling**
5. **Use asynchronous operations where beneficial**
### Testing Strategy
1. **Unit tests for business logic**
2. **Integration tests for API endpoints**
3. **End-to-end tests for critical workflows**
4. **Load testing for performance validation**
5. **Security testing for vulnerability assessment**
---
*This document serves as a comprehensive implementation guide for the KMS system. It should be updated as the system evolves and new features are added.*