GoServe Architecture
This document describes the system architecture, data flow, and component interactions of GoServe.
Table of Contents
- System Overview
- Component Architecture
- Request Flow
- Data Flow Diagrams
- Threading Model
- Memory Management
- Security Considerations
System Overview
┌───────────────────────────────────────────────────────────────────┐
│                          GoServe System                           │
│                                                                   │
│  ┌─────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Client    │───▶│ HTTP Server  │───▶│   Handlers   │          │
│  │ (curl/app)  │    │ (Port 8080)  │    │  (Routing)   │          │
│  └─────────────┘    └──────────────┘    └──────┬───────┘          │
│                                                │                  │
│                                       ┌────────▼────────┐         │
│                                       │ Model Registry  │         │
│                                       │  (Thread-Safe)  │         │
│                                       └────────┬────────┘         │
│                                                │                  │
│                                       ┌────────▼────────┐         │
│                                       │  ONNX Session   │         │
│                                       │  (Go Wrapper)   │         │
│                                       └────────┬────────┘         │
│                                                │                  │
│                                       ┌────────▼────────┐         │
│                                       │  ONNX Runtime   │         │
│                                       │   (C Library)   │         │
│                                       └─────────────────┘         │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
Component Architecture
1. HTTP Server Layer
┌─────────────────────────────────────────────────────────┐
│                       HTTP Server                       │
│                   (internal/server/)                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐        ┌──────────────┐               │
│  │  Middleware  │        │    Router    │               │
│  │  - Logging   │───────▶│  (Go 1.22+)  │               │
│  │  - Request ID│        │              │               │
│  │  - Recovery  │        └──────┬───────┘               │
│  └──────────────┘               │                       │
│                                 │                       │
│        ┌────────────────────────┴──────────────┐        │
│        │                        │              │        │
│   ┌────▼─────┐          ┌───────▼──────┐  ┌────▼───┐    │
│   │  Health  │          │    Model     │  │ Infer  │    │
│   │ Handlers │          │   Handlers   │  │Handler │    │
│   └──────────┘          └──────────────┘  └────────┘    │
│                                                         │
└─────────────────────────────────────────────────────────┘
Components:
- Router: Uses the Go 1.22+ standard library http.ServeMux, which matches on HTTP method as well as path.
- Middleware: Structured logging, request tracing, and panic recovery.
- Handlers: Business logic for health checks, model management, and inference.
2. Model Registry
The Model Registry manages the lifecycle of all loaded models and keeps them in memory.
Thread Safety:
- Uses sync.RWMutex to allow concurrent inference while preventing conflicts during model loading/unloading.
- Optimized for read-heavy workloads (inference).
Model Struct:
type Model struct {
    Name       string            // Model identifier
    Path       string            // File path to .onnx file
    Format     string            // "onnx"
    Session    *onnx.Session     // ONNX Runtime session
    InputInfo  []onnx.TensorInfo // Input metadata
    OutputInfo []onnx.TensorInfo // Output metadata
    LoadedAt   time.Time         // Load timestamp
}
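To illustrate the locking scheme, here is a minimal sketch of a registry built on sync.RWMutex. The Registry type and its method names are assumptions made for this example, and the Model struct is trimmed to fields that need no ONNX dependency.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Model is trimmed to the fields this sketch needs; the ONNX session is omitted.
type Model struct {
	Name     string
	Path     string
	LoadedAt time.Time
}

// Registry is a hypothetical name for the map-plus-lock structure.
type Registry struct {
	mu     sync.RWMutex
	models map[string]*Model
}

func NewRegistry() *Registry {
	return &Registry{models: make(map[string]*Model)}
}

// Get takes the shared read lock, so many inference requests can look up
// models at the same time.
func (r *Registry) Get(name string) (*Model, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	m, ok := r.models[name]
	return m, ok
}

// Load takes the exclusive write lock and refuses to overwrite a loaded model.
func (r *Registry) Load(name, path string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.models[name]; exists {
		return fmt.Errorf("model %q is already loaded", name)
	}
	r.models[name] = &Model{Name: name, Path: path, LoadedAt: time.Now()}
	return nil
}

// Unload removes a model under the write lock and reports whether it existed.
func (r *Registry) Unload(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.models[name]
	delete(r.models, name)
	return ok
}

func main() {
	reg := NewRegistry()
	_ = reg.Load("iris", "models/iris.onnx") // hypothetical model name and path

	// Several readers hold the read lock concurrently.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if m, ok := reg.Get("iris"); ok {
				_ = m.Name
			}
		}()
	}
	wg.Wait()
	fmt.Println(reg.Unload("iris"))
}
```

A real registry would also have to make sure a session is not destroyed while an in-flight inference still uses it; this sketch leaves that out.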
3. ONNX Session Wrapper
GoServe interacts with the ONNX Runtime C library via CGO bindings.
Tensor Flow:
Go [][]float32 ──▶ Flatten ──▶ []float32
                                   │
               Create Tensor (Shape: [batch, features])
                                   │
                      Pass to ONNX Runtime (CGO)
                                   │
               ONNX Runtime C API (Inference Execution)
                                   │
                         Extract Output Tensor
                                   │
               Reshape ──▶ [][]float32 ──▶ Return to Go
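The flatten and reshape steps at either end of this flow are plain Go and can be sketched without CGO. The function names below are hypothetical, and the tensor creation and runtime call in between are not shown.

```go
package main

import (
	"errors"
	"fmt"
)

// flatten copies a batch of rows into one row-major slice and returns the
// shape [batch, features]. It rejects empty and ragged batches.
func flatten(batch [][]float32) ([]float32, [2]int, error) {
	if len(batch) == 0 || len(batch[0]) == 0 {
		return nil, [2]int{}, errors.New("empty batch")
	}
	features := len(batch[0])
	flat := make([]float32, 0, len(batch)*features)
	for i, row := range batch {
		if len(row) != features {
			return nil, [2]int{}, fmt.Errorf("row %d has %d features, want %d", i, len(row), features)
		}
		flat = append(flat, row...)
	}
	return flat, [2]int{len(batch), features}, nil
}

// reshape splits a row-major slice back into rows of the given width.
// The rows share memory with flat rather than copying it.
func reshape(flat []float32, cols int) ([][]float32, error) {
	if cols <= 0 || len(flat)%cols != 0 {
		return nil, fmt.Errorf("cannot reshape %d values into rows of %d", len(flat), cols)
	}
	out := make([][]float32, 0, len(flat)/cols)
	for i := 0; i < len(flat); i += cols {
		out = append(out, flat[i:i+cols])
	}
	return out, nil
}

func main() {
	flat, shape, err := flatten([][]float32{{1, 2, 3}, {4, 5, 6}})
	fmt.Println(flat, shape, err) // [1 2 3 4 5 6] [2 3] <nil>

	rows, _ := reshape(flat, 3)
	fmt.Println(rows) // [[1 2 3] [4 5 6]]
}
```

Rejecting ragged rows at this stage keeps a malformed batch from ever reaching the C side with a shape that does not match its data.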
Request Flow
Inference Request Flow
- Client sends a POST /v1/models/{model}/infer request with JSON data.
- HTTP Server routes the request and applies logging/tracing middleware.
- Inference Handler parses the JSON and validates the input shape.
- Model Registry retrieves the model session.
- ONNX Wrapper flattens the input, creates C tensors, and calls the ONNX Runtime C API.
- Results are extracted from output tensors, reshaped, and returned to the handler.
- Handler builds the final JSON response with predictions and probabilities.
Threading Model
GoServe leverages Go's native concurrency for high throughput.
- HTTP Server: Each request is handled in its own lightweight goroutine.
- Registry Access: An RWMutex guards the registry, so concurrent inferences share the read lock and do not block one another; they wait only while a model is being loaded or unloaded.
- ONNX Runtime: The underlying C library is thread-safe and executes inference in optimized background threads managed by the runtime.
Memory Management
Go vs C Memory
- Go Memory (GC Managed): Handles HTTP request/response structs, JSON marshaling, and the Model Registry map.
- C Memory (Manual): ONNX tensors and session state are allocated in C memory to avoid Go GC overhead during inference. GoServe explicitly destroys these objects after use to prevent memory leaks.
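The explicit-release discipline can be illustrated in pure Go. In the sketch below, nativeTensor is a hypothetical stand-in for a handle to C-allocated memory; in a real CGO wrapper, Destroy would call the C library's release function instead of clearing a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// nativeTensor stands in for a handle to C-allocated memory (hypothetical type).
type nativeTensor struct {
	data     []float32
	released bool
}

// liveTensors counts handles created but not yet destroyed. It is not
// goroutine-safe and exists only to make leaks visible in this sketch.
var liveTensors int

func newTensor(data []float32) *nativeTensor {
	liveTensors++
	return &nativeTensor{data: data}
}

// Destroy releases the handle. Calling it more than once is harmless.
func (t *nativeTensor) Destroy() {
	if t.released {
		return
	}
	t.released = true
	t.data = nil
	liveTensors--
}

// sumInference mimics one inference call: it creates an input and an output
// tensor and releases both on every return path.
func sumInference(input []float32) (float32, error) {
	in := newTensor(input)
	defer in.Destroy() // runs on error returns as well

	if len(input) == 0 {
		return 0, errors.New("empty input")
	}
	var total float32
	for _, v := range in.data {
		total += v
	}
	out := newTensor([]float32{total})
	defer out.Destroy()

	// Copy the result into a Go value before the deferred Destroy calls run.
	result := out.data[0]
	return result, nil
}

func main() {
	v, err := sumInference([]float32{1, 2, 3})
	fmt.Println(v, err, liveTensors) // 6 <nil> 0
}
```

Pairing every allocation with a deferred Destroy right after creation keeps the release next to the allocation, which makes a leak on an early return harder to introduce.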
Security Considerations
- Input Validation: Strict checking of feature counts and data types before passing to C code.
- Resource Limits: Batch size limits and memory management to prevent denial-of-service.
- Path Traversal Protection: Validates model paths to ensure only authorized files are loaded.
- CGO Safety: Carefully managed boundaries between Go and C to prevent memory corruption.
For more details, see:
- Full Technical Guide
- Quick Start README
- API Reference