
GoServe Architecture

This document describes the system architecture, data flow, and component interactions of GoServe.


Table of Contents

  1. System Overview
  2. Component Architecture
  3. Request Flow
  4. Data Flow Diagrams
  5. Threading Model
  6. Memory Management
  7. Security Considerations

System Overview

┌───────────────────────────────────────────────────────────────────┐
│                          GoServe System                            │
│                                                                   │
│  ┌─────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Client    │───▶│  HTTP Server │───▶│  Handlers    │          │
│  │  (curl/app) │    │  (Port 8080) │    │  (Routing)   │          │
│  └─────────────┘    └──────────────┘    └──────┬───────┘          │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │  Model Registry │       │
│                                         │  (Thread-Safe)  │       │
│                                         └────────┬────────┘       │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │  ONNX Session   │       │
│                                         │  (Go Wrapper)   │       │
│                                         └────────┬────────┘       │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │ ONNX Runtime    │       │
│                                         │  (C Library)    │       │
│                                         └─────────────────┘       │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘

Component Architecture

1. HTTP Server Layer

┌─────────────────────────────────────────────────────────┐
│                    HTTP Server                           │
│                   (internal/server/)                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐        ┌──────────────┐              │
│  │  Middleware  │        │   Router     │              │
│  │  - Logging   │───────▶│  (Go 1.22+)  │              │
│  │  - Request ID│        │              │              │
│  │  - Recovery  │        └──────┬───────┘              │
│  └──────────────┘               │                       │
│                                  │                       │
│         ┌────────────────────────┴──────────────┐       │
│         │                        │              │       │
│    ┌────▼─────┐          ┌───────▼──────┐  ┌────▼───┐  │
│    │  Health  │          │   Model      │  │ Infer  │  │
│    │ Handlers │          │  Handlers    │  │Handler │  │
│    └──────────┘          └──────────────┘  └────────┘  │
│                                                         │
└─────────────────────────────────────────────────────────┘

Components:

  • Router: Uses the Go 1.22+ standard library ServeMux with method matching.
  • Middleware: Structured logging, request tracing, and panic recovery.
  • Handlers: Business logic for health checks, model management, and inference.


2. Model Registry

The Model Registry maintains the lifecycle of all loaded models in memory.

Thread Safety:

  • Uses sync.RWMutex to allow concurrent inference while preventing conflicts during model loading/unloading.
  • Optimized for read-heavy workloads (inference).
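
A registry guarded by sync.RWMutex might look roughly like the sketch below. The session type is a stand-in for the real ONNX session, and the method names (Get, Load, Unload) are assumptions for illustration rather than GoServe's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// session is a stand-in for the real ONNX session type.
type session struct{ name string }

// Registry maps model names to loaded sessions.
type Registry struct {
	mu     sync.RWMutex
	models map[string]*session
}

// NewRegistry returns an empty registry ready for use.
func NewRegistry() *Registry {
	return &Registry{models: make(map[string]*session)}
}

// Get takes the read lock, so many inference requests can look up
// models at the same time.
func (r *Registry) Get(name string) (*session, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.models[name]
	if !ok {
		return nil, errors.New("model not found: " + name)
	}
	return s, nil
}

// Load takes the write lock, which excludes all readers and other writers
// while the map is modified.
func (r *Registry) Load(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.models[name]; exists {
		return errors.New("model already loaded: " + name)
	}
	r.models[name] = &session{name: name}
	return nil
}

// Unload removes a model under the write lock and reports whether it existed.
func (r *Registry) Unload(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.models[name]
	delete(r.models, name)
	return ok
}

func main() {
	reg := NewRegistry()
	if err := reg.Load("iris"); err != nil {
		fmt.Println("load failed:", err)
	}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { // concurrent readers share the read lock
			defer wg.Done()
			if _, err := reg.Get("iris"); err != nil {
				fmt.Println("lookup failed:", err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("unloaded:", reg.Unload("iris"))
}
```

In this sketch the read lock covers only the map lookup. A real registry would also need to make sure an unload cannot destroy a session while an inference is still using it, for example by holding the read lock for the duration of the call or by reference counting.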

Model Struct:

type Model struct {
    Name       string              // Model identifier
    Path       string              // File path to .onnx file
    Format     string              // "onnx"
    Session    *onnx.Session       // ONNX Runtime session
    InputInfo  []onnx.TensorInfo   // Input metadata
    OutputInfo []onnx.TensorInfo   // Output metadata
    LoadedAt   time.Time           // Load timestamp
}


3. ONNX Session Wrapper

GoServe interacts with the ONNX Runtime C library via CGO bindings.

Tensor Flow:

Go [][]float32 ──▶ Flatten ──▶ []float32
Create Tensor (Shape: [batch, features])
Pass to ONNX Runtime (CGO)
ONNX Runtime C API (Inference Execution)
Extract Output Tensor
Reshape ──▶ [][]float32 ──▶ Return to Go
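
The flatten and reshape steps at either end of this flow are plain Go and can be sketched without any CGO. The function names and the [2]int64 shape type below are illustrative assumptions; the tensor creation and the ONNX Runtime call in the middle are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// flatten copies a rectangular batch into one row-major slice and returns
// the [batch, features] shape that a tensor would be created with.
func flatten(batch [][]float32) ([]float32, [2]int64, error) {
	if len(batch) == 0 || len(batch[0]) == 0 {
		return nil, [2]int64{}, errors.New("empty batch")
	}
	features := len(batch[0])
	flat := make([]float32, 0, len(batch)*features)
	for i, row := range batch {
		if len(row) != features {
			return nil, [2]int64{}, fmt.Errorf("row %d has %d features, want %d", i, len(row), features)
		}
		flat = append(flat, row...)
	}
	return flat, [2]int64{int64(len(batch)), int64(features)}, nil
}

// reshape splits a flat output slice into rows of width cols.
// The rows are sub-slices of flat and share its backing array.
func reshape(flat []float32, cols int) ([][]float32, error) {
	if cols <= 0 || len(flat)%cols != 0 {
		return nil, fmt.Errorf("cannot reshape %d values into rows of %d", len(flat), cols)
	}
	out := make([][]float32, 0, len(flat)/cols)
	for i := 0; i < len(flat); i += cols {
		out = append(out, flat[i:i+cols])
	}
	return out, nil
}

func main() {
	flat, shape, err := flatten([][]float32{{1, 2, 3}, {4, 5, 6}})
	if err != nil {
		fmt.Println("flatten failed:", err)
		return
	}
	fmt.Println(flat, shape)

	rows, err := reshape(flat, 3)
	if err != nil {
		fmt.Println("reshape failed:", err)
		return
	}
	fmt.Println(rows)
}
```

Rejecting ragged rows during flattening matters because the C side receives only a pointer and a shape, so a mismatch between the two would not be caught there.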


Request Flow

Inference Request Flow

  1. Client sends a POST /v1/models/{model}/infer request with JSON data.
  2. HTTP Server routes the request and applies logging/tracing middleware.
  3. Inference Handler parses the JSON and validates the input shape.
  4. Model Registry retrieves the model session.
  5. ONNX Wrapper flattens the input, creates C tensors, and calls the ONNX Runtime C API.
  6. Results are extracted from output tensors, reshaped, and returned to the handler.
  7. Handler builds the final JSON response with predictions and probabilities.

Threading Model

GoServe leverages Go's native concurrency for high throughput.

  1. HTTP Server: Each request is handled in its own lightweight goroutine.
  2. Registry Access: An RWMutex lets concurrent inference requests share the read lock without blocking one another; only model loading and unloading take the exclusive write lock.
  3. ONNX Runtime: The underlying C library supports concurrent inference calls and executes them on optimized background threads that the runtime manages itself.
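
The read/write behaviour of the registry lock can be observed directly with the non-blocking TryRLock and TryLock methods that sync.RWMutex has offered since Go 1.18. The sketch below is only a demonstration of the lock semantics; the function name lockDemo is made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// lockDemo reports whether a second reader and a writer can acquire the
// lock while one reader (an in-flight inference) already holds it.
func lockDemo() (secondReader, writer bool) {
	var mu sync.RWMutex

	mu.RLock() // first inference holds the read lock
	defer mu.RUnlock()

	// A second inference can share the read lock.
	secondReader = mu.TryRLock()
	if secondReader {
		mu.RUnlock()
	}

	// A load or unload needs the write lock, which is unavailable
	// while any reader is active.
	writer = mu.TryLock()
	if writer {
		mu.Unlock()
	}
	return secondReader, writer
}

func main() {
	r, w := lockDemo()
	fmt.Println(r, w) // prints "true false"
}
```

Production code would normally use the blocking RLock and Lock calls; the Try variants appear here only because they make the outcome deterministic in a single goroutine.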

Memory Management

Go vs C Memory

  • Go Memory (GC Managed): Handles HTTP request/response structs, JSON marshaling, and the Model Registry map.
  • C Memory (Manual): ONNX tensors and session state are allocated in C memory to avoid Go GC overhead during inference. GoServe explicitly destroys these objects after use to prevent memory leaks.

Security Considerations

  1. Input Validation: Strict checking of feature counts and data types before passing to C code.
  2. Resource Limits: Batch size limits and memory management to prevent denial-of-service.
  3. Path Traversal Protection: Validates model paths to ensure only authorized files are loaded.
  4. CGO Safety: Carefully managed boundaries between Go and C to prevent memory corruption.
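
Two of these checks are easy to sketch with the standard library. The limit of 64, the .onnx extension rule, and the function names are assumptions for illustration. filepath.IsLocal (Go 1.20+) is a purely lexical check, so it does not by itself protect against symlinks inside the model directory.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// maxBatchSize is an assumed limit, not a value taken from GoServe.
const maxBatchSize = 64

// resolveModelPath joins name onto baseDir only if name stays inside
// baseDir lexically and has the expected extension.
func resolveModelPath(baseDir, name string) (string, error) {
	// IsLocal rejects absolute paths, empty paths, and any path that
	// climbs out of its starting directory with "..".
	if !filepath.IsLocal(name) {
		return "", errors.New("model path escapes the model directory: " + name)
	}
	if !strings.EqualFold(filepath.Ext(name), ".onnx") {
		return "", errors.New("unsupported model file: " + name)
	}
	return filepath.Join(baseDir, name), nil
}

// checkBatchSize bounds the number of rows accepted in one request.
func checkBatchSize(n int) error {
	if n < 1 || n > maxBatchSize {
		return fmt.Errorf("batch size %d outside allowed range 1..%d", n, maxBatchSize)
	}
	return nil
}

func main() {
	for _, name := range []string{"iris.onnx", "../secret.onnx", "notes.txt"} {
		p, err := resolveModelPath("models", name)
		fmt.Printf("%q -> %q, err=%v\n", name, p, err)
	}
	fmt.Println(checkBatchSize(1), checkBatchSize(1000))
}
```

Both checks run entirely in Go, before any value is handed across the CGO boundary.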

For more details, see:

  • Full Technical Guide
  • Quick Start README
  • API Reference