Developer Notes: Building a Generic ONNX Engine in Go
Date: January 15, 2026
When we first started GoServe, we built it specifically for a Fraud Detection model. It was fast, but it was "locked-in." To make GoServe a true platform, we had to solve the challenge of supporting any model with any input type. This note dives into the technical hurdles and how we overcame them.
The Challenge: Go Generics vs. C Strong Typing
The core of GoServe is the onnxruntime_go wrapper. This library is excellent, but it uses Go generics (e.g., NewTensor[float32]).
While generics are great for compile-time safety, they are a nightmare for a dynamic server. A server doesn't know if the user is uploading a model that expects float32, int64, or double until the request arrives.
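Concretely, the server ends up needing a runtime switch that picks the right generic instantiation per request. Here is a minimal, hypothetical sketch of that dispatch (the ElementType enum and createTensor are our illustration, not GoServe's actual code; in the real server each branch would end in the wrapper's generic tensor constructor):

```go
package main

import "fmt"

// ElementType mirrors the ONNX element types this sketch cares about.
type ElementType int

const (
	Float32 ElementType = iota
	Int64
)

// createTensor shows why a dynamic server needs a runtime type switch:
// the concrete element type is only known when the request arrives, so
// the generic instantiation must be selected branch by branch.
func createTensor(t ElementType, raw []float64) (any, error) {
	switch t {
	case Float32:
		buf := make([]float32, len(raw))
		for i, v := range raw {
			buf[i] = float32(v)
		}
		// In a real server: the float32 generic tensor constructor.
		return buf, nil
	case Int64:
		buf := make([]int64, len(raw))
		for i, v := range raw {
			buf[i] = int64(v)
		}
		// In a real server: the int64 generic tensor constructor.
		return buf, nil
	default:
		return nil, fmt.Errorf("unsupported element type %d", t)
	}
}

func main() {
	out, _ := createTensor(Int64, []float64{1, 2, 3})
	fmt.Printf("%T %v\n", out, out) // []int64 [1 2 3]
}
```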
The "Reflection" Bridge
To support generic inputs, we had to move away from hardcoded types and use Go's reflect package.
How it works:
1. We accept the input as a map[string]any.
2. We use reflect.Value to inspect the incoming data (checking if it's a slice, identifying its depth).
3. We match the incoming data against the Introspected Metadata of the model (see below).
4. We recursively flatten the multi-dimensional Go slices into a contiguous C-friendly buffer while performing type conversion.
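The steps above can be sketched as follows. This is a simplified illustration under our own naming (flattenFloat32 and inferShape are hypothetical helpers, not GoServe's internals), assuming the input arrived as nested []any from encoding/json:

```go
package main

import (
	"fmt"
	"reflect"
)

// inferShape walks the nesting depth of a decoded JSON value to recover
// the tensor shape, unwrapping the interface values that json decoding
// produces at each level.
func inferShape(v any) []int64 {
	var shape []int64
	rv := reflect.ValueOf(v)
	for rv.Kind() == reflect.Slice {
		shape = append(shape, int64(rv.Len()))
		if rv.Len() == 0 {
			break
		}
		rv = rv.Index(0)
		if rv.Kind() == reflect.Interface {
			rv = rv.Elem()
		}
	}
	return shape
}

// flattenFloat32 recursively flattens arbitrarily nested slices into a
// contiguous buffer, converting leaves to float32 along the way
// (JSON numbers decode as float64, so conversion is the common case).
func flattenFloat32(v any, dst []float32) ([]float32, error) {
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Slice, reflect.Array:
		for i := 0; i < rv.Len(); i++ {
			var err error
			dst, err = flattenFloat32(rv.Index(i).Interface(), dst)
			if err != nil {
				return nil, err
			}
		}
		return dst, nil
	case reflect.Float32, reflect.Float64:
		return append(dst, float32(rv.Float())), nil
	default:
		return nil, fmt.Errorf("unsupported element kind %s", rv.Kind())
	}
}

func main() {
	// A 2x3 batch as it would arrive from encoding/json.
	in := []any{
		[]any{1.0, 2.0, 3.0},
		[]any{4.0, 5.0, 6.0},
	}
	flat, err := flattenFloat32(in, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(inferShape(in), flat) // [2 3] [1 2 3 4 5 6]
}
```

The resulting flat buffer is what gets handed across the CGO boundary; the inferred shape is what gets checked against the introspected metadata.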
Overcoming "Black Box" Models: Automated Introspection
Hardcoded input names were the next piece of lock-in to remove, so we implemented Automated Introspection.
When a model is loaded, GoServe now calls the wrapper's GetInputOutputInfo. We query:
- Names: e.g., "data", "output.1".
- Shapes: e.g., [1, 3, 224, 224].
- Data Types: e.g., FLOAT, INT64.
This metadata is stored in our Model Registry and used to validate every single request before it ever touches the C library.
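A minimal sketch of what that registry entry and pre-flight check might look like (TensorInfo, ModelMeta, and validate are our hypothetical names, assuming -1 marks a dynamic dimension such as batch size):

```go
package main

import "fmt"

// TensorInfo holds the metadata introspected at model load time.
type TensorInfo struct {
	Name  string
	Shape []int64 // -1 marks a dynamic dimension, e.g. batch size
	Type  string  // e.g. "FLOAT", "INT64"
}

// ModelMeta is one entry in the Model Registry.
type ModelMeta struct {
	Inputs map[string]TensorInfo
}

// validate rejects a request before it ever touches the C library:
// unknown input names and rank or shape mismatches fail fast in Go.
func (m ModelMeta) validate(name string, shape []int64) error {
	info, ok := m.Inputs[name]
	if !ok {
		return fmt.Errorf("unknown input %q", name)
	}
	if len(shape) != len(info.Shape) {
		return fmt.Errorf("input %q: expected rank %d, got %d",
			name, len(info.Shape), len(shape))
	}
	for i, want := range info.Shape {
		if want != -1 && want != shape[i] {
			return fmt.Errorf("input %q: dim %d should be %d, got %d",
				name, i, want, shape[i])
		}
	}
	return nil
}

func main() {
	meta := ModelMeta{Inputs: map[string]TensorInfo{
		"data": {Name: "data", Shape: []int64{-1, 3, 224, 224}, Type: "FLOAT"},
	}}
	fmt.Println(meta.validate("data", []int64{8, 3, 224, 224})) // <nil>
	fmt.Println(meta.validate("data", []int64{8, 3, 224}))      // rank mismatch error
}
```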
The "Rank" Headache
One of our biggest gotchas was tensor rank (the number of dimensions).
- A simple tabular model might return an output of shape [Batch, 2].
- An image classifier like MobileNet expects a 4D input ([Batch, 3, 224, 224]) and produces similarly high-dimensional outputs.
Our extraction logic initially assumed everything was 2D. We refactored our input flattener and output extractor to dynamically calculate the rank and reshape the flat C memory buffer into the correct nested Go structures.
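The rank-generic output reshaping reduces to a small recursion over strides. A hypothetical sketch (reshape is our name; GoServe's extractor applies the same idea to the flat buffer copied back from C):

```go
package main

import "fmt"

// reshape rebuilds a nested structure of the given shape from a flat
// float32 buffer: the inverse of the input flattener. Inner slices
// share the backing array of flat, so no extra copying is done for
// the nested levels.
func reshape(flat []float32, shape []int64) any {
	switch len(shape) {
	case 0:
		return flat[0] // scalar
	case 1:
		return flat[:shape[0]]
	}
	// Stride of one step along the outermost dimension.
	stride := 1
	for _, d := range shape[1:] {
		stride *= int(d)
	}
	out := make([]any, shape[0])
	for i := range out {
		out[i] = reshape(flat[i*stride:(i+1)*stride], shape[1:])
	}
	return out
}

func main() {
	flat := []float32{1, 2, 3, 4, 5, 6}
	fmt.Println(reshape(flat, []int64{2, 3})) // [[1 2 3] [4 5 6]]
}
```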
Current Limitations & Trade-offs
- Reflection Overhead: Using reflect adds about 5-10 microseconds of latency. In the world of high-performance Go, this is "slow," but compared to the 25ms+ of ML inference, it is a rounding error (roughly 0.04% overhead).
- Memory Copying: To pass data to C, we must flatten Go slices into a contiguous block of memory. For massive batches, this CPU-bound copying becomes a bottleneck.
- Type Coverage: Currently, we only support FLOAT32 and INT64. Support for STRING and DOUBLE is on the roadmap but requires more complex CGO handling.
Future Path: Zero-Copy Inference
Our ultimate goal is Zero-Copy. We want to allow users to send binary data (Protobuf/FlatBuffers) that we can point the ONNX Runtime directly at, bypassing Go's memory management entirely. This would make GoServe the fastest possible way to serve models on the planet.
Follow the progress on GitHub