Developer Notes: Building a Generic ONNX Engine in Go
Date: January 15, 2026
When we first started GoServe, we built it specifically for a Fraud Detection model. It was fast, but it was "locked-in." To make GoServe a true platform, we had to solve the challenge of supporting any model with any input type. This note dives into the technical hurdles and how we overcame them.
The Challenge: Go Generics vs. C Strong Typing
The core of GoServe is the onnxruntime_go wrapper. This library is excellent, but it uses Go generics (e.g., NewTensor[float32]).
While generics are great for compile-time safety, they are a nightmare for a dynamic server. A server doesn't know if the user is uploading a model that expects float32, int64, or double until the request arrives.
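Concretely, the server ends up needing a runtime switch that picks the right generic instantiation per request. Here is a minimal, hypothetical sketch of that dispatch (the ElementType enum and createTensor are our illustration, not GoServe's actual code; in the real server each branch would end in the wrapper's generic tensor constructor):

```go
package main

import "fmt"

// ElementType mirrors the ONNX element types this sketch cares about.
type ElementType int

const (
	Float32 ElementType = iota
	Int64
)

// createTensor shows why a dynamic server needs a runtime type switch:
// the concrete element type is only known when the request arrives, so
// the generic instantiation must be selected branch by branch.
func createTensor(t ElementType, raw []float64) (any, error) {
	switch t {
	case Float32:
		buf := make([]float32, len(raw))
		for i, v := range raw {
			buf[i] = float32(v)
		}
		// In a real server: the float32 generic tensor constructor.
		return buf, nil
	case Int64:
		buf := make([]int64, len(raw))
		for i, v := range raw {
			buf[i] = int64(v)
		}
		// In a real server: the int64 generic tensor constructor.
		return buf, nil
	default:
		return nil, fmt.Errorf("unsupported element type %d", t)
	}
}

func main() {
	out, _ := createTensor(Int64, []float64{1, 2, 3})
	fmt.Printf("%T %v\n", out, out) // []int64 [1 2 3]
}
```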
The "Reflection" Bridge
To support generic inputs, we had to move away from hardcoded types and use Go's reflect package.
How it works:
1. We accept the input as a map[string]any.
2. We use reflect.Value to inspect the incoming data (checking if it's a slice, identifying its depth).
3. We match the incoming data against the Introspected Metadata of the model (see below).
4. We recursively flatten the multi-dimensional Go slices into a contiguous C-friendly buffer while performing type conversion.
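The steps above can be sketched as follows. This is a simplified illustration under our own naming (flattenFloat32 and inferShape are hypothetical helpers, not GoServe's internals), assuming the input arrived as nested []any from encoding/json:

```go
package main

import (
	"fmt"
	"reflect"
)

// inferShape walks the nesting depth of a decoded JSON value to recover
// the tensor shape, unwrapping the interface values that json decoding
// produces at each level.
func inferShape(v any) []int64 {
	var shape []int64
	rv := reflect.ValueOf(v)
	for rv.Kind() == reflect.Slice {
		shape = append(shape, int64(rv.Len()))
		if rv.Len() == 0 {
			break
		}
		rv = rv.Index(0)
		if rv.Kind() == reflect.Interface {
			rv = rv.Elem()
		}
	}
	return shape
}

// flattenFloat32 recursively flattens arbitrarily nested slices into a
// contiguous buffer, converting leaves to float32 along the way
// (JSON numbers decode as float64, so conversion is the common case).
func flattenFloat32(v any, dst []float32) ([]float32, error) {
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Slice, reflect.Array:
		for i := 0; i < rv.Len(); i++ {
			var err error
			dst, err = flattenFloat32(rv.Index(i).Interface(), dst)
			if err != nil {
				return nil, err
			}
		}
		return dst, nil
	case reflect.Float32, reflect.Float64:
		return append(dst, float32(rv.Float())), nil
	default:
		return nil, fmt.Errorf("unsupported element kind %s", rv.Kind())
	}
}

func main() {
	// A 2x3 batch as it would arrive from encoding/json.
	in := []any{
		[]any{1.0, 2.0, 3.0},
		[]any{4.0, 5.0, 6.0},
	}
	flat, err := flattenFloat32(in, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(inferShape(in), flat) // [2 3] [1 2 3 4 5 6]
}
```

The resulting flat buffer is what gets handed across the CGO boundary; the inferred shape is what gets checked against the introspected metadata.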
Overcoming "Black Box" Models: Automated Introspection
Hardcoded input names were the next piece of lock-in to remove, so we implemented Automated Introspection.
When a model is loaded, GoServe now calls the wrapper's GetInputOutputInfo. We query:
- Names: e.g., "data", "output.1".
- Shapes: e.g., [1, 3, 224, 224].
- Data Types: e.g., FLOAT, INT64.
This metadata is stored in our Model Registry and used to validate every single request before it ever touches the C library.
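A minimal sketch of what that registry entry and pre-flight check might look like (TensorInfo, ModelMeta, and validate are our hypothetical names, assuming -1 marks a dynamic dimension such as batch size):

```go
package main

import "fmt"

// TensorInfo holds the metadata introspected at model load time.
type TensorInfo struct {
	Name  string
	Shape []int64 // -1 marks a dynamic dimension, e.g. batch size
	Type  string  // e.g. "FLOAT", "INT64"
}

// ModelMeta is one entry in the Model Registry.
type ModelMeta struct {
	Inputs map[string]TensorInfo
}

// validate rejects a request before it ever touches the C library:
// unknown input names and rank or shape mismatches fail fast in Go.
func (m ModelMeta) validate(name string, shape []int64) error {
	info, ok := m.Inputs[name]
	if !ok {
		return fmt.Errorf("unknown input %q", name)
	}
	if len(shape) != len(info.Shape) {
		return fmt.Errorf("input %q: expected rank %d, got %d",
			name, len(info.Shape), len(shape))
	}
	for i, want := range info.Shape {
		if want != -1 && want != shape[i] {
			return fmt.Errorf("input %q: dim %d should be %d, got %d",
				name, i, want, shape[i])
		}
	}
	return nil
}

func main() {
	meta := ModelMeta{Inputs: map[string]TensorInfo{
		"data": {Name: "data", Shape: []int64{-1, 3, 224, 224}, Type: "FLOAT"},
	}}
	fmt.Println(meta.validate("data", []int64{8, 3, 224, 224})) // <nil>
	fmt.Println(meta.validate("data", []int64{8, 3, 224}))      // rank mismatch error
}
```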
The "Rank" Headache
One of our biggest gotchas was tensor rank (the number of dimensions).
- A simple tabular model might return an output of shape [Batch, 2].
- An image classifier like MobileNet expects a 4D input ([Batch, 3, 224, 224]) and produces similarly high-dimensional outputs.
Our extraction logic initially assumed everything was 2D. We refactored our input flattener and output extractor to dynamically calculate the rank and reshape the flat C memory buffer into the correct nested Go structures.
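The rank-generic output reshaping reduces to a small recursion over strides. A hypothetical sketch (reshape is our name; GoServe's extractor applies the same idea to the flat buffer copied back from C):

```go
package main

import "fmt"

// reshape rebuilds a nested structure of the given shape from a flat
// float32 buffer: the inverse of the input flattener. Inner slices
// share the backing array of flat, so no extra copying is done for
// the nested levels.
func reshape(flat []float32, shape []int64) any {
	switch len(shape) {
	case 0:
		return flat[0] // scalar
	case 1:
		return flat[:shape[0]]
	}
	// Stride of one step along the outermost dimension.
	stride := 1
	for _, d := range shape[1:] {
		stride *= int(d)
	}
	out := make([]any, shape[0])
	for i := range out {
		out[i] = reshape(flat[i*stride:(i+1)*stride], shape[1:])
	}
	return out
}

func main() {
	flat := []float32{1, 2, 3, 4, 5, 6}
	fmt.Println(reshape(flat, []int64{2, 3})) // [[1 2 3] [4 5 6]]
}
```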
Current Limitations & Trade-offs
- Reflection Overhead: Using reflect adds about 5-10 microseconds of latency. In the world of high-performance Go, this is "slow," but compared to the 25ms+ of ML inference, it is a rounding error (roughly 0.04% overhead).
- Memory Copying: To pass data to C, we must flatten Go slices into a contiguous block of memory. For massive batches, this CPU-bound copying becomes a bottleneck.
- Type Coverage: Currently, we only support FLOAT32 and INT64. Support for STRING and DOUBLE is on the roadmap but requires more complex CGO handling.
Future Path: Zero-Copy Inference
Our ultimate goal is Zero-Copy. We want to allow users to send binary data (Protobuf/FlatBuffers) that we can point the ONNX Runtime directly at, bypassing Go's memory management entirely. This would make GoServe the fastest possible way to serve models on the planet.
Follow the progress on GitHub