Benchmarks & Resource Efficiency
GoServe is engineered for high-performance production environments. To provide an honest comparison, we benchmarked GoServe against a production-tuned FastAPI setup using the MobileNetV2 image classification model.
Methodology
- Hardware: Local CPU-based inference.
- Model: MobileNetV2 (ONNX, `1x3x224x224` input).
- Concurrency: 10 concurrent clients, 100 total requests (reproduced in the sketch below).
- FastAPI Setup: 4-worker multi-process configuration using `ProcessPoolExecutor` to bypass the GIL. This setup mimics the standard industry workaround for offloading CPU-bound tasks in Python API servers.
- GoServe Setup: Single-process Go binary using native Goroutines.
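For reproducibility, here is a minimal load-generator sketch in Go matching these parameters. The `/predict` endpoint, port 8080, and the `data` payload field are illustrative assumptions, not GoServe's documented API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const totalRequests = 100
	const concurrency = 10

	// Dummy 1x3x224x224 tensor of zeros; a real run would send image data.
	tensor := make([]float32, 1*3*224*224)
	payload, _ := json.Marshal(map[string][]float32{"data": tensor})

	sem := make(chan struct{}, concurrency) // caps in-flight requests at 10
	var wg sync.WaitGroup

	start := time.Now()
	for i := 0; i < totalRequests; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks once 10 requests are in flight
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			resp, err := http.Post("http://localhost:8080/predict",
				"application/json", bytes.NewReader(payload))
			if err != nil {
				return
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
	fmt.Printf("%.1f req/s\n", float64(totalRequests)/time.Since(start).Seconds())
}
```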
Performance Results
| Metric | GoServe | FastAPI (4 Workers) | GoServe Advantage |
|---|---|---|---|
| Throughput | 31.6 req/s | 4.5 req/s | 7x faster |
| P50 Latency | 310.1 ms | 2126.4 ms | 6.8x lower |
| P99 Latency | 410.8 ms | 2150.0 ms | 5.2x lower |
Why is there such a large gap?
The 7x performance lead is the result of a fundamental architectural advantage that is most pronounced with data-heavy payloads:
- Shared Memory vs. IPC: In Python, data must be serialized (pickled) and copied between processes to utilize multiple cores. For large image tensors, this "IPC tax" is massive. GoServe uses Goroutines within a single process, so all "workers" share the same address space (see the sketch after this list).
- JSON Parsing Efficiency: Go's static typing allows it to unmarshal JSON directly into contiguous memory buffers. Python's dynamic nature and Pydantic validation add significant overhead when processing large arrays.
- Data Intensity Note: The gap is widest during inference because of the large JSON arrays involved. For simple string-based "Hello World" endpoints the gap would be smaller; Go's advantage grows with the size of the payload.
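The following sketch illustrates both points in one place: the request body is unmarshaled once into a typed struct backed by a contiguous `[]float32`, and several goroutines then operate on that same buffer with no serialization or copy. The `InferenceRequest` type and `runInference` stub are illustrative assumptions, not GoServe's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// InferenceRequest decodes the tensor payload into one contiguous slice.
type InferenceRequest struct {
	Data []float32 `json:"data"` // flattened 1x3x224x224 tensor
}

// runInference stands in for the ONNX Runtime call; it receives the
// shared buffer, not a copy.
func runInference(tensor []float32) float32 {
	return tensor[0]
}

func main() {
	raw := []byte(`{"data": [0.1, 0.2, 0.3]}`)

	var req InferenceRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		panic(err)
	}

	// Four "workers" all see the same memory: passing req.Data hands each
	// goroutine a slice header pointing at the same backing array, not a
	// pickled copy of the tensor.
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fmt.Printf("worker %d result: %v\n", id, runInference(req.Data))
		}(w)
	}
	wg.Wait()
}
```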
Resource Footprint (The "Cloud Bill" Saver)
Efficiency is where GoServe truly shines for DevOps and Finance teams.
| State | GoServe (1 Process) | FastAPI (4 Workers) | RAM Savings |
|---|---|---|---|
| Idle | ~105 MB | ~400 MB | 73% less |
| Active | ~112 MB | 600+ MB | 81% less |
Why Go? (Comparison with Java)
While Java is a powerful high-performance language, we chose Go for GoServe for several strategic reasons:
1. The "Glue" Overhead
ML serving is mostly "glue code": taking JSON from a socket and handing a pointer to a C++ library (ONNX Runtime).
- Go: CGO is designed exactly for this. It lets Go pass memory pointers to C with low overhead (a minimal sketch follows).
- Java: Requires JNI (Java Native Interface) or the newer Foreign Function & Memory API (Project Panama). JNI is notoriously complex to maintain and has higher overhead for frequent small calls than CGO.
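To make the CGO point concrete, here is a minimal, self-contained sketch of the glue pattern: handing a Go slice's backing array to a C function by pointer, with no copy. The C `sum` function is a stand-in for an ONNX Runtime entry point, not GoServe's real binding.

```go
package main

/*
#include <stddef.h>

static float sum(const float* data, size_t n) {
    float acc = 0;
    for (size_t i = 0; i < n; i++) acc += data[i];
    return acc;
}
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func main() {
	tensor := []float32{0.1, 0.2, 0.3, 0.4}

	// Pass a pointer to the slice's first element; C reads Go memory in
	// place for the duration of the call. This is safe under the cgo
	// pointer rules because the C side does not retain the pointer.
	result := C.sum(
		(*C.float)(unsafe.Pointer(&tensor[0])),
		C.size_t(len(tensor)),
	)
	fmt.Println("sum from C:", float32(result))
}
```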
2. Memory & Startup
- Footprint: A Go binary is a single, static file (~20 MB). A Java application requires a JVM (Java Virtual Machine), which usually has a 256 MB+ memory "floor" just to start.
- Cold Starts: GoServe starts in milliseconds. Standard Java (JVM) requires a "warm-up" period for the JIT compiler to reach peak performance, making it less ideal for serverless or scale-to-zero architectures.
3. Performance Difference
In terms of raw inference (once the data is in the C++ library), Go and Java would perform similarly. However, Go's superior handling of lightweight concurrency (Goroutines vs. heavy OS threads) and its lower memory footprint make it more cost-effective for high-density deployment.
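As a rough illustration of that concurrency model, the sketch below parks 10,000 goroutines at once and reports the stack memory they consume. Each goroutine starts with a stack of a few kilobytes, versus the ~1 MB default stack of a JVM thread; the numbers printed will vary by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	block := make(chan struct{})

	// Spawn 10,000 goroutines and park them so they all exist at once.
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-block
		}()
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("goroutines: %d, stack memory in use: %d KB\n",
		runtime.NumGoroutine(), m.StackInuse/1024)

	close(block) // release the parked goroutines
	wg.Wait()
}
```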
Technical Note on Fairness
A "Pure Threaded" FastAPI setup might reduce the IPC overhead seen in these results, but would be highly susceptible to GIL contention under mixed workloads. GoServe provides the best of both worlds: the safety of a single process with the performance of true multi-core parallelism.