Benchmarks & Resource Efficiency
GoServe is engineered for high-performance production environments. To provide an honest comparison, we benchmarked GoServe against a production-tuned FastAPI setup using the MobileNetV2 image classification model.
Methodology
- Hardware: Local CPU-based inference.
- Model: MobileNetV2 (ONNX, `1x3x224x224` input).
- Concurrency: 10 concurrent clients, 100 total requests (reproduced in the sketch below).
- FastAPI Setup: 4-worker multi-process configuration using `ProcessPoolExecutor` to bypass the GIL. This setup mimics the standard industry workaround for offloading CPU-bound tasks in Python API servers.
- GoServe Setup: Single-process Go binary using native Goroutines.
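For reproducibility, here is a minimal load-generator sketch in Go matching these parameters. The `/predict` endpoint, port 8080, and the `data` payload field are illustrative assumptions, not GoServe's documented API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const totalRequests = 100
	const concurrency = 10

	// Dummy 1x3x224x224 tensor of zeros; a real run would send image data.
	tensor := make([]float32, 1*3*224*224)
	payload, _ := json.Marshal(map[string][]float32{"data": tensor})

	sem := make(chan struct{}, concurrency) // caps in-flight requests at 10
	var wg sync.WaitGroup

	start := time.Now()
	for i := 0; i < totalRequests; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks once 10 requests are in flight
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			resp, err := http.Post("http://localhost:8080/predict",
				"application/json", bytes.NewReader(payload))
			if err != nil {
				return
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
	fmt.Printf("%.1f req/s\n", float64(totalRequests)/time.Since(start).Seconds())
}
```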
Performance Results
| Metric | GoServe | FastAPI (4 Workers) | GoServe Advantage |
|---|---|---|---|
| Throughput | 31.6 req/s | 4.5 req/s | 7x faster |
| P50 Latency | 310.1 ms | 2126.4 ms | 6.8x lower |
| P99 Latency | 410.8 ms | 2150.0 ms | 5.2x lower |
Why is there such a large gap?
The 7x performance lead is the result of a fundamental architectural advantage that is most pronounced with data-heavy payloads:
- Shared Memory vs. IPC: In Python, data must be serialized (pickled) and copied between processes to utilize multiple cores. For large image tensors, this "IPC tax" is massive. GoServe uses Goroutines within a single process, so all "workers" share the same address space (see the sketch after this list).
- JSON Parsing Efficiency: Go's static typing allows it to unmarshal JSON directly into contiguous memory buffers. Python's dynamic nature and Pydantic validation add significant overhead when processing large arrays.
- Data Intensity Note: The gap is widest during inference because of the large JSON arrays involved. For simple string-based "Hello World" endpoints the gap would be smaller; Go's advantage grows with the size of the payload.
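The following sketch illustrates both points in one place: the request body is unmarshaled once into a typed struct backed by a contiguous `[]float32`, and several goroutines then operate on that same buffer with no serialization or copy. The `InferenceRequest` type and `runInference` stub are illustrative assumptions, not GoServe's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// InferenceRequest decodes the tensor payload into one contiguous slice.
type InferenceRequest struct {
	Data []float32 `json:"data"` // flattened 1x3x224x224 tensor
}

// runInference stands in for the ONNX Runtime call; it receives the
// shared buffer, not a copy.
func runInference(tensor []float32) float32 {
	return tensor[0]
}

func main() {
	raw := []byte(`{"data": [0.1, 0.2, 0.3]}`)

	var req InferenceRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		panic(err)
	}

	// Four "workers" all see the same memory: passing req.Data hands each
	// goroutine a slice header pointing at the same backing array, not a
	// pickled copy of the tensor.
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fmt.Printf("worker %d result: %v\n", id, runInference(req.Data))
		}(w)
	}
	wg.Wait()
}
```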
Resource Footprint (The "Cloud Bill" Saver)
Efficiency is where GoServe truly shines for DevOps and Finance teams.
| State | GoServe (1 Process) | FastAPI (4 Workers) | RAM Savings |
|---|---|---|---|
| Idle | ~105 MB | ~400 MB | 73% less |
| Active | ~112 MB | 600+ MB | 81% less |
Why Go? (Comparison with Java)
While Java is a powerful high-performance language, we chose Go for GoServe for several strategic reasons:
1. The "Glue" Overhead
ML serving is mostly "glue code": taking JSON from a socket and handing a pointer to a C++ library (ONNX Runtime).
- Go: CGO is designed exactly for this. It lets Go pass memory pointers to C with low overhead (a minimal sketch follows).
- Java: Requires JNI (Java Native Interface) or the newer Foreign Function & Memory API (Project Panama). JNI is notoriously complex to maintain and has higher overhead for frequent small calls than CGO.
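To make the CGO point concrete, here is a minimal, self-contained sketch of the glue pattern: handing a Go slice's backing array to a C function by pointer, with no copy. The C `sum` function is a stand-in for an ONNX Runtime entry point, not GoServe's real binding.

```go
package main

/*
#include <stddef.h>

static float sum(const float* data, size_t n) {
    float acc = 0;
    for (size_t i = 0; i < n; i++) acc += data[i];
    return acc;
}
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func main() {
	tensor := []float32{0.1, 0.2, 0.3, 0.4}

	// Pass a pointer to the slice's first element; C reads Go memory in
	// place for the duration of the call. This is safe under the cgo
	// pointer rules because the C side does not retain the pointer.
	result := C.sum(
		(*C.float)(unsafe.Pointer(&tensor[0])),
		C.size_t(len(tensor)),
	)
	fmt.Println("sum from C:", float32(result))
}
```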
2. Memory & Startup
- Footprint: A Go binary is a single, static file (~20 MB). A Java application requires a JVM (Java Virtual Machine), which usually has a 256 MB+ memory "floor" just to start.
- Cold Starts: GoServe starts in milliseconds. Standard Java (JVM) requires a "warm-up" period for the JIT compiler to reach peak performance, making it less ideal for serverless or scale-to-zero architectures.
3. Performance Difference
In terms of raw inference (once the data is in the C++ library), Go and Java would perform similarly. However, Go's superior handling of lightweight concurrency (Goroutines vs. heavy OS threads) and its lower memory footprint make it more cost-effective for high-density deployment.
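As a rough illustration of that concurrency model, the sketch below parks 10,000 goroutines at once and reports the stack memory they consume. Each goroutine starts with a stack of a few kilobytes, versus the ~1 MB default stack of a JVM thread; the numbers printed will vary by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	block := make(chan struct{})

	// Spawn 10,000 goroutines and park them so they all exist at once.
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-block
		}()
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("goroutines: %d, stack memory in use: %d KB\n",
		runtime.NumGoroutine(), m.StackInuse/1024)

	close(block) // release the parked goroutines
	wg.Wait()
}
```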
Technical Note on Fairness
A "Pure Threaded" FastAPI setup might reduce the IPC overhead seen in these results, but would be highly susceptible to GIL contention under mixed workloads. GoServe provides the best of both worlds: the safety of a single process with the performance of true multi-core parallelism.