← Back to all posts

GoModel, a LiteLLM alternative, is up to 14.5x faster

March 23 benchmark dashboard

On March 5, 2026, we published our first public benchmark: GoModel vs LiteLLM Benchmark: Speed, Throughput, and Resource Usage.

Since then, two things changed:

  1. GoModel development progressed.
  2. Our benchmark methodology progressed too.

So this is not a small edit to the old article. It is a new benchmark generation.

The short version: in this March 23, 2026 run, GoModel was between 2.1x and 14.5x faster than LiteLLM depending on workload, while using roughly 5.8x to 7.7x less memory.

What changed since the previous post

The March 5 post was useful, but it was narrower:

  • it focused on /v1/chat/completions,
  • it used a simpler comparison matrix,
  • it answered the question “how do these gateways behave in one practical setup?”

This new benchmark asks a different and tighter question:

If both gateways talk to the same instant localhost backend, how much latency and resource overhead does the gateway itself add?

That shift matters. It removes upstream model latency from the comparison and lets us look much more directly at gateway overhead.

It also reflects where GoModel is now on March 23, 2026. The codebase has moved forward since the earlier post, so treating the March 5 article as the final word would be misleading.

Methodology

This benchmark was run on March 23, 2026 on:

  • macOS Darwin 25.3.0
  • Apple Silicon (arm64)
  • Go 1.26.1
  • Python 3 with LiteLLM 1.82.0

Both gateways proxied to the same mock OpenAI-compatible backend on localhost. That mock server returned deterministic JSON and SSE payloads immediately, so the numbers here measure gateway overhead rather than provider latency.

Workloads

  • /v1/chat/completions, stream: false
  • /v1/chat/completions, stream: true
  • /v1/responses, stream: false
  • /v1/responses, stream: true

Test shape

  • 1,000 requests per benchmark
  • concurrency 50
  • 100 warm-up requests before GoModel runs
  • 50 warm-up requests before LiteLLM runs
  • hey for non-streaming benchmarks
  • a custom Go SSE benchmark tool for streaming benchmarks
  • RSS and CPU sampled every 0.5s

Controls

  • same backend for both gateways
  • retries disabled
  • logging disabled
  • analytics disabled
  • auth disabled

For chat workloads, we also recorded a direct baseline with no gateway in the middle.

Results at a glance

WorkloadGoModel req/sLiteLLM req/sSpeedupGoModel medianLiteLLM medianGoModel peak RSSLiteLLM peak RSS
Chat non-stream24,1292,20710.9x1.9 ms p5021.3 ms p5040.6 MB313.8 MB
Chat stream3,92938610.2x12.1 ms TTFB p50121.2 ms TTFB p5049.9 MB313.8 MB
Responses non-stream35,9772,48114.5x1.1 ms p5019.1 ms p5053.6 MB313.8 MB
Responses stream3,4701,6832.1x13.4 ms TTFB p5029.2 ms TTFB p5053.7 MB313.8 MB

The strongest result was 14.5x higher throughput on /v1/responses with stream: false.

Baseline overhead on chat traffic

Because we also benchmarked direct calls to the mock backend for chat traffic, we can estimate added gateway overhead:

TestDirect baselineGoModelLiteLLM
Chat non-stream p500.9 ms1.9 ms21.3 ms
Added latency vs baseline+1.0 ms+20.4 ms
Chat stream TTFB p506.4 ms12.1 ms121.2 ms
Added TTFB vs baseline+5.7 ms+114.8 ms

That is the core story of this run. GoModel stayed much closer to the direct baseline, especially on the chat path.

Charts

Dashboard

March 23 dashboard

Throughput

March 23 throughput chart

Median latency

March 23 latency chart

Memory

March 23 memory chart

Relative advantage

March 23 speedup chart

How this compares to the March 5 post

The earlier post from March 5, 2026 is still useful context, but it should now be read as an earlier snapshot:

  • the benchmark scope was narrower,
  • the methodology was less isolated,
  • GoModel itself was earlier in its development curve.

This March 23 benchmark is more explicit about what it measures: pure gateway overhead against a shared local backend, across both chat completions and the Responses API.

So the relationship between the two posts is:

  • March 5, 2026: first public benchmark snapshot
  • March 23, 2026: tighter overhead benchmark after further GoModel progress

Reproduce it yourself

The benchmark package lives in the GoModel GitHub repository.

For this March 23 benchmark refresh, the reproducible scripts are here:

Quick start:

git clone https://github.com/enterpilot/gomodel.git
cd gomodel

python3 -m pip install matplotlib numpy

# Generate charts from the existing March 23 result set
bash docs/2026-03-23_benchmark_scripts/run.sh

# Or rerun the raw benchmark first
RUN_BENCHMARK=1 bash docs/2026-03-23_benchmark_scripts/run.sh

The dated package produces:

  • normalized benchmark JSON,
  • chart images used in this post,
  • a stable entry point for rerunning the benchmark flow.

AI made this benchmark workflow much easier

One thing that is different in 2026 is how easy it has become to create a benchmark like this with AI help.

Peter Steinberger made this point well in Just Talk To It - the no-bs Way of Agentic Engineering: the practical shift is that you can now describe the outcome you want in plain language, then iterate quickly on the scripts, charts, and report structure.

That does not remove the need for rigor. You still need to inspect the raw data, verify the parsing, and make the methodology explicit. But it does compress the time from idea to a reproducible benchmark package dramatically.

An example prompt for this benchmark shape would be:

Prepare a benchmark report for me between GoModel and LiteLLM.

Requirements:
- use the same localhost mock backend for both gateways
- measure /v1/chat/completions and /v1/responses
- include both streaming and non-streaming runs
- report throughput, latency percentiles, TTFB for streaming, memory, and CPU
- generate charts and a Markdown report
- make the scripts reproducible inside the repo
- explain the methodology and caveats clearly

That kind of prompt is a much better starting point now than it would have been even a year ago.

Limits

This is still a point-in-time benchmark.

It does not prove that one gateway wins every deployment, every provider mix, or every traffic shape. It does show that, in this March 23, 2026 localhost overhead test, GoModel had a large throughput and memory advantage over LiteLLM.

That is a narrower claim than “always faster”, but it is also the more useful claim because it is precise.