GoModel, a LiteLLM alternative, is up to 14.5x faster

On March 5, 2026, we published our first public benchmark: GoModel vs LiteLLM Benchmark: Speed, Throughput, and Resource Usage.
Since then, two things changed:
- GoModel development progressed.
- Our benchmark methodology progressed too.
So this is not a small edit to the old article. It is a new benchmark generation.
The short version: in this March 23, 2026 run, GoModel was between 2.1x and
14.5x faster than LiteLLM depending on workload, while using roughly 5.8x
to 7.7x less memory.
What changed since the previous post
The March 5 post was useful, but it was narrower:
- it focused on
/v1/chat/completions, - it used a simpler comparison matrix,
- it answered the question “how do these gateways behave in one practical setup?”
This new benchmark asks a different and tighter question:
If both gateways talk to the same instant localhost backend, how much latency and resource overhead does the gateway itself add?
That shift matters. It removes upstream model latency from the comparison and lets us look much more directly at gateway overhead.
It also reflects where GoModel is now on March 23, 2026. The codebase has moved forward since the earlier post, so treating the March 5 article as the final word would be misleading.
Methodology
This benchmark was run on March 23, 2026 on:
- macOS Darwin
25.3.0 - Apple Silicon (
arm64) - Go
1.26.1 - Python 3 with LiteLLM
1.82.0
Both gateways proxied to the same mock OpenAI-compatible backend on localhost. That mock server returned deterministic JSON and SSE payloads immediately, so the numbers here measure gateway overhead rather than provider latency.
Workloads
/v1/chat/completions,stream: false/v1/chat/completions,stream: true/v1/responses,stream: false/v1/responses,stream: true
Test shape
1,000requests per benchmark- concurrency
50 100warm-up requests before GoModel runs50warm-up requests before LiteLLM runsheyfor non-streaming benchmarks- a custom Go SSE benchmark tool for streaming benchmarks
- RSS and CPU sampled every
0.5s
Controls
- same backend for both gateways
- retries disabled
- logging disabled
- analytics disabled
- auth disabled
For chat workloads, we also recorded a direct baseline with no gateway in the middle.
Results at a glance
| Workload | GoModel req/s | LiteLLM req/s | Speedup | GoModel median | LiteLLM median | GoModel peak RSS | LiteLLM peak RSS |
|---|---|---|---|---|---|---|---|
| Chat non-stream | 24,129 | 2,207 | 10.9x | 1.9 ms p50 | 21.3 ms p50 | 40.6 MB | 313.8 MB |
| Chat stream | 3,929 | 386 | 10.2x | 12.1 ms TTFB p50 | 121.2 ms TTFB p50 | 49.9 MB | 313.8 MB |
| Responses non-stream | 35,977 | 2,481 | 14.5x | 1.1 ms p50 | 19.1 ms p50 | 53.6 MB | 313.8 MB |
| Responses stream | 3,470 | 1,683 | 2.1x | 13.4 ms TTFB p50 | 29.2 ms TTFB p50 | 53.7 MB | 313.8 MB |
The strongest result was 14.5x higher throughput on /v1/responses with
stream: false.
Baseline overhead on chat traffic
Because we also benchmarked direct calls to the mock backend for chat traffic, we can estimate added gateway overhead:
| Test | Direct baseline | GoModel | LiteLLM |
|---|---|---|---|
| Chat non-stream p50 | 0.9 ms | 1.9 ms | 21.3 ms |
| Added latency vs baseline | — | +1.0 ms | +20.4 ms |
| Chat stream TTFB p50 | 6.4 ms | 12.1 ms | 121.2 ms |
| Added TTFB vs baseline | — | +5.7 ms | +114.8 ms |
That is the core story of this run. GoModel stayed much closer to the direct baseline, especially on the chat path.
Charts
Dashboard

Throughput

Median latency

Memory

Relative advantage

How this compares to the March 5 post
The earlier post from March 5, 2026 is still useful context, but it should now be read as an earlier snapshot:
- the benchmark scope was narrower,
- the methodology was less isolated,
- GoModel itself was earlier in its development curve.
This March 23 benchmark is more explicit about what it measures: pure gateway overhead against a shared local backend, across both chat completions and the Responses API.
So the relationship between the two posts is:
- March 5, 2026: first public benchmark snapshot
- March 23, 2026: tighter overhead benchmark after further GoModel progress
Reproduce it yourself
The benchmark package lives in the GoModel GitHub repository.
For this March 23 benchmark refresh, the reproducible scripts are here:
docs/2026-03-23_benchmark_scripts/- benchmark runner workspace:
docs/2026-03-23_benchmark_scripts/gateway-comparison/
Quick start:
git clone https://github.com/enterpilot/gomodel.git
cd gomodel
python3 -m pip install matplotlib numpy
# Generate charts from the existing March 23 result set
bash docs/2026-03-23_benchmark_scripts/run.sh
# Or rerun the raw benchmark first
RUN_BENCHMARK=1 bash docs/2026-03-23_benchmark_scripts/run.sh
The dated package produces:
- normalized benchmark JSON,
- chart images used in this post,
- a stable entry point for rerunning the benchmark flow.
AI made this benchmark workflow much easier
One thing that is different in 2026 is how easy it has become to create a benchmark like this with AI help.
Peter Steinberger made this point well in Just Talk To It - the no-bs Way of Agentic Engineering: the practical shift is that you can now describe the outcome you want in plain language, then iterate quickly on the scripts, charts, and report structure.
That does not remove the need for rigor. You still need to inspect the raw data, verify the parsing, and make the methodology explicit. But it does compress the time from idea to a reproducible benchmark package dramatically.
An example prompt for this benchmark shape would be:
Prepare a benchmark report for me between GoModel and LiteLLM.
Requirements:
- use the same localhost mock backend for both gateways
- measure /v1/chat/completions and /v1/responses
- include both streaming and non-streaming runs
- report throughput, latency percentiles, TTFB for streaming, memory, and CPU
- generate charts and a Markdown report
- make the scripts reproducible inside the repo
- explain the methodology and caveats clearly
That kind of prompt is a much better starting point now than it would have been even a year ago.
Limits
This is still a point-in-time benchmark.
It does not prove that one gateway wins every deployment, every provider mix, or every traffic shape. It does show that, in this March 23, 2026 localhost overhead test, GoModel had a large throughput and memory advantage over LiteLLM.
That is a narrower claim than “always faster”, but it is also the more useful claim because it is precise.