GoModel vs LiteLLM Benchmark: Speed, Throughput, and Resource Usage

Update on March 23, 2026: we published a newer generation of this benchmark here: GoModel, a LiteLLM alternative, is up to 14.5x faster. That newer post reflects later GoModel development and a tighter localhost-overhead methodology.
Why We Ran This Benchmark
We are the authors of GoModel, and we are proud of what the project has become. We ran this benchmark for two reasons:
- to test our product honestly against a widely used alternative,
- to find where we can still improve.
When teams evaluate an AI gateway, they usually care about four things:
- extra latency
- throughput under concurrency
- CPU and RAM overhead
- failure behavior under pressure
This comparison was designed to measure exactly those points.
Methodology
- OpenAI-compatible endpoint: /v1/chat/completions
- Same request shape and prompt for both gateways
- Concurrency levels: 1, 4, 8
- Primary run: error-free matrix based on the original benchmark results
- Metrics: req/s, p95/p99 latency, CPU avg/max, RSS avg/max
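To make the metrics concrete, here is a minimal Python sketch of how a request matrix like this one can be driven and reduced to req/s and latency percentiles. The `run_matrix` and `send_request` names are illustrative only (not part of the actual compare.sh tooling), and nearest-rank percentiles are an assumption about the exact percentile definition used.

```python
import concurrent.futures
import time

def percentile(samples, p):
    """Nearest-rank percentile over latency samples (ms): ceil(n*p/100)-th value."""
    ordered = sorted(samples)
    rank = max(1, min(len(ordered), -(-len(ordered) * p // 100)))
    return ordered[rank - 1]

def run_matrix(send_request, total=12, concurrency=4):
    """Fire `total` requests with `concurrency` workers; return latency stats.

    `send_request` is any zero-argument callable performing one request.
    """
    def timed(_):
        start = time.perf_counter()
        send_request()
        return (time.perf_counter() - start) * 1000.0  # ms

    t0 = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(total)))
    elapsed = time.perf_counter() - t0

    return {
        "req_per_s": total / elapsed,
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }
```

In a real run, `send_request` would POST the shared prompt to each gateway's /v1/chat/completions endpoint; CPU and RSS sampling happen out-of-band against the gateway process.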
Primary Results
| Gateway | C | Success | Error % | Req/s | p50 ms | p95 ms | p99 ms | CPU avg % | RSS avg MB |
|---|---|---|---|---|---|---|---|---|---|
| GoModel | 1 | 12/12 | 0.00 | 9.61 | 86.4 | 141.1 | 144.4 | 0.81 | 45.4 |
| GoModel | 4 | 12/12 | 0.00 | 44.66 | 56.1 | 139.5 | 139.5 | 0.23 | 46.0 |
| GoModel | 8 | 12/12 | 0.00 | 52.75 | 98.4 | 130.6 | 131.1 | 1.13 | 46.0 |
| LiteLLM | 1 | 12/12 | 0.00 | 8.64 | 96.2 | 190.3 | 213.9 | 9.21 | 320.3 |
| LiteLLM | 4 | 12/12 | 0.00 | 36.82 | 104.7 | 149.5 | 149.5 | 5.20 | 320.8 |
| LiteLLM | 8 | 12/12 | 0.00 | 35.81 | 188.7 | 244.4 | 244.9 | 5.95 | 321.5 |
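As a sanity check, the headline ratios can be derived directly from the table's concurrency-8 row; a small sketch (numbers copied from the table above):

```python
# Published C=8 figures from the results table.
gomodel = {"req_per_s": 52.75, "p95_ms": 130.6, "rss_mb": 46.0}
litellm = {"req_per_s": 35.81, "p95_ms": 244.4, "rss_mb": 321.5}

throughput_gain = gomodel["req_per_s"] / litellm["req_per_s"]  # ~1.47x
p95_ratio = litellm["p95_ms"] / gomodel["p95_ms"]              # ~1.87x
rss_ratio = litellm["rss_mb"] / gomodel["rss_mb"]              # ~6.99x

print(f"throughput {throughput_gain:.2f}x, p95 {p95_ratio:.2f}x, RSS {rss_ratio:.2f}x")
```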
Charts
The original post includes four charts: Throughput, Latency (p95), Memory (RSS avg), and CPU (avg).
Interpretation
- Latency: GoModel was consistently faster across the tested scenarios; at concurrency 8, its p95 was 130.6 ms versus LiteLLM's 244.4 ms.
- Throughput: GoModel delivered higher req/s throughout the matrix, peaking at 52.75 req/s versus LiteLLM's best of 36.82.
- Memory footprint: GoModel held around 46 MB RSS versus roughly 321 MB for LiteLLM, about a 7x difference.
- CPU usage: GoModel stayed near or below 1% average CPU in these runs, while LiteLLM ranged from about 5% to 9%.
Reproduce It Yourself
The benchmark tooling is open-source in the GoModel repository under
docs/about/benchmark-tools/.
Prerequisites: Go 1.25+, Python 3.10+, jq, curl, a Groq API key, and litellm[proxy].
```shell
# Clone GoModel and add your API key
git clone https://github.com/enterpilot/gomodel.git
cd gomodel
echo "GROQ_API_KEY=gsk_..." > .env

# Run the full comparison
bash docs/about/benchmark-tools/compare.sh

# Generate charts
pip install matplotlib numpy
python3 docs/about/benchmark-tools/plot_benchmark_charts.py benchmark-results/<timestamp>
```
The script builds GoModel from source, starts both gateways locally, runs the
request matrix at concurrency 1, 4, and 8, collects latency and process
metrics, and writes JSON results plus a REPORT.md.
Override defaults with environment variables:
```shell
REQUESTS=100 CONCURRENCIES="1 4 8 16" bash docs/about/benchmark-tools/compare.sh
```
Final Note
We did not publish this to celebrate a snapshot and stop there. We published it to keep pressure on ourselves and keep making GoModel better.
If you run this benchmark and get different results, we want to see your setup and data.