
GoModel vs LiteLLM Benchmark: Speed, Throughput, and Resource Usage

Benchmark dashboard

Update (March 23, 2026): we have since published a newer generation of this benchmark: GoModel, a LiteLLM alternative, is up to 14.5x faster. That post reflects later GoModel development and a tighter localhost-overhead methodology.

Why We Ran This Benchmark

We are the authors of GoModel, and we are proud of what the project has become. We ran this benchmark for two reasons:

  1. to test our product honestly against a widely used alternative,
  2. to find where we can still improve.

When teams evaluate an AI gateway, they usually care about four things:

  1. extra latency
  2. throughput under concurrency
  3. CPU and RAM overhead
  4. failure behavior under pressure

This comparison was designed to measure exactly those points.

Methodology

  • OpenAI-compatible endpoint: /v1/chat/completions
  • Same request shape and prompt for both gateways
  • Concurrency levels: 1, 4, 8
  • Primary run: error-free matrix based on the original benchmark results
  • Metrics: req/s, p95/p99 latency, CPU avg/max, RSS avg/max
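The percentile metrics above (p50/p95/p99) can be computed from raw per-request latencies with a simple nearest-rank method. A minimal sketch; the benchmark tooling's exact aggregation may differ, and the sample latencies here are illustrative, not the measured data:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample covering p% of the data."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

# Per-request wall-clock latencies in milliseconds (illustrative values).
latencies_ms = [86.4, 90.1, 95.3, 101.7, 110.2, 118.9,
                124.5, 130.8, 136.2, 139.0, 141.1, 144.4]

print(f"p50 = {percentile(latencies_ms, 50):.1f} ms")
print(f"p95 = {percentile(latencies_ms, 95):.1f} ms")
print(f"p99 = {percentile(latencies_ms, 99):.1f} ms")
```

With only 12 requests per cell, p95 and p99 often land on the same sample, which is why some rows in the table below show near-identical tail values.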

Primary Results

Gateway   C   Success   Error %   Req/s    p50 ms   p95 ms   p99 ms   CPU avg %   RSS avg MB
GoModel   1   12/12     0.00       9.61     86.4    141.1    144.4    0.81         45.4
GoModel   4   12/12     0.00      44.66     56.1    139.5    139.5    0.23         46.0
GoModel   8   12/12     0.00      52.75     98.4    130.6    131.1    1.13         46.0
LiteLLM   1   12/12     0.00       8.64     96.2    190.3    213.9    9.21        320.3
LiteLLM   4   12/12     0.00      36.82    104.7    149.5    149.5    5.20        320.8
LiteLLM   8   12/12     0.00      35.81    188.7    244.4    244.9    5.95        321.5

Charts


Throughput

Throughput chart

Latency (p95)

Latency chart

Memory (RSS avg)

Memory chart

CPU (avg)

CPU chart

Interpretation

  • Latency: GoModel's p95 was lower at every concurrency level (141.1 vs 190.3 ms at C=1; 130.6 vs 244.4 ms at C=8).
  • Throughput: GoModel delivered higher req/s across the matrix, peaking at 52.75 vs 35.81 at C=8.
  • Memory footprint: GoModel averaged around 46 MB RSS versus around 321 MB for LiteLLM, roughly a 7x difference.
  • CPU usage: GoModel stayed near or below about 1% average CPU, while LiteLLM ranged from roughly 5% to 9%.
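To put the table in relative terms, here is a short Python sketch that recomputes GoModel's advantage per concurrency level. The values are copied from the primary results table above; nothing here is newly measured:

```python
# Figures from the primary results table:
# concurrency: (gomodel req/s, litellm req/s, gomodel p95 ms, litellm p95 ms)
results = {
    1: (9.61, 8.64, 141.1, 190.3),
    4: (44.66, 36.82, 139.5, 149.5),
    8: (52.75, 35.81, 130.6, 244.4),
}

for c, (go_rps, lite_rps, go_p95, lite_p95) in results.items():
    rps_ratio = go_rps / lite_rps    # higher is better for GoModel
    p95_ratio = lite_p95 / go_p95    # how much lower GoModel's p95 tail is
    print(f"C={c}: throughput {rps_ratio:.2f}x, p95 {p95_ratio:.2f}x lower")
```

The gap widens with concurrency: at C=1 the throughput difference is modest, while at C=8 GoModel sustains about 1.47x the requests per second with a p95 tail about 1.87x lower.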

Reproduce It Yourself

The benchmark tooling is open-source in the GoModel repository under docs/about/benchmark-tools/.

Prerequisites: Go 1.25+, Python 3.10+, jq, curl, a Groq API key, and litellm[proxy].

# Clone GoModel and add your API key
git clone https://github.com/enterpilot/gomodel.git
cd gomodel
echo "GROQ_API_KEY=gsk_..." > .env

# Run the full comparison
bash docs/about/benchmark-tools/compare.sh

# Generate charts
pip install matplotlib numpy
python3 docs/about/benchmark-tools/plot_benchmark_charts.py benchmark-results/<timestamp>

The script builds GoModel from source, starts both gateways locally, runs the request matrix at concurrency 1, 4, and 8, collects latency and process metrics, and writes JSON results plus a REPORT.md.

Override defaults with environment variables:

REQUESTS=100 CONCURRENCIES="1 4 8 16" bash docs/about/benchmark-tools/compare.sh

Final Note

We did not publish this to celebrate a snapshot and stop there. We published it to keep pressure on ourselves and keep making GoModel better.

If you run this benchmark and get different results, we want to see your setup and data.