

PkgPulse Team

Portkey vs LiteLLM vs OpenRouter: LLM Gateway Comparison 2026

TL;DR

Managing multiple LLM providers — OpenAI, Anthropic, Gemini, Mistral — is complex: different SDKs, different pricing, different reliability. LiteLLM is the open-source Python proxy that gives you one OpenAI-compatible API for 100+ LLMs — self-host it and route anywhere. Portkey is the enterprise-grade AI gateway with production features (semantic caching, guardrails, advanced observability) as a managed service or self-hosted. OpenRouter is the SaaS marketplace model — one API key, access to 200+ models, pay per token with their routing, no infrastructure to manage. For enterprise production teams: Portkey. For self-hosted infrastructure control: LiteLLM. For instant multi-model access with zero setup: OpenRouter.

Key Takeaways

  • LiteLLM supports 100+ LLM providers — all via an OpenAI-compatible API (/chat/completions)
  • Portkey offers semantic caching — cache similar (not just identical) prompts, reducing costs by up to 40%
  • OpenRouter has 200+ models — including models not available via direct API (some fine-tuned, some obscure)
  • LiteLLM is fully open source (MIT) — self-host on your own infra with full data control
  • Portkey GitHub stars: ~8k — fastest-growing enterprise AI gateway
  • All three support fallbacks — if GPT-4o fails, automatically retry with Claude Sonnet
  • LiteLLM's proxy adds ~10-20ms latency — acceptable for most use cases

The Multi-LLM Problem

In 2026, using a single LLM provider is risky:

  • Rate limits — OpenAI's 429 errors during peak hours kill production apps
  • Outages — Any provider can go down; no fallback = 100% downtime
  • Cost optimization — Route simple queries to cheaper models (GPT-4o Mini, Claude Haiku)
  • SDK fragmentation — OpenAI, Anthropic, and Google each have their own SDKs
  • Observability gaps — No unified view of costs, latency, and errors across providers

LLM gateways solve all of this with a single unified API.
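
To make the pain concrete, here is a sketch of what manual cross-provider fallback looks like without a gateway. The `LLMCall` type and the provider functions are hypothetical stand-ins for each vendor's SDK; every new provider means another branch like this in your application code:

```typescript
// Hypothetical per-provider call functions; each real provider has its own SDK and shape
type LLMCall = (prompt: string) => Promise<string>;

// Without a gateway, fallback logic lives in application code
async function callWithFallback(prompt: string, providers: LLMCall[]): Promise<string> {
  const errors: string[] = [];
  for (const call of providers) {
    try {
      return await call(prompt);
    } catch (err) {
      errors.push(String(err)); // 429, outage, timeout: try the next provider
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

A gateway moves exactly this loop (plus retries, logging, and cost tracking) out of your codebase and behind one endpoint.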


OpenRouter: Marketplace for 200+ Models

OpenRouter is the simplest option — sign up, get one API key, access 200+ models. They handle routing, load balancing, and fallbacks. You pay OpenRouter per token at competitive rates.

Zero-Config Setup

import OpenAI from "openai";

// OpenRouter is OpenAI-API-compatible — just change the baseURL
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": "https://yourapp.com",  // Required by OpenRouter
    "X-Title": "Your App Name",
  },
});

// Now use any model via unified API
async function chat(prompt: string, model: string) {
  const response = await client.chat.completions.create({
    model,  // Any of 200+ supported models
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
}

// Use GPT-4o
const gpt4Response = await chat("Explain quantum computing", "openai/gpt-4o");

// Use Claude Sonnet
const claudeResponse = await chat("Write a poem", "anthropic/claude-sonnet-4-5");

// Use Gemini 1.5 Pro
const geminiResponse = await chat("Analyze this data", "google/gemini-pro-1.5");

// Use Llama 3.3 (open source, cheap)
const llamaResponse = await chat("Summarize this", "meta-llama/llama-3.3-70b-instruct");

Model Routing and Fallbacks

// OpenRouter falls back across models when you pass a `models` array (beta feature)
const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: prompt }],
  // OpenRouter routing extensions — not part of the OpenAI SDK types, hence the cast
  route: "fallback",
  models: [
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-5",
    "google/gemini-pro-1.5",
  ],
} as any);

Cost Control with Model Selection

// Route based on task complexity to control costs
type TaskType = "simple" | "complex" | "creative";

const MODEL_MAP: Record<TaskType, string> = {
  simple: "openai/gpt-4o-mini",            // ~$0.15 / 1M input tokens
  complex: "openai/gpt-4o",                // ~$2.50 / 1M input tokens
  creative: "anthropic/claude-sonnet-4-5", // ~$3.00 / 1M input tokens
};

async function intelligentChat(prompt: string, taskType: TaskType) {
  const model = MODEL_MAP[taskType];
  const response = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  return { response: response.choices[0].message.content, model };
}

OpenRouter Pricing Overview

Model                          Input ($/1M)   Output ($/1M)
openai/gpt-4o                    $2.50          $10.00
openai/gpt-4o-mini               $0.15           $0.60
anthropic/claude-sonnet-4-5      $3.00          $15.00
anthropic/claude-haiku-3-5       $0.80           $4.00
google/gemini-pro-1.5            $1.25           $5.00
meta-llama/llama-3.3-70b         $0.065          $0.10
mistralai/mistral-large          $2.00           $6.00

OpenRouter's markup over direct provider pricing is typically 5-15%, which is the cost of the managed infrastructure they provide. For most teams, this is a reasonable tradeoff — you don't need to maintain a gateway, and you get automatic fallback across models when a provider has an outage. The access to open-source models (Llama, Mistral, Qwen) at near-cost pricing is genuinely useful for cost-sensitive use cases.
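
A quick sanity check on that markup range (the 5-15% figure and the $2.50 price come from this article's numbers, not an official rate card):

```typescript
// Price of 1M gpt-4o input tokens routed through a gateway with a percentage markup
function withMarkup(directPrice: number, markupPct: number): number {
  return directPrice * (1 + markupPct / 100);
}

// $2.50 direct at a 5-15% markup lands between $2.625 and $2.875 per 1M input tokens
```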

The limitations of OpenRouter are worth understanding before committing. Since OpenRouter is a US-based company, all requests route through their infrastructure. Teams with GDPR compliance requirements or data residency obligations that prohibit US data processing cannot use OpenRouter for user data. For these teams, LiteLLM's self-hosted option or Portkey's EU data residency offering are the alternatives. OpenRouter also doesn't offer custom model deployment — you're limited to models they've integrated, which covers the major providers well but excludes fine-tuned or proprietary models your organization might have deployed internally.


LiteLLM: Open-Source Universal Proxy

LiteLLM is a Python library AND proxy server that translates any LLM call to OpenAI format. Self-host the proxy and get unified routing, load balancing, cost tracking, and fallbacks — with full data ownership.

LiteLLM Python Library (No Server)

import litellm

# Set provider keys
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
os.environ["GEMINI_API_KEY"] = "..."

# Universal interface — same call for any model
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Claude — identical interface
response = litellm.completion(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Gemini
response = litellm.completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "Hello"}]
)

LiteLLM Proxy Server Setup

# litellm-config.yaml — define your models and routing
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GEMINI_API_KEY

  # Load-balanced group: two deployments share one model_name,
  # and the router distributes requests between them
  - model_name: best-available
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: best-available
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  routing_strategy: "least-busy"
  num_retries: 3
  timeout: 30

litellm_settings:
  success_callback: ["langfuse"]  # Observability integration
  failure_callback: ["langfuse"]
  cache:
    type: "redis"
    host: "redis"
    port: 6379

# Start the proxy
litellm --config litellm-config.yaml --port 4000

# Or via Docker
docker run -p 4000:4000 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v ./litellm-config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main \
  --config /app/config.yaml

Calling LiteLLM Proxy from Node.js

import OpenAI from "openai";

// Point to your LiteLLM proxy — OpenAI-compatible
const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",  // Your LiteLLM proxy
  apiKey: "sk-1234",  // Virtual key from LiteLLM
});

// All these use the proxy's routing
const gpt4 = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

const claude = await client.chat.completions.create({
  model: "claude-sonnet",  // Maps to anthropic/claude-sonnet-4-5 via config
  messages: [{ role: "user", content: "Hello" }],
});

Fallbacks and Load Balancing

# litellm-config.yaml — fallback configuration
router_settings:
  # Primary → fallback chain
  fallbacks:
    - gpt-4o: ["claude-sonnet", "gemini-pro"]
    - claude-sonnet: ["gpt-4o", "gemini-pro"]

  # Retry counts per error type (LiteLLM RetryPolicy fields)
  retry_policy:
    BadRequestErrorRetries: 0      # Don't retry 400s
    RateLimitErrorRetries: 3       # Retry rate limits 3 times
    TimeoutErrorRetries: 2         # Retry timeouts twice
    InternalServerErrorRetries: 3

  # Context window fallback — if prompt too long, use bigger model
  context_window_fallbacks:
    - gpt-4o: ["gpt-4o-128k", "claude-opus-3-5"]

Cost Tracking and Budget Limits

# litellm-config.yaml — per-team budget controls (key management requires a database)
general_settings:
  master_key: "sk-master"
  database_url: "postgresql://user:pass@localhost:5432/litellm"

# Virtual keys are created via the proxy API, not the YAML file.
# Budget-limited key for Team A (cheap models only):
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4o-mini", "claude-haiku"], "max_budget": 100, "budget_duration": "30d"}'
# Repeat with different budgets and model lists for each team

Portkey: Enterprise AI Gateway

Portkey is the most feature-rich option — designed for enterprise production with semantic caching, guardrails, advanced observability, and AI config management. Available as managed cloud or self-hosted.

Setup (TypeScript SDK)

npm install portkey-ai

import Portkey from "portkey-ai";

const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: process.env.OPENAI_VIRTUAL_KEY,  // Provider key managed in Portkey vault
});

// OpenAI-compatible interface
const response = await portkey.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

Configs — Define Routing Logic in Dashboard

// Reference a config by ID — routing logic managed in Portkey dashboard
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: "pc-production-fallback-abc123",  // Config defined in dashboard
});

// Config can include: fallbacks, load balancing, caching, guardrails
// without changing your code
const response = await portkey.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

Inline Config (No Dashboard Required)

import { createConfig } from "portkey-ai";

const response = await portkey.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "What is 2+2?" }],
  },
  {
    config: createConfig({
      // Fallback chain
      strategy: { mode: "fallback" },
      targets: [
        { virtualKey: process.env.OPENAI_VIRTUAL_KEY },
        { virtualKey: process.env.ANTHROPIC_VIRTUAL_KEY, overrideParams: { model: "claude-sonnet-4-5" } },
        { virtualKey: process.env.GEMINI_VIRTUAL_KEY, overrideParams: { model: "gemini-pro" } },
      ],
    }),
  }
);

Semantic Caching

// Portkey's semantic cache matches similar prompts, not just identical ones.
// Caching is enabled through a Portkey config object:
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: process.env.OPENAI_VIRTUAL_KEY,
  config: JSON.stringify({
    cache: { mode: "semantic", max_age: 86400 },  // 24 hours
  }),
});

// First call: "What's the weather like in Paris?" → calls OpenAI, caches result
// Second call: "How's the weather in Paris today?" → semantic match → returns cached result (0 tokens)
// Cache hit rate on production apps: typically 30-50% with semantic matching

Guardrails

// Portkey guardrails — input/output validation and transformation
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: process.env.OPENAI_VIRTUAL_KEY,
});

const response = await portkey.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: userInput }],
  },
  {
    config: createConfig({
      // Check IDs below follow Portkey's guardrail catalog; verify exact IDs in their docs
      guardrails: [
        {
          type: "input",
          checks: [
            { id: "portkey.prompt_injection", on_fail: "block" },  // Block prompt injection
            { id: "portkey.pii_detection", on_fail: "anonymize" }, // Anonymize PII
          ],
        },
        {
          type: "output",
          checks: [
            { id: "portkey.toxicity", on_fail: "censor" },          // Censor toxic output
          ],
        },
      ],
    }),
  }
);

Feature Comparison

Feature                  OpenRouter          LiteLLM                   Portkey
Hosting                  SaaS only           Self-hosted (OSS)         Cloud or self-hosted
Setup time               <5 min              30-60 min (self-hosted)   15-30 min
Model count              200+ models         100+ providers            50+ providers
OpenAI-compatible API    Yes                 Yes                       Yes
Provider fallbacks       Partial (beta)      Yes                       Yes
Load balancing           Basic               Advanced                  Advanced
Semantic caching         No                  Basic (Redis)             Purpose-built
Cost tracking            Basic               Per-team budgets          Advanced
Guardrails               No                  No                        Yes
Prompt versioning        No                  No                        Yes
Data residency           No (US servers)     Full control              Self-hosted option
Open source              No                  Yes (MIT)                 Partial
GitHub stars             ~2k                 ~15k                      ~8k
Free tier                Pay per token       Open source               10k requests/mo
Enterprise support       Community           Paid tier                 Yes

Ecosystem and Community

LiteLLM is the most-starred LLM gateway on GitHub with over 15k stars and is maintained by a dedicated team. The community is primarily Python-centric, with extensive documentation for integrations with Langfuse, Helicone, LangSmith, and every major LLM provider. The GitHub Discussions are active and the maintainers respond quickly to issues. LiteLLM has also spawned a managed cloud version (LiteLLM Proxy Cloud) for teams that want the open-source functionality without self-hosting. The Python library is used directly in hundreds of AI application frameworks as an abstraction layer.

OpenRouter has built a unique position as the neutral model marketplace. Its website (openrouter.ai/models) is one of the most comprehensive public databases of LLM pricing, context windows, and capabilities. The OpenRouter community Discord is active and provides a space for developers to discuss model performance and routing strategies. The platform is a frequent target for integration — AI application frameworks like LangChain, LlamaIndex, and Haystack all support OpenRouter as a provider, which has accelerated its adoption.

Portkey's community is smaller but growing rapidly, particularly in enterprise AI engineering circles. The company publishes detailed content on AI gateway patterns, prompt management, and LLM reliability engineering. The Portkey playground allows testing different routing configurations and caching policies without writing code. Their integration with popular observability tools (Langfuse, Datadog) and the managed self-hosted deployment option make it the most professionally-oriented choice among the three.


Real-World Adoption

LiteLLM is widely used in AI research, startup prototyping, and infrastructure platforms that need to abstract over multiple LLM providers. Several open-source AI applications ship with LiteLLM built-in as the model router — allowing users to configure any provider without changing the application code. Enterprise teams in regulated industries use LiteLLM on-premise to ensure no customer data reaches external services.

OpenRouter is the default choice for developers building AI-powered tools who want to experiment with multiple models without managing multiple API keys and billing accounts. Products that let users choose their preferred AI model commonly use OpenRouter as the routing layer. Its catalog of open-source and fine-tuned models not available through direct APIs (via partnerships with providers like Together AI and DeepInfra) makes it uniquely valuable for specialized use cases.

Portkey has been adopted by AI-first product teams at Series A and later-stage startups where LLM costs have become significant enough to warrant semantic caching, and where production reliability requires guardrails against prompt injection and toxic outputs. Companies using Portkey commonly report 30-50% reductions in LLM costs from semantic caching alone.


Developer Experience Deep Dive

LiteLLM's developer experience is excellent for Python developers and acceptable for JavaScript developers who connect via the proxy. The Python API is intuitive and the documentation covers every provider with working code examples. The proxy's OpenAI-compatible API means Node.js applications need zero changes to point at LiteLLM instead of OpenAI directly. The main friction is operational — self-hosting LiteLLM requires managing a Docker container, Redis for caching, and ensuring the proxy has high availability. LiteLLM does not provide a managed hosting option in its free tier.

OpenRouter's developer experience is the best for getting started. The API is OpenAI-compatible, the models page provides real-time pricing and availability information, and the keys page manages access tokens cleanly. The main limitation is observability — OpenRouter provides basic usage statistics but lacks the detailed request logging and cost attribution that production teams need. Debugging routing issues requires contacting support rather than inspecting your own logs.

Portkey's developer experience shines in the configuration management layer. The ability to define routing logic (fallbacks, load balancing, caching policies) in a dashboard configuration and reference it by ID means routing changes don't require code deployments. The prompt management feature — storing and versioning prompts in Portkey's system — is uniquely valuable for teams with many prompt variations across features. TypeScript types are complete and accurate.


Performance and Benchmarks

LiteLLM proxy adds approximately 10-20ms of overhead to each request — this is the round-trip to your self-hosted proxy server. For interactive use cases where LLM latency is already 500ms-5s, this is negligible. LiteLLM's load balancing distributes requests across multiple provider endpoints in parallel for retry scenarios, which can improve P99 latency compared to sequential fallback.
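
One way to verify the overhead claim for your own deployment is to time the same request against the proxy and against the provider directly. This sketch assumes an OpenAI-compatible client object like the ones shown earlier; the endpoint URLs and model names are whatever your setup uses:

```typescript
// Minimal structural type for any OpenAI-compatible chat client
type ChatClient = {
  chat: { completions: { create: (params: object) => Promise<unknown> } };
};

// Time a single chat completion in milliseconds; point `client` at the
// proxy (e.g. http://localhost:4000/v1) or at the provider directly to compare
async function timeChat(client: ChatClient, model: string): Promise<number> {
  const start = performance.now();
  await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: "ping" }],
  });
  return performance.now() - start;
}
```

Run each variant a few dozen times and compare medians; a single sample is dominated by provider-side variance.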

OpenRouter's infrastructure is distributed and achieves low routing overhead — typically under 50ms added to the raw provider latency. The primary latency driver is which model and provider you select, not OpenRouter's routing layer.

Portkey's semantic caching is the most significant performance feature among the three. Cache hits return in under 100ms (database lookup latency) compared to 500ms-5000ms for actual LLM inference. For applications with high query similarity — support chatbots, FAQ systems, documentation search — the semantic cache hit rate can reach 40-60%, dramatically reducing both latency and cost.


Migration Guide

Adding OpenRouter to an existing OpenAI integration: Change baseURL to https://openrouter.ai/api/v1 and replace the API key. Add the required HTTP-Referer header. Test with your existing model names (prepend openai/ if needed) and verify responses. The entire migration typically takes under an hour for a simple application.

Setting up LiteLLM proxy for a team: Deploy the Docker container with a PostgreSQL or SQLite database for virtual key management. Generate virtual keys for each team or service and configure budget limits. Update applications to point at the proxy URL instead of provider URLs. The main ongoing maintenance is keeping the LiteLLM Docker image updated (new provider support and bug fixes ship frequently).

Evaluating Portkey's semantic caching ROI: Enable semantic caching in Portkey with a similarity threshold of 0.95 (conservative) on your existing traffic for one week. Portkey's dashboard will show cache hit rate and cost savings. Most applications see 20-40% cost reduction within the first week.
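
The ROI arithmetic behind that one-week evaluation is straightforward; the spend and hit-rate numbers below are placeholders, not benchmarks:

```typescript
// Expected monthly savings from semantic caching:
// savings ≈ monthly LLM spend × cache hit rate (cache hits cost ~0 tokens)
function cacheSavings(monthlySpend: number, hitRate: number): number {
  return monthlySpend * hitRate;
}

// e.g. $10,000/mo spend at a 30% hit rate is roughly $3,000/mo saved
```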


Final Verdict 2026

For teams with zero infrastructure tolerance and a need to experiment across many models, OpenRouter is the clear first choice — sign up in minutes and access 200+ models without managing anything. For teams that need data sovereignty, custom routing logic, or open-source compliance, LiteLLM is the best self-hosted gateway. For production AI applications at companies where LLM costs are significant and production reliability is critical, Portkey provides the most complete enterprise feature set with semantic caching, guardrails, and managed configurations.


When to Use Each

Choose OpenRouter if:

  • You want zero infrastructure setup — one API key and you're done
  • You need access to obscure or experimental models not available via direct API
  • Cost-per-token routing across 200+ models without management overhead is your goal
  • You're building a prototype or side project

OpenRouter is uniquely positioned for model arbitrage: when you want to test which model gives the best cost-to-quality ratio for a specific task, OpenRouter's unified API means you can swap models with a one-line change. For AI agents that need to route different subtasks to different models (a lightweight model for classification, a large model for synthesis), OpenRouter's routing capabilities enable this without managing multiple API keys.
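
A sketch of that arbitrage workflow, assuming an `ask` helper shaped like the `chat` function from the setup section; the model IDs are whatever you pass in:

```typescript
// Run the same prompt across several models — possible because every model
// shares one OpenAI-compatible API through the gateway
async function compareModels(
  ask: (prompt: string, model: string) => Promise<string>,
  prompt: string,
  models: string[],
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (const model of models) {
    results[model] = await ask(prompt, model);
  }
  return results;
}
```

Swap the `models` array to test a new candidate; no new API keys or SDKs required.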

Choose LiteLLM if:

  • Data ownership and compliance require self-hosted infrastructure
  • You want to build on open-source without vendor lock-in
  • Your team uses Python (LiteLLM's native language) and you need a flexible proxy
  • You need advanced load balancing with per-team budget controls

LiteLLM's virtual key system is its most powerful enterprise feature. Each team or service gets a virtual key with configurable budget limits, and spend is tracked against the real provider cost. This makes LiteLLM genuinely useful for large engineering organizations where different product teams share LLM infrastructure — finance, customer support, and engineering can all have separate cost accountability. The LiteLLM proxy also integrates with standard observability tools (Langfuse, Helicone) for production monitoring.

Choose Portkey if:

  • You need semantic caching to reduce LLM costs at scale
  • Guardrails for PII detection and prompt injection are required
  • You want a managed enterprise gateway with SLA guarantees
  • Prompt management and versioning alongside routing matters

Portkey's semantic cache is its most defensible feature. Unlike simple exact-match caching, semantic caching returns cached results for queries that are semantically similar (not textually identical). For applications where users ask similar questions in different ways — customer support, internal knowledge search, AI tutoring — the cache hit rate is dramatically higher than exact-match caching. Teams at scale have reported 30-50% cost reductions from semantic caching alone, making Portkey's pricing self-financing in many production deployments.
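
Under the hood, semantic caches typically embed each prompt and compare embeddings with cosine similarity against a threshold (the conservative 0.95 mentioned earlier). A minimal sketch of the matching step, with toy vectors standing in for real embeddings:

```typescript
// Cosine similarity between two embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A cached response is reused when similarity clears the threshold
function isCacheHit(queryVec: number[], cachedVec: number[], threshold = 0.95): boolean {
  return cosineSimilarity(queryVec, cachedVec) >= threshold;
}
```

The threshold is the key tuning knob: lower values raise the hit rate but risk returning a cached answer to a subtly different question.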

LLM Gateway Architecture Patterns

The decision between managed and self-hosted gateways extends beyond the three tools compared here. The broader architecture pattern for production AI applications typically involves an LLM gateway layer between your application and provider APIs, regardless of which specific gateway you choose.

The gateway provides observability (request logging, cost tracking), reliability (fallback routing, retry logic), and optimization (caching, rate limit management). These concerns are common to all three tools — the differentiation is in implementation quality, self-hosting requirements, and enterprise feature depth.

For teams building LLM-powered applications with JavaScript, the AI SDK vs LangChain JavaScript 2026 comparison covers the application-layer orchestration tools that sit above the gateway layer. The gateway handles infrastructure concerns; the SDK handles prompt orchestration, tool calling, and streaming — these are complementary layers in the AI application stack.

Teams evaluating observability for their LLM infrastructure should also consider how gateway selection interacts with their monitoring stack. LiteLLM integrates natively with Langfuse and several other observability platforms, making it easy to correlate gateway-level metrics (latency, cost, cache hits) with application-level traces. Portkey has its own observability dashboard that provides similar visibility without requiring a separate tool.

The final consideration is team expertise. OpenRouter requires no infrastructure knowledge — it is a pure API service. LiteLLM requires the ability to deploy and maintain a Docker container, configure a database, and manage virtual keys — this is straightforward for most backend engineers but represents ongoing operational overhead. Portkey is a managed service with enterprise support, which shifts operational responsibility back to the vendor at the cost of a monthly subscription. Teams making the gateway decision should factor in not just the current cost and features, but the operational capacity they have available for maintaining the chosen tool over time. A self-hosted LiteLLM deployment that goes unmaintained accumulates security debt and compatibility issues as the underlying providers update their APIs.


Methodology

Data sourced from GitHub repositories (star counts as of February 2026), official documentation and pricing pages, community benchmarks on HuggingFace forums and AI engineering blogs. Pricing data for OpenRouter models verified from openrouter.ai/models (February 2026). LiteLLM latency estimates from official benchmarks and community reports. Feature availability verified against documentation.


Related:

Best AI LLM libraries for JavaScript 2026 for the broader JavaScript AI ecosystem.

Cloudflare Durable Objects vs Upstash vs Turso for edge data stores that pair with AI gateway deployments.

Best realtime libraries 2026 for streaming LLM responses to clients.
