Serverless Inference

Deploy state-of-the-art AI models instantly with Hyperbolic’s Serverless Inference platform. Access 25+ open-source models through a simple API, with no infrastructure to manage and pricing that scales with your usage.

Why Serverless Inference?

Skip the complexity of GPU management, model deployment, and infrastructure scaling. Focus on building your application while we handle the AI infrastructure.

Key Benefits

  • Instant Deployment: Start using models in seconds, not hours
  • No Infrastructure: Zero DevOps required - we handle everything
  • Pay Per Use: Only pay for the tokens/images you generate
  • OpenAI Compatible: Drop-in replacement for existing code
  • Privacy First: Zero data retention policy

Supported Model Categories

💬 Text Generation (LLMs)

Deploy the latest language models for chat, completion, and reasoning tasks. Available Models:
  • Llama 3.1 (8B, 70B, 405B) - Meta’s latest open models
  • Qwen 2.5 (7B, 72B) - Alibaba’s multilingual models
  • Deepseek V2.5 - Efficient reasoning model
  • Hermes 3 - Fine-tuned for conversations
  • Mistral 7B - Fast and efficient
Pricing: From $0.10 per million input tokens

🎨 Image Generation

Create stunning visuals with state-of-the-art diffusion models. Available Models:
  • Stable Diffusion XL - High-quality 1024x1024 images
  • Stable Diffusion 3.5 - Latest generation
  • FLUX.1 [schnell/dev] - Ultra-fast generation
  • ControlNet - Guided image generation
  • Custom LoRA - Use your fine-tuned models
Pricing: From $0.0025 per image

🎯 Vision-Language Models

Process and understand images with multimodal models. Available Models:
  • Llama 3.2 Vision (11B, 90B) - Image understanding
  • Qwen2-VL (2B, 7B) - Multimodal reasoning
Pricing: From $0.15 per million tokens
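
As a hedged sketch, image inputs are assumed to follow the OpenAI-style multimodal message format, with image_url content parts alongside text; the model ID below is illustrative, so check the model catalog for exact names:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

# Mix text and image_url content parts in a single user message
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # assumed model ID for illustration
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}}
        ]
    }]
)
print(response.choices[0].message.content)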

🔊 Audio Generation

Generate natural-sounding speech and process audio. Available Models:
  • Melo TTS - Text-to-speech generation
  • Whisper - Speech-to-text transcription (coming soon)
Pricing: From $0.001 per 1000 characters

Pricing Tiers

| Tier | RPM Limit | IP Limit | Min. Deposit | Features |
| --- | --- | --- | --- | --- |
| Basic | 60 | 100 | $0 | API access, All models, Community support |
| Pro | 600 | 100 | ≥ $5 | 10x rate limits, Priority queue, Email support |
| Enterprise | Unlimited | Unlimited | Contact Sales | Custom limits, Dedicated instances, SLA guarantees, Fine-tuning, 24/7 support |
Note: Each source IP is capped at 600 RPM for DDoS protection. Need higher limits? Contact sales.

Developer Experience

OpenAI SDK Compatibility

Switch from OpenAI with just 2 lines of code:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

# Your existing code works unchanged
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

REST API

Direct HTTP access for any platform:
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Streaming Support

Real-time token streaming for chat applications:
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=messages,
    stream=True
)

for chunk in stream:
    # The final chunk (and role-only chunks) carry no content, so guard against None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Advanced Features

🎯 Function Calling

Enable models to call external tools and APIs:
  • Structured output generation
  • Tool integration for agents
  • JSON schema validation
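
A minimal sketch of tool use through the OpenAI-compatible API, assuming the standard OpenAI tools schema is supported for this model; get_weather is a hypothetical tool defined only for illustration:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

# Declare a tool the model is allowed to call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# If the model chose to call the tool, the arguments arrive as a JSON string
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)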

🔧 Custom Parameters

Fine-tune model behavior:
  • Temperature, top_p, top_k controls
  • Max tokens and stop sequences
  • Presence and frequency penalties
  • Custom system prompts
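
These controls map onto the standard chat completions parameters; as a sketch (note that top_k is not a standard OpenAI SDK argument, so passing it via extra_body is an assumption about the endpoint):
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Summarize what an API gateway does."}
    ],
    temperature=0.7,           # sampling randomness
    top_p=0.9,                 # nucleus sampling cutoff
    max_tokens=256,            # cap on generated tokens
    stop=["\n\n"],             # stop sequence(s)
    presence_penalty=0.1,
    frequency_penalty=0.1,
    extra_body={"top_k": 40}   # assumed pass-through parameter, not part of the OpenAI SDK signature
)
print(response.choices[0].message.content)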

📊 Structured Output

Get reliable JSON responses:
  • JSON mode for consistent formatting
  • Schema enforcement
  • Type validation
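
A minimal sketch of JSON-mode output, assuming the endpoint honors the OpenAI-style response_format parameter for this model; the schema in the system prompt is illustrative:
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": 'Reply only with JSON of the form {"sentiment": "positive|negative|neutral"}.'},
        {"role": "user", "content": "The new release is fantastic."}
    ],
    response_format={"type": "json_object"}
)

# Parse the guaranteed-JSON reply into a Python dict
result = json.loads(response.choices[0].message.content)
print(result["sentiment"])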

🚀 Batch Processing

Optimize for throughput:
  • Batch multiple requests
  • Async processing
  • Bulk pricing discounts
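
One way to batch on the client side is plain async concurrency with the AsyncOpenAI client, shown below as a sketch; bulk pricing and any dedicated batch endpoint are separate from this pattern:
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

async def summarize(text: str) -> str:
    response = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}]
    )
    return response.choices[0].message.content

async def main():
    docs = ["First document ...", "Second document ...", "Third document ..."]
    # Issue the requests concurrently instead of one at a time
    summaries = await asyncio.gather(*(summarize(d) for d in docs))
    for s in summaries:
        print(s)

asyncio.run(main())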

Use Cases

Chatbots & Assistants

Build conversational AI with streaming responses and context management.
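
One common pattern for context management is to keep the running message list and append each turn before the next request; an illustrative sketch with the OpenAI-compatible client:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

# The conversation history doubles as the model's context window contents
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        messages=history
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Remember that my name is Ada."))
print(chat("What's my name?"))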

Content Generation

Create articles, summaries, and creative writing at scale.

Code Generation

Generate, explain, and debug code across multiple languages.

Image Creation

Design assets, generate product images, and create visual content.

Data Processing

Extract insights, classify text, and analyze sentiment.

Translation

Translate content across 100+ languages with context preservation.

Getting Started

Quick Start in 3 Steps

1. Get Your API Key

Sign up at app.hyperbolic.ai and generate an API key

2. Install SDK

pip install openai
# or
npm install openai

3. Make Your First Request

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)

print(response.choices[0].message.content)

Integration Examples

LangChain Integration

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key="YOUR_API_KEY",
    openai_api_base="https://api.hyperbolic.xyz/v1",
    model_name="meta-llama/Meta-Llama-3.1-70B-Instruct"
)
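
Once constructed, the model is used through the standard LangChain Runnable interface, for example:
# invoke() returns an AIMessage; its text lives in .content
print(llm.invoke("Give me three names for a robotics startup.").content)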

Vercel AI SDK

import { OpenAI } from 'openai';

const openai = new OpenAI({
  apiKey: process.env.HYPERBOLIC_API_KEY,
  baseURL: 'https://api.hyperbolic.xyz/v1'
});

Gradio Interface

Deploy interactive demos with one-click Hugging Face Spaces integration.
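
As an illustrative sketch rather than an official template, a minimal Gradio chat demo can wrap the OpenAI-compatible client; the history handling is simplified here and its exact format depends on your Gradio version:
import gradio as gr
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

def respond(message, history):
    # For brevity this ignores prior turns; wire in `history` for multi-turn context
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content

gr.ChatInterface(respond).launch()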

Reliability & Compliance

Infrastructure

  • 99.9% Uptime SLA for Enterprise tier
  • Global CDN for low-latency access
  • Auto-scaling to handle traffic spikes
  • Multi-region deployment

Security

  • Zero Data Retention: Your data is never stored
  • Encrypted Connections: TLS 1.3 for all API calls
  • API Key Rotation: Regular key management
  • SOC2 Compliance: Enterprise-grade security

Support

  • Documentation: Comprehensive guides and examples
  • Community Discord: Active developer community
  • Email Support: Pro tier and above
  • 24/7 Support: Enterprise tier

Resources

Pricing Calculator

Estimate your costs based on usage:
| Usage Level | Tokens/Month | Estimated Cost |
| --- | --- | --- |
| Hobby | 1M tokens | ~$0.15 |
| Startup | 10M tokens | ~$1.50 |
| Growth | 100M tokens | ~$12.00 |
| Scale | 1B tokens | ~$100.00 |
Based on Llama 3.1 70B pricing. Actual costs vary by model.
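
The estimates above are simple per-million-token arithmetic; a quick way to reproduce them, using an assumed illustrative rate:
# Back-of-the-envelope estimate: (tokens / 1M) * price per million tokens
def estimate_cost(tokens_per_month: int, price_per_million: float) -> float:
    return tokens_per_month / 1_000_000 * price_per_million

# e.g. 10M tokens at an assumed $0.15 per million tokens
print(f"${estimate_cost(10_000_000, 0.15):.2f}")  # -> $1.50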

Next Steps

Ready to build? Start with $5 free credits to explore our models. Get Your API Key →
Migration Support

Moving from OpenAI, Anthropic, or another provider? Our team can help with migration strategies and code conversion. Contact us →