Serverless Inference

Deploy state-of-the-art AI models instantly with Hyperbolic’s Serverless Inference platform. Access 25+ open-source models through a simple API, with no infrastructure to manage and pricing that scales with your usage.

Why Serverless Inference?

Skip the complexity of GPU management, model deployment, and infrastructure scaling. Focus on building your application while we handle the AI infrastructure.

Key Benefits

  • Instant Deployment: Start using models in seconds, not hours
  • No Infrastructure: Zero DevOps required - we handle everything
  • Pay Per Use: Only pay for the tokens/images you generate
  • OpenAI Compatible: Drop-in replacement for existing code
  • Privacy First: Zero data retention policy

Supported Model Categories

💬 Text Generation (LLMs)

Deploy the latest language models for chat, completion, and reasoning tasks. Available Models:
  • Llama 3.1 (8B, 70B, 405B) - Meta’s latest open models
  • Qwen 2.5 (7B, 72B) - Alibaba’s multilingual models
  • Deepseek V2.5 - Efficient reasoning model
  • Hermes 3 - Fine-tuned for conversations
  • Mistral 7B - Fast and efficient
Pricing: From $0.10 per million input tokens

🎨 Image Generation

Create stunning visuals with state-of-the-art diffusion models. Available Models:
  • Stable Diffusion XL - High-quality 1024x1024 images
  • Stable Diffusion 3.5 - Latest generation
  • FLUX.1 [schnell/dev] - Ultra-fast generation
  • ControlNet - Guided image generation
  • Custom LoRA - Use your fine-tuned models
Pricing: From $0.0025 per image

🎯 Vision-Language Models

Process and understand images with multimodal models. Available Models:
  • Llama 3.2 Vision (11B, 90B) - Image understanding
  • Qwen2-VL (2B, 7B) - Multimodal reasoning
Pricing: From $0.15 per million tokens
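
As a hedged sketch, image inputs are assumed to follow the OpenAI-style multimodal message format, with image_url content parts alongside text; the model ID below is illustrative, so check the model catalog for exact names:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

# Mix text and image_url content parts in a single user message
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # assumed model ID for illustration
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}}
        ]
    }]
)
print(response.choices[0].message.content)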

🔊 Audio Generation

Generate natural-sounding speech and process audio. Available Models:
  • Melo TTS - Text-to-speech generation
  • Whisper - Speech-to-text transcription (coming soon)
Pricing: From $0.001 per 1000 characters

Pricing Tiers

| Tier | RPM Limit | IP Limit | Min. Deposit | Features |
| --- | --- | --- | --- | --- |
| Basic | 60 | 100 | $0 | API access, All models, Community support |
| Pro | 600 | 100 | ≥ $5 | 10x rate limits, Priority queue, Email support |
| Enterprise | Unlimited | Unlimited | Contact Sales | Custom limits, Dedicated instances, SLA guarantees, Fine-tuning, 24/7 support |
Note: Each source IP is capped at 600 RPM for DDoS protection. Need higher limits? Contact sales.

Developer Experience

OpenAI SDK Compatibility

Switch from OpenAI with just 2 lines of code:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

# Your existing code works unchanged
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

REST API

Direct HTTP access for any platform:
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Streaming Support

Real-time token streaming for chat applications:
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=messages,
    stream=True
)

for chunk in stream:
    # The final chunk (and role-only chunks) carry no content, so guard against None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Advanced Features

🎯 Function Calling

Enable models to call external tools and APIs:
  • Structured output generation
  • Tool integration for agents
  • JSON schema validation
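
A minimal sketch of tool use through the OpenAI-compatible API, assuming the standard OpenAI tools schema is supported for this model; get_weather is a hypothetical tool defined only for illustration:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

# Declare a tool the model is allowed to call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# If the model chose to call the tool, the arguments arrive as a JSON string
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)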

🔧 Custom Parameters

Fine-tune model behavior:
  • Temperature, top_p, top_k controls
  • Max tokens and stop sequences
  • Presence and frequency penalties
  • Custom system prompts
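
These controls map onto the standard chat completions parameters; as a sketch (note that top_k is not a standard OpenAI SDK argument, so passing it via extra_body is an assumption about the endpoint):
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Summarize what an API gateway does."}
    ],
    temperature=0.7,           # sampling randomness
    top_p=0.9,                 # nucleus sampling cutoff
    max_tokens=256,            # cap on generated tokens
    stop=["\n\n"],             # stop sequence(s)
    presence_penalty=0.1,
    frequency_penalty=0.1,
    extra_body={"top_k": 40}   # assumed pass-through parameter, not part of the OpenAI SDK signature
)
print(response.choices[0].message.content)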

📊 Structured Output

Get reliable JSON responses:
  • JSON mode for consistent formatting
  • Schema enforcement
  • Type validation
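
A minimal sketch of JSON-mode output, assuming the endpoint honors the OpenAI-style response_format parameter for this model; the schema in the system prompt is illustrative:
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": 'Reply only with JSON of the form {"sentiment": "positive|negative|neutral"}.'},
        {"role": "user", "content": "The new release is fantastic."}
    ],
    response_format={"type": "json_object"}
)

# Parse the guaranteed-JSON reply into a Python dict
result = json.loads(response.choices[0].message.content)
print(result["sentiment"])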

🚀 Batch Processing

Optimize for throughput:
  • Batch multiple requests
  • Async processing
  • Bulk pricing discounts
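
One way to batch on the client side is plain async concurrency with the AsyncOpenAI client, shown below as a sketch; bulk pricing and any dedicated batch endpoint are separate from this pattern:
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

async def summarize(text: str) -> str:
    response = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}]
    )
    return response.choices[0].message.content

async def main():
    docs = ["First document ...", "Second document ...", "Third document ..."]
    # Issue the requests concurrently instead of one at a time
    summaries = await asyncio.gather(*(summarize(d) for d in docs))
    for s in summaries:
        print(s)

asyncio.run(main())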

Use Cases

Chatbots & Assistants

Build conversational AI with streaming responses and context management.
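
One common pattern for context management is to keep the running message list and append each turn before the next request; an illustrative sketch with the OpenAI-compatible client:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

# The conversation history doubles as the model's context window contents
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        messages=history
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Remember that my name is Ada."))
print(chat("What's my name?"))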

Content Generation

Create articles, summaries, and creative writing at scale.

Code Generation

Generate, explain, and debug code across multiple languages.

Image Creation

Design assets, generate product images, and create visual content.

Data Processing

Extract insights, classify text, and analyze sentiment.

Translation

Translate content across 100+ languages with context preservation.

Getting Started

Quick Start in 3 Steps

1. Get Your API Key

Sign up at app.hyperbolic.ai and generate an API key

2. Install SDK

pip install openai
# or
npm install openai

3. Make Your First Request

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)

print(response.choices[0].message.content)

Integration Examples

LangChain Integration

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key="YOUR_API_KEY",
    openai_api_base="https://api.hyperbolic.xyz/v1",
    model_name="meta-llama/Meta-Llama-3.1-70B-Instruct"
)
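
Once constructed, the model is used through the standard LangChain Runnable interface, for example:
# invoke() returns an AIMessage; its text lives in .content
print(llm.invoke("Give me three names for a robotics startup.").content)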

Vercel AI SDK

import { OpenAI } from 'openai';

const openai = new OpenAI({
  apiKey: process.env.HYPERBOLIC_API_KEY,
  baseURL: 'https://api.hyperbolic.xyz/v1'
});

Gradio Interface

Deploy interactive demos with one-click Hugging Face Spaces integration.
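
As an illustrative sketch rather than an official template, a minimal Gradio chat demo can wrap the OpenAI-compatible client; the history handling is simplified here and its exact format depends on your Gradio version:
import gradio as gr
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.hyperbolic.xyz/v1")

def respond(message, history):
    # For brevity this ignores prior turns; wire in `history` for multi-turn context
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content

gr.ChatInterface(respond).launch()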

Reliability & Compliance

Infrastructure

  • 99.9% Uptime SLA for Enterprise tier
  • Global CDN for low-latency access
  • Auto-scaling to handle traffic spikes
  • Multi-region deployment

Security

  • Zero Data Retention: Your data is never stored
  • Encrypted Connections: TLS 1.3 for all API calls
  • API Key Rotation: Regular key management
  • SOC2 Compliance: Enterprise-grade security

Support

  • Documentation: Comprehensive guides and examples
  • Community Discord: Active developer community
  • Email Support: Pro tier and above
  • 24/7 Support: Enterprise tier

Resources

Pricing Calculator

Estimate your costs based on usage:
| Usage Level | Tokens/Month | Estimated Cost |
| --- | --- | --- |
| Hobby | 1M tokens | ~$0.15 |
| Startup | 10M tokens | ~$1.50 |
| Growth | 100M tokens | ~$12.00 |
| Scale | 1B tokens | ~$100.00 |
Based on Llama 3.1 70B pricing. Actual costs vary by model.
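
The estimates above are simple per-million-token arithmetic; a quick way to reproduce them, using an assumed illustrative rate:
# Back-of-the-envelope estimate: (tokens / 1M) * price per million tokens
def estimate_cost(tokens_per_month: int, price_per_million: float) -> float:
    return tokens_per_month / 1_000_000 * price_per_million

# e.g. 10M tokens at an assumed $0.15 per million tokens
print(f"${estimate_cost(10_000_000, 0.15):.2f}")  # -> $1.50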

Next Steps

Ready to build? Start with $5 free credits to explore our models. Get Your API Key →
Migration Support

Moving from OpenAI, Anthropic, or another provider? Our team can help with migration strategies and code conversion. Contact us →