
Performance and Limits

Understand rate limits, service tiers, and infrastructure capabilities for Hyperbolic’s Serverless Inference API.

Rate Limits

Standard Limits

| Tier | Requests/Minute | Requirements |
|------------|-----------------|----------------|
| Basic | 60 | Free account |
| Pro | 600 | $5+ deposit |
| Enterprise | Unlimited | Contact sales |
All tiers have a per-IP limit of 600 requests/minute for DDoS protection.

Model-Specific Limits

Some resource-intensive models have special rate limits:
| Model | Basic | Pro |
|--------------------------|-------------|---------|
| Llama 3.1 405B | 5/min | 120/min |
| Llama 3.1 405B-Instruct | 5/min | 120/min |
| FLUX.1-dev | 1 per 5 min | 50/min |
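To stay under these per-minute caps on the client side, you can gate outgoing requests with a sliding-window limiter before they reach the API. A minimal sketch (the class and parameter names are illustrative, not part of any Hyperbolic SDK):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls per `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps: deque = deque()

    def acquire(self) -> None:
        """Block until a request slot is free, then record the request."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Wait until the oldest request leaves the window, then retry.
            time.sleep(self.window_seconds - (now - self.timestamps[0]))
            return self.acquire()
        self.timestamps.append(time.monotonic())


# Example: Basic-tier limit of 60 requests/minute
limiter = SlidingWindowLimiter(max_requests=60, window_seconds=60.0)
```

Call `limiter.acquire()` immediately before each API request; it returns at once while you are under the limit and blocks only when the window is full.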

Upgrading to Pro

Get 10x higher rate limits by upgrading to Pro:
1. Log into Dashboard: go to app.hyperbolic.ai and sign in.
2. Add Funds: deposit $5 or more to your account.
3. Automatic Upgrade: your account is automatically upgraded to the Pro tier.

Service Tiers

| Feature | Basic | Pro | Enterprise |
|---------------------|-------------------|-------------|----------------|
| Rate Limit | 60/min | 600/min | Unlimited |
| Cost | Free | $5+ deposit | Custom |
| Support | Community Discord | Email | 24/7 dedicated |
| Priority Queue | - | Yes | Yes |
| Dedicated Instances | - | - | Yes |
| Custom SLAs | - | - | Yes |
| Fine-tuning | - | - | Yes |
Basic tier includes $1 promotional credit when you verify your phone number.
Need higher limits or dedicated infrastructure? Contact sales.

Pricing Summary

Hyperbolic uses pay-as-you-go pricing with no monthly quotas or commitments.

Text Generation

| Model Category | Price |
|---------------------------|------------------------------|
| Small models (3B-8B) | From $0.10 per 1M tokens |
| Medium models (32B-72B) | $0.20 - $0.40 per 1M tokens |
| Large models (120B-480B) | $0.30 - $4.00 per 1M tokens |
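Because pricing is per million tokens, estimating a job's cost is a single multiplication. A small sketch (the function name is illustrative; pass in the per-million rate for your specific model from the pricing tables):

```python
def token_cost(total_tokens: int, price_per_million: float) -> float:
    """Estimate USD cost for pay-as-you-go token pricing.

    `price_per_million` is the model's rate per 1M tokens, e.g. 0.40
    for a medium model billed at $0.40 per 1M tokens.
    """
    return total_tokens * price_per_million / 1_000_000


# 2M tokens through a $0.40/1M-token medium model:
# token_cost(2_000_000, 0.40) -> 0.80
```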

Image Generation

Base rate: $0.01 per image (1024x1024, 25 steps)
Formula: $0.01 × (width/1024) × (height/1024) × (steps/25)
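The formula above translates directly into code. A minimal sketch for estimating image cost before you submit a request:

```python
def image_cost(width: int, height: int, steps: int) -> float:
    """Apply the documented image-pricing formula:
    $0.01 x (width/1024) x (height/1024) x (steps/25).
    """
    return 0.01 * (width / 1024) * (height / 1024) * (steps / 25)


# The defaults reproduce the base rate:
# image_cost(1024, 1024, 25) -> 0.01
# A 512x512 image at 50 steps costs a quarter of the area at double the steps:
# image_cost(512, 512, 50) -> 0.005
```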

Audio Generation

Rate: $5.00 per 1M characters
See Text APIs, Image APIs, and Audio APIs for complete pricing by model.

Infrastructure

Security

| Feature | Description |
|---------------------|---------------------------------------------|
| Zero Data Retention | Your prompts and responses are never stored |
| Encryption | TLS 1.3 for all API connections |
| Compliance | SOC2 compliance (Enterprise tier) |

Error Handling

Rate Limit Errors

When you exceed rate limits, you’ll receive a 429 Too Many Requests response:
```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after X seconds."
  }
}
```

Best Practices

  • Implement exponential backoff for automatic retries
  • Monitor usage via the dashboard to stay within limits
  • Cache responses when appropriate to reduce API calls
  • Use streaming for long responses to improve perceived latency

Retry Example

```python
import time
from openai import RateLimitError

def call_with_retry(func, max_retries=3):
    """Call a function with exponential backoff on rate limit errors."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            wait_time = 2 ** attempt  # 1, 2, 4 seconds
            time.sleep(wait_time)

# Usage (`client` is an OpenAI-compatible client configured for Hyperbolic)
response = call_with_retry(
    lambda: client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "Hello!"}]
    )
)
```

Monitoring Usage

Track your API usage in the Hyperbolic Dashboard:
  • Requests per minute/hour/day
  • Token consumption by model
  • Cost breakdown and billing history
  • Real-time usage graphs

Next Steps