Performance and Limits
Understand rate limits, service tiers, and infrastructure capabilities for Hyperbolic’s Serverless Inference API.
Rate Limits
Standard Limits
| Tier | Requests/Minute | Requirements |
|---|---|---|
| Basic | 60 | Free account |
| Pro | 600 | $5+ deposit |
| Enterprise | Unlimited | Contact sales |
All tiers have a per-IP limit of 600 requests/minute for DDoS protection.
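One way to stay under a requests-per-minute limit is to throttle on the client side before sending. The following is a minimal sketch of a sliding-window throttle, not part of any Hyperbolic SDK: the class name `SlidingWindowThrottle` and its parameters are hypothetical, and the defaults assume the Basic tier's 60 requests/minute.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Client-side limiter: allow at most `max_requests` per `window_seconds`.

    Hypothetical helper for illustration; defaults assume the Basic tier.
    """

    def __init__(self, max_requests=60, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        """Block until another request may be sent, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request leaves it.
            self._sleep(self.window_seconds - (now - self._sent[0]))
```

Calling `throttle.acquire()` immediately before each API request keeps a single process within the configured limit. It does not coordinate across multiple processes or machines that share one account or IP address.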
Model-Specific Limits
Some resource-intensive models have special rate limits:
| Model | Basic | Pro |
|---|---|---|
| Llama 3.1 405B | 5/min | 120/min |
| Llama 3.1 405B-Instruct | 5/min | 120/min |
| FLUX.1-dev | 1 per 5 min | 50/min |
Upgrading to Pro
Get 10x higher rate limits by upgrading to Pro:
1. Log into Dashboard: Go to app.hyperbolic.ai and sign in.
2. Add Funds: Deposit $5 or more to your account.
3. Automatic Upgrade: Your account is automatically upgraded to Pro tier.
Service Tiers
| Feature | Basic | Pro | Enterprise |
|---|---|---|---|
| Rate Limit | 60/min | 600/min | Unlimited |
| Cost | Free | $5+ deposit | Custom |
| Support | Community Discord | 24/7 dedicated | |
| Priority Queue | - | Yes | Yes |
| Dedicated Instances | - | - | Yes |
| Custom SLAs | - | - | Yes |
| Fine-tuning | - | - | Yes |
The Basic tier includes a $1 promotional credit when you verify your phone number.
Pricing Summary
Hyperbolic uses pay-as-you-go pricing with no monthly quotas or commitments.
Text Generation
| Model Category | Price |
|---|---|
| Small models (3B-8B) | From $0.10 per 1M tokens |
| Medium models (32B-72B) | $0.20 - $0.40 per 1M tokens |
| Large models (120B-480B) | $0.30 - $4.00 per 1M tokens |
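Because prices are quoted per 1M tokens, a rough cost estimate is a single multiplication. The sketch below is illustrative only: the function name `text_cost_usd` is hypothetical, and it assumes one flat rate for all tokens, which may not match how a specific model is billed.

```python
def text_cost_usd(total_tokens, price_per_million_usd):
    """Estimate cost in USD for `total_tokens` at a flat per-1M-token price."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Example: 250,000 tokens on a medium model, using the table's price range.
low = text_cost_usd(250_000, 0.20)
high = text_cost_usd(250_000, 0.40)
print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

For 250,000 tokens this gives a range of roughly $0.05 to $0.10.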
Image Generation
Base rate: $0.01 per image (1024x1024, 25 steps)
Formula: $0.01 × (width/1024) × (height/1024) × (steps/25)
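The image pricing formula can be checked with a few lines of code. This is a direct transcription of the formula as a sketch; the function name `image_cost_usd` is hypothetical, and it assumes the formula applies without rounding or per-model adjustments.

```python
def image_cost_usd(width, height, steps, base_rate_usd=0.01):
    """Price scales linearly with width, height, and step count
    relative to the 1024x1024, 25-step base image."""
    return base_rate_usd * (width / 1024) * (height / 1024) * (steps / 25)


# Base image: 1024x1024 at 25 steps costs the base rate.
print(f"${image_cost_usd(1024, 1024, 25):.4f}")
# Doubling width, height, and steps multiplies the price by 2 * 2 * 2 = 8.
print(f"${image_cost_usd(2048, 2048, 50):.4f}")
```

Under this formula a 512x512 image at 25 steps costs a quarter of the base rate, since each halved dimension halves the price.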
Audio Generation
Rate: $5.00 per 1M characters
See Text APIs, Image APIs, and Audio APIs for complete pricing by model.
Infrastructure
Security
| Feature | Description |
|---|---|
| Zero Data Retention | Your prompts and responses are never stored |
| Encryption | TLS 1.3 for all API connections |
| Compliance | SOC2 compliance (Enterprise tier) |
Error Handling
Rate Limit Errors
When you exceed rate limits, you’ll receive a 429 Too Many Requests response.
Best Practices
- Implement exponential backoff for automatic retries
- Monitor usage via the dashboard to stay within limits
- Cache responses when appropriate to reduce API calls
- Use streaming for long responses to improve perceived latency
Retry Example
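The following is a minimal sketch of retrying with exponential backoff when a request returns 429. It is not taken from a Hyperbolic SDK: `call_with_backoff`, `RateLimitError`, and the `send_request` callable are hypothetical names, and `send_request` stands in for whatever HTTP call your client makes, returning a `(status_code, body)` tuple.

```python
import random
import time


class RateLimitError(Exception):
    """Raised when retries are exhausted (hypothetical helper type)."""


def call_with_backoff(send_request, max_retries=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep, jitter=random.random):
    """Call `send_request()` and retry on HTTP 429 with exponential backoff.

    `send_request` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send_request()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Add up to 10% random jitter so many clients do not retry in lockstep.
        sleep(delay + delay * 0.1 * jitter())
    raise RateLimitError(f"still rate limited after {max_retries} retries")
```

With the defaults, the waits between attempts are about 1, 2, 4, 8, and 16 seconds. Only 429 responses are retried here; other error statuses are returned to the caller unchanged so they can be handled separately.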
Monitoring Usage
Track your API usage in the Hyperbolic Dashboard:
- Requests per minute/hour/day
- Token consumption by model
- Cost breakdown and billing history
- Real-time usage graphs

