Serverless Inference
Deploy state-of-the-art AI models instantly with Hyperbolic’s Serverless Inference platform. Access 25+ open-source models through a simple API, with no infrastructure to manage and pricing that scales with your usage.Why Serverless Inference?
Skip the complexity of GPU management, model deployment, and infrastructure scaling. Focus on building your application while we handle the AI infrastructure.Key Benefits
- Instant Deployment: Start using models in seconds, not hours
- No Infrastructure: Zero DevOps required - we handle everything
- Pay Per Use: Only pay for the tokens/images you generate
- OpenAI Compatible: Drop-in replacement for existing code
- Privacy First: Zero data retention policy
Supported Model Categories
💬 Text Generation (LLMs)
Deploy the latest language models for chat, completion, and reasoning tasks. Available Models:- Llama 3.1 (8B, 70B, 405B) - Meta’s latest open models
- Qwen 2.5 (7B, 72B) - Alibaba’s multilingual models
- Deepseek V2.5 - Efficient reasoning model
- Hermes 3 - Fine-tuned for conversations
- Mistral 7B - Fast and efficient
🎨 Image Generation
Create stunning visuals with state-of-the-art diffusion models. Available Models:- Stable Diffusion XL - High-quality 1024x1024 images
- Stable Diffusion 3.5 - Latest generation
- FLUX.1 [schnell/dev] - Ultra-fast generation
- ControlNet - Guided image generation
- Custom LoRA - Use your fine-tuned models
🎯 Vision-Language Models
Process and understand images with multimodal models. Available Models:- Llama 3.2 Vision (11B, 90B) - Image understanding
- Qwen2-VL (2B, 7B) - Multimodal reasoning
🔊 Audio Generation
Generate natural-sounding speech and process audio. Available Models:- Melo TTS - Text-to-speech generation
- Whisper - Speech-to-text transcription (coming soon)
Pricing Tiers
| Tier | RPM Limit | IP Limit | Min. Deposit | Features |
|---|---|---|---|---|
| Basic | 60 | 100 | $0 | API access, All models, Community support |
| Pro | 600 | 100 | ≥ $5 | 10x rate limits, Priority queue, Email support |
| Enterprise | Unlimited | Unlimited | Contact Sales | Custom limits, Dedicated instances, SLA guarantees, Fine-tuning, 24/7 support |
Note: Each source IP is capped at 600 RPM for DDoS protection. Need higher limits? Contact sales.
Developer Experience
OpenAI SDK Compatibility
Switch from OpenAI with just 2 lines of code:REST API
Direct HTTP access for any platform:Streaming Support
Real-time token streaming for chat applications:Advanced Features
🎯 Function Calling
Enable models to call external tools and APIs:- Structured output generation
- Tool integration for agents
- JSON schema validation
🔧 Custom Parameters
Fine-tune model behavior:- Temperature, top_p, top_k controls
- Max tokens and stop sequences
- Presence and frequency penalties
- Custom system prompts
📊 Structured Output
Get reliable JSON responses:- JSON mode for consistent formatting
- Schema enforcement
- Type validation
🚀 Batch Processing
Optimize for throughput:- Batch multiple requests
- Async processing
- Bulk pricing discounts
Use Cases
Chatbots & Assistants
Build conversational AI with streaming responses and context management.Content Generation
Create articles, summaries, and creative writing at scale.Code Generation
Generate, explain, and debug code across multiple languages.Image Creation
Design assets, generate product images, and create visual content.Data Processing
Extract insights, classify text, and analyze sentiment.Translation
Translate content across 100+ languages with context preservation.Getting Started
Quick Start in 3 Steps
1. Get Your API Key
Sign up at app.hyperbolic.ai and generate an API key2. Install SDK
3. Make Your First Request
Integration Examples
LangChain Integration
Vercel AI SDK
Gradio Interface
Deploy interactive demos with one-click Hugging Face Spaces integration.Reliability & Compliance
Infrastructure
- 99.9% Uptime SLA for Enterprise tier
- Global CDN for low-latency access
- Auto-scaling to handle traffic spikes
- Multi-region deployment
Security
- Zero Data Retention: Your data is never stored
- Encrypted Connections: TLS 1.3 for all API calls
- API Key Rotation: Regular key management
- SOC2 Compliance: Enterprise-grade security
Support
- Documentation: Comprehensive guides and examples
- Community Discord: Active developer community
- Email Support: Pro tier and above
- 24/7 Support: Enterprise tier
Resources
- 🎮 Playground - Test models before coding
- 📖 Text APIs - Text generation models
- 💡 Image APIs - Image generation models
Pricing Calculator
Estimate your costs based on usage:| Usage Level | Tokens/Month | Estimated Cost |
|---|---|---|
| Hobby | 1M tokens | ~$0.15 |
| Startup | 10M tokens | ~$1.50 |
| Growth | 100M tokens | ~$12.00 |
| Scale | 1B tokens | ~$100.00 |
Next Steps
Ready to build? Start with $5 free credits to explore our models. Get Your API Key →Migration Support Moving from OpenAI, Anthropic, or another provider? Our team can help with migration strategies and code conversion. Contact us →

