Text Generation APIs
Access 20+ large language models through Hyperbolic’s OpenAI-compatible chat completions API. Generate text, have conversations, write code, and more with state-of-the-art open-source models.
Chat Completions
The chat completions endpoint is the primary way to interact with text generation models.
Endpoint
POST https://api.hyperbolic.xyz/v1/chat/completions
Basic Request
```python
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 512,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["message"]["content"])
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
```
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., deepseek-ai/DeepSeek-R1) |
| messages | array | Yes | Array of message objects with role and content |
| max_tokens | integer | No | Maximum number of tokens to generate |
| temperature | float | No | Sampling temperature (0-2). Higher = more creative |
| top_p | float | No | Nucleus sampling threshold (0-1) |
| stream | boolean | No | Enable streaming responses (default: false) |
| stop | string/array | No | Stop sequence(s) to end generation |
Messages are an array of objects with role and content fields:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
    {"role": "assistant", "content": "Here's a function to reverse a string..."},
    {"role": "user", "content": "Can you make it more efficient?"}
  ]
}
```
Roles:
- system: Sets the behavior and context for the assistant
- user: Messages from the user
- assistant: Previous responses from the model (for conversation history)
Response Format
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "deepseek-ai/DeepSeek-R1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165
  }
}
```
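The fields above are easy to pull out with a small helper. The function below is a hypothetical convenience for your own code, not part of the API:

```python
def summarize_response(resp: dict) -> dict:
    """Extract the reply text, finish reason, and token usage
    from a chat completion response body."""
    choice = resp["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
    }

# The sample body from the response above
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant",
                    "content": "Quantum computing uses quantum mechanics..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165},
}
print(summarize_response(sample)["finish_reason"])  # stop
```

Checking finish_reason is worthwhile in production: a value of "length" means the reply was cut off by max_tokens.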
Multi-turn Conversations
To maintain conversation context, include the full message history in each request:
```python
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Maintain conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
]

response = requests.post(url, headers=headers, json={
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": messages,
    "max_tokens": 256
})
print(response.json()["choices"][0]["message"]["content"])
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."},
      {"role": "user", "content": "What is its population?"}
    ],
    "max_tokens": 256
  }'
```
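Because the full history is resent on every turn, long conversations eventually exceed the model's context window. One common approach is to trim old turns client-side before each request; `trim_history` below is a hypothetical helper using a rough character budget as a proxy for tokens:

```python
def trim_history(messages: list[dict], max_chars: int = 8000) -> list[dict]:
    """Keep the system prompt plus the most recent turns whose combined
    content length fits under a character budget (a crude token proxy)."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    kept, used = [], 0
    for msg in reversed(turns):  # walk from newest to oldest
        used += len(msg["content"])
        if used > max_chars:
            break
        kept.append(msg)
    return system + list(reversed(kept))
```

You would then send `trim_history(messages)` instead of `messages`. For precise budgeting you would count tokens with the model's tokenizer rather than characters.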
Streaming Responses
Enable real-time token streaming for responsive chat applications:
```python
import json
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "Write a short story about a robot."}
    ],
    "max_tokens": 512,
    "stream": True
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: ') and line != 'data: [DONE]':
            chunk = json.loads(line[6:])
            content = chunk['choices'][0]['delta'].get('content', '')
            print(content, end='', flush=True)
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "user", "content": "Write a short story about a robot."}
    ],
    "max_tokens": 512,
    "stream": true
  }'
```
For more advanced streaming patterns including function calling, see Advanced Features.
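The inline SSE parsing shown above can be factored into a reusable generator. `iter_content` is a hypothetical helper name, not part of the API:

```python
import json

def iter_content(lines):
    """Yield content deltas from a server-sent-events stream of chat
    completion chunks. `lines` is any iterable of raw lines, e.g.
    response.iter_lines()."""
    for raw in lines:
        if not raw:
            continue
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]
```

With this helper, collecting the full reply is one line: `full_text = "".join(iter_content(response.iter_lines()))`.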
System Prompts
Use system prompts to set the model’s behavior, persona, or instructions:
```python
messages = [
    {
        "role": "system",
        "content": """You are an expert Python developer. Follow these guidelines:
- Write clean, well-documented code
- Include type hints
- Add brief comments explaining complex logic
- Suggest improvements when relevant"""
    },
    {"role": "user", "content": "Write a function to find prime numbers."}
]
```
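If your application swaps personas at runtime, it helps to set the system prompt programmatically rather than rebuilding the message list by hand. `with_system_prompt` below is a hypothetical helper:

```python
def with_system_prompt(messages: list[dict], prompt: str) -> list[dict]:
    """Return a copy of `messages` whose first entry is the given system
    prompt, replacing any existing system message."""
    rest = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": prompt}] + rest

chat = [{"role": "user", "content": "Write a function to find prime numbers."}]
messages = with_system_prompt(chat, "You are an expert Python developer.")
```

Returning a copy (rather than mutating in place) keeps the original history reusable with a different persona.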
Text Completions (Base Models)
For raw text completion without chat formatting, use the completions endpoint with base models like Llama 3.1 405B BASE.
Endpoint
POST https://api.hyperbolic.xyz/v1/completions
Example
```python
import requests

url = "https://api.hyperbolic.xyz/v1/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Meta-Llama-3.1-405B",
    "prompt": "The key principles of machine learning are",
    "max_tokens": 256,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["text"])
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-405B",
    "prompt": "The key principles of machine learning are",
    "max_tokens": 256,
    "temperature": 0.7
  }'
```
Base models are ideal for text completion, fill-in-the-middle tasks, and custom prompting strategies where chat formatting isn’t needed.
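One such custom prompting strategy is few-shot prompting: showing the base model a handful of input/output pairs so it continues the pattern. The formatting helper below is a hypothetical sketch, not an API feature:

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format input/output pairs into a plain-text few-shot prompt
    for a base (non-chat) model, ending where the model should continue."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    [("cheval", "horse"), ("chien", "dog")],
    "chat",
)
# Send as the "prompt" field of a /v1/completions request
```

Ending the prompt at "Output:" invites the model to complete the final pair; a stop sequence such as "\nInput:" keeps it from generating further examples.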
Available Models
Instruct Models (Chat Completions)
| Model | Model ID | Context | Price |
|---|---|---|---|
| DeepSeek-R1 | deepseek-ai/DeepSeek-R1 | 131K | $2.00/M tokens |
| DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 | 164K | $3.00/M tokens |
| DeepSeek-V3 | deepseek-ai/DeepSeek-V3 | 131K | $0.25/M tokens |
| DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 | 131K | $1.25/M tokens |
| GPT-OSS 120B | openai/gpt-oss-120b | 131K | $0.30/M tokens |
| GPT-OSS 20B | openai/gpt-oss-20b | 131K | $0.10/M tokens |
| Kimi-K2 | moonshotai/Kimi-K2-Instruct | 131K | $2.00/M tokens |
| Llama 3 70B | meta-llama/Meta-Llama-3-70B-Instruct | 131K | $0.40/M tokens |
| Llama 3.1 405B | meta-llama/Meta-Llama-3.1-405B-Instruct | 131K | $4.00/M tokens |
| Llama 3.1 70B | meta-llama/Meta-Llama-3.1-70B-Instruct | 131K | $0.40/M tokens |
| Llama 3.1 8B | meta-llama/Meta-Llama-3.1-8B-Instruct | 131K | $0.10/M tokens |
| Llama 3.2 3B | meta-llama/Llama-3.2-3B-Instruct | 131K | $0.10/M tokens |
| Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct | 131K | $0.40/M tokens |
| Qwen 2.5 72B | Qwen/Qwen2.5-72B-Instruct | 131K | $0.40/M tokens |
| Qwen 2.5 Coder 32B | Qwen/Qwen2.5-Coder-32B-Instruct | 131K | $0.20/M tokens |
| Qwen3-235B-A22B | Qwen/Qwen3-235B-A22B | 41K | $0.40/M tokens |
| Qwen3-235B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 262K | $2.00/M tokens |
| Qwen3-Coder 480B | Qwen/Qwen3-Coder-480B-A35B-Instruct | 262K | $2.00/M tokens |
| Qwen3-Next 80B Instruct | Qwen/Qwen3-Next-80B-A3B-Instruct | 262K | $0.30/M tokens |
| Qwen3-Next 80B Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking | 262K | $0.30/M tokens |
| QwQ-32B | Qwen/QwQ-32B | 131K | $0.40/M tokens |
Base Models (Text Completions)
| Model | Model ID | Context | Price |
|---|---|---|---|
| Llama 3.1 405B BASE | meta-llama/Meta-Llama-3.1-405B | 131K | $4.00/M tokens |
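Prices in the tables above are per million tokens. A quick per-request cost estimate from the response's usage block might look like the sketch below, which assumes a single blended rate for prompt and completion tokens as listed:

```python
def estimate_cost_usd(total_tokens: int, price_per_million: float) -> float:
    """Estimate request cost from total token usage, assuming one
    blended per-million-token rate as shown in the pricing tables."""
    return total_tokens / 1_000_000 * price_per_million

# 165 total tokens on Llama 3.3 70B at $0.40/M tokens
cost = estimate_cost_usd(165, 0.40)
```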
OpenAI SDK Compatibility
The API is fully compatible with OpenAI’s SDK. Just change the base URL and API key:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_HYPERBOLIC_API_KEY',
  baseURL: 'https://api.hyperbolic.xyz/v1'
});

const response = await client.chat.completions.create({
  model: 'meta-llama/Llama-3.3-70B-Instruct',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ]
});
console.log(response.choices[0].message.content);
```
Error Handling
| Error Code | Description | Solution |
|---|---|---|
| 401 | Invalid or missing API key | Check your API key is correct and included in the Authorization header |
| 400 | Invalid request parameters | Verify model ID, message format, and parameter values |
| 429 | Rate limit exceeded | Reduce request frequency or upgrade to Pro tier |
| 500 | Server error | Retry the request; contact support if persistent |
Basic tier allows 60 requests/minute. Upgrade to Pro tier (minimum $5 deposit) for 600 requests/minute.
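Transient 429 and 5xx responses are commonly handled with exponential backoff. `with_retries` below is a hypothetical client-side sketch (the status codes retried here are an assumption, not a documented guarantee):

```python
import time

def with_retries(send, max_attempts: int = 5, base_delay: float = 1.0):
    """Call `send()` until it returns a response whose status code is not
    retryable (429 or common 5xx), sleeping with exponential backoff
    between attempts. Returns the last response either way."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in (429, 500, 502, 503):
            return resp
        time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return resp
```

Usage with the examples above: `resp = with_retries(lambda: requests.post(url, headers=headers, json=data))`.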
Next Steps