Text Generation APIs
Access 20+ large language models through Hyperbolic’s OpenAI-compatible chat completions API. Generate text, have conversations, write code, and more with state-of-the-art open-source models.
Chat Completions
The chat completions endpoint is the primary way to interact with text generation models.
Endpoint
POST https://api.hyperbolic.xyz/v1/chat/completions
Basic Request
import requests
url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
data = {
"model": "deepseek-ai/DeepSeek-R1",
"messages": [
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"max_tokens": 512,
"temperature": 0.7
}
response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["message"]["content"])
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-ai/DeepSeek-R1",
"messages": [
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"max_tokens": 512,
"temperature": 0.7
}'
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., deepseek-ai/DeepSeek-R1) |
| messages | array | Yes | Array of message objects with role and content |
| max_tokens | integer | No | Maximum number of tokens to generate |
| temperature | float | No | Sampling temperature (0-2). Higher values produce more varied output |
| top_p | float | No | Nucleus sampling threshold (0-1) |
| stream | boolean | No | Enable streaming responses (default: false) |
| stop | string/array | No | Stop sequence(s) that end generation |
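The optional sampling parameters above can be combined in one request body. The sketch below only builds the payload (the prompt text and values are illustrative); send it with requests.post exactly as in the Basic Request example.

```python
import json

# Illustrative payload combining the optional sampling parameters.
payload = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "List three uses of graphene."}],
    "max_tokens": 256,    # cap the generated length
    "temperature": 0.7,   # moderate randomness
    "top_p": 0.9,         # nucleus sampling: keep the top 90% probability mass
    "stop": ["\n\n"],     # end generation at the first blank line
}

print(json.dumps(payload, indent=2))
```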
Message Format
Messages are an array of objects with role and content fields:
{
"messages": [
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "Write a Python function to reverse a string."},
{"role": "assistant", "content": "Here's a function to reverse a string..."},
{"role": "user", "content": "Can you make it more efficient?"}
]
}
Roles:
system: Sets the behavior and context for the assistant
user: Messages from the user
assistant: Previous responses from the model (for conversation history)
Response Format
A successful request returns a JSON object:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "deepseek-ai/DeepSeek-R1",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum mechanics..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 150,
"total_tokens": 165
}
}
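The fields you typically read from this response are the message content, the finish reason, and the token usage. The sketch below works on the sample response shape above as a local dict, so it involves no network call.

```python
# Sample response shape from the docs, as a local dict (no API call).
resp = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "Quantum computing uses quantum mechanics..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165},
}

content = resp["choices"][0]["message"]["content"]
finish = resp["choices"][0]["finish_reason"]  # "length" means max_tokens was hit
total = resp["usage"]["total_tokens"]

print(content)
print(f"finish_reason={finish}, total_tokens={total}")
```

Checking finish_reason is worthwhile in practice: a value of "length" means the output was truncated by max_tokens rather than completed naturally.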
Multi-turn Conversations
To maintain conversation context, include the full message history in each request:
import requests
url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
# Maintain conversation history
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "What is its population?"}
]
response = requests.post(url, headers=headers, json={
"model": "meta-llama/Llama-3.3-70B-Instruct",
"messages": messages,
"max_tokens": 256
})
print(response.json()["choices"][0]["message"]["content"])
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "meta-llama/Llama-3.3-70B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "What is its population?"}
],
"max_tokens": 256
}'
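Since the API is stateless, your application owns the history. A minimal sketch of per-turn bookkeeping is shown below; send_fn stands in for the actual API call (a requests.post to /v1/chat/completions returning the assistant's content string), and the stub here exists only to show how the list grows.

```python
def chat_turn(history, user_text, send_fn):
    """Append the user message, get a reply via send_fn, append it, return it."""
    history.append({"role": "user", "content": user_text})
    reply = send_fn(history)  # in real use: call the chat completions endpoint
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a helpful assistant."}]

# Stub in place of a network call, purely to demonstrate the flow.
fake_send = lambda msgs: f"(reply to: {msgs[-1]['content']})"
chat_turn(history, "What is the capital of France?", fake_send)
print(len(history))  # system + user + assistant = 3
```

For long conversations you would also trim or summarize older turns, since the full history counts against the model's context window on every request.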
Streaming Responses
Enable real-time token streaming for responsive chat applications:
import requests
import json
url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "Write a short story about a robot."}
    ],
    "max_tokens": 512,
    "stream": True
}
response = requests.post(url, headers=headers, json=data, stream=True)
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: ') and line != 'data: [DONE]':
            chunk = json.loads(line[6:])
            content = chunk['choices'][0]['delta'].get('content', '')
            print(content, end='', flush=True)
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "meta-llama/Llama-3.3-70B-Instruct",
"messages": [
{"role": "user", "content": "Write a short story about a robot."}
],
"max_tokens": 512,
"stream": true
}'
For tool calling (function calling) capabilities, see the Tool Calling section below.
System Prompts
Use system prompts to set the model’s behavior, persona, or instructions:
messages = [
{
"role": "system",
"content": """You are an expert Python developer. Follow these guidelines:
- Write clean, well-documented code
- Include type hints
- Add brief comments explaining complex logic
- Suggest improvements when relevant"""
},
{"role": "user", "content": "Write a function to find prime numbers."}
]
Tool Calling
Tool calling (also known as function calling) lets models invoke external tools and APIs. The model generates structured JSON arguments that your application uses to call a function; the results can then be passed back to the model to continue the conversation.
Supported Models
The following models support tool calling:
deepseek-ai/DeepSeek-R1-0528
deepseek-ai/DeepSeek-R1
deepseek-ai/DeepSeek-V3
deepseek-ai/DeepSeek-V3-0324
meta-llama/Meta-Llama-3.1-405B-Instruct
meta-llama/Meta-Llama-3.1-70B-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct
meta-llama/Llama-3.2-3B-Instruct
meta-llama/Llama-3.3-70B-Instruct
moonshotai/Kimi-K2-Instruct
openai/gpt-oss-120b
openai/gpt-oss-20b
Qwen/Qwen2.5-72B-Instruct
Qwen/Qwen3-235B-A22B
Qwen/Qwen3-235B-A22B-Instruct-2507
Qwen/Qwen3-Coder-480B-A35B-Instruct
Qwen/Qwen3-Next-80B-A3B-Instruct
Qwen/Qwen3-Next-80B-A3B-Thinking
Example
import requests
import json
url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
# Define available tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
}
]
data = {
"model": "meta-llama/Llama-3.3-70B-Instruct",
"messages": [
{"role": "user", "content": "What's the weather like in San Francisco?"}
],
"tools": tools,
"tool_choice": "auto"
}
response = requests.post(url, headers=headers, json=data)
result = response.json()
# Check if the model wants to call a tool
message = result["choices"][0]["message"]
if message.get("tool_calls"):
tool_call = message["tool_calls"][0]
print(f"Function: {tool_call['function']['name']}")
print(f"Arguments: {tool_call['function']['arguments']}")
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "meta-llama/Llama-3.3-70B-Instruct",
"messages": [
{"role": "user", "content": "What'\''s the weather like in San Francisco?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'
The tool_choice parameter controls how the model uses tools: "auto" lets the model decide, "none" disables tool use, and an object naming a specific function, e.g. {"type": "function", "function": {"name": "get_weather"}}, forces that tool to be called.
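To complete the round trip after executing the function yourself, append the assistant's tool call and a role "tool" result message to the history, then re-send. The sketch below follows the OpenAI-compatible message format; the tool_call dict and its id are illustrative (in practice you take them from message["tool_calls"] in the response).

```python
import json

# Illustrative tool call; in real use this comes from the model's response.
tool_call = {
    "id": "call_001",
    "type": "function",
    "function": {"name": "get_weather",
                 "arguments": '{"location": "San Francisco, CA"}'},
}

# Parse the model's arguments and run your own function with them.
args = json.loads(tool_call["function"]["arguments"])
result = {"temp_c": 18, "conditions": "foggy"}  # stand-in for get_weather(**args)

# Messages to append before the follow-up request: the assistant's tool call,
# then the tool result keyed by tool_call_id.
followup = [
    {"role": "assistant", "content": None, "tool_calls": [tool_call]},
    {"role": "tool", "tool_call_id": tool_call["id"],
     "content": json.dumps(result)},
]
print(followup[1]["role"])
```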
Text Completions (Base Models)
For raw text completion without chat formatting, use the completions endpoint with base models like Llama 3.1 405B BASE.
Endpoint
POST https://api.hyperbolic.xyz/v1/completions
Example
import requests
url = "https://api.hyperbolic.xyz/v1/completions"
headers = {
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
data = {
"model": "meta-llama/Meta-Llama-3.1-405B",
"prompt": "The key principles of machine learning are",
"max_tokens": 256,
"temperature": 0.7
}
response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["text"])
curl -X POST "https://api.hyperbolic.xyz/v1/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "meta-llama/Meta-Llama-3.1-405B",
"prompt": "The key principles of machine learning are",
"max_tokens": 256,
"temperature": 0.7
}'
Base models are ideal for text completion, fill-in-the-middle tasks, and custom prompting strategies where chat formatting isn’t needed.
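One such custom prompting strategy is a hand-rolled Q&A template with stop sequences, since the base model applies no chat formatting of its own. The sketch below only builds the payload (prompt text and stop values are illustrative); send it to /v1/completions as in the example above.

```python
import json

# Illustrative few-shot-style payload for a base model. The stop sequences
# cut generation off before the model invents a follow-up question.
payload = {
    "model": "meta-llama/Meta-Llama-3.1-405B",
    "prompt": "Q: What is overfitting?\nA:",
    "max_tokens": 128,
    "temperature": 0.2,
    "stop": ["\n\n", "Q:"],
}

print(json.dumps(payload, indent=2))
```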
Available Models
Instruct Models (Chat Completions)
| Model | Model ID | Context | Price | Tools |
|---|---|---|---|---|
| DeepSeek-R1 | deepseek-ai/DeepSeek-R1 | 131K | $2.00/M tokens | Yes |
| DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 | 164K | $3.00/M tokens | Yes |
| DeepSeek-V3 | deepseek-ai/DeepSeek-V3 | 131K | $0.25/M tokens | Yes |
| DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 | 131K | $1.25/M tokens | Yes |
| GPT-OSS 120B | openai/gpt-oss-120b | 131K | $0.30/M tokens | Yes |
| GPT-OSS 20B | openai/gpt-oss-20b | 131K | $0.10/M tokens | Yes |
| Kimi-K2 | moonshotai/Kimi-K2-Instruct | 131K | $2.00/M tokens | Yes |
| Llama 3 70B | meta-llama/Meta-Llama-3-70B-Instruct | 131K | $0.40/M tokens | - |
| Llama 3.1 405B | meta-llama/Meta-Llama-3.1-405B-Instruct | 131K | $4.00/M tokens | Yes |
| Llama 3.1 70B | meta-llama/Meta-Llama-3.1-70B-Instruct | 131K | $0.40/M tokens | Yes |
| Llama 3.1 8B | meta-llama/Meta-Llama-3.1-8B-Instruct | 131K | $0.10/M tokens | Yes |
| Llama 3.2 3B | meta-llama/Llama-3.2-3B-Instruct | 131K | $0.10/M tokens | Yes |
| Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct | 131K | $0.40/M tokens | Yes |
| Qwen 2.5 72B | Qwen/Qwen2.5-72B-Instruct | 131K | $0.40/M tokens | Yes |
| Qwen 2.5 Coder 32B | Qwen/Qwen2.5-Coder-32B-Instruct | 131K | $0.20/M tokens | - |
| Qwen3-235B-A22B | Qwen/Qwen3-235B-A22B | 41K | $0.40/M tokens | Yes |
| Qwen3-235B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 262K | $0.25/M tokens | Yes |
| Qwen3-Coder 480B | Qwen/Qwen3-Coder-480B-A35B-Instruct | 262K | $0.40/M tokens | Yes |
| Qwen3-Next 80B Instruct | Qwen/Qwen3-Next-80B-A3B-Instruct | 262K | $0.30/M tokens | Yes |
| Qwen3-Next 80B Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking | 262K | $0.30/M tokens | Yes |
| QwQ-32B | Qwen/QwQ-32B | 131K | $0.25/M tokens | - |
Base Models (Text Completions)
| Model | Model ID | Context | Price |
|---|---|---|---|
| Llama 3.1 405B BASE | meta-llama/Meta-Llama-3.1-405B | 131K | $4.00/M tokens |
OpenAI SDK Compatibility
The API is fully compatible with OpenAI’s SDK. Just change the base URL and API key:
from openai import OpenAI
client = OpenAI(
api_key="YOUR_HYPERBOLIC_API_KEY",
base_url="https://api.hyperbolic.xyz/v1"
)
response = client.chat.completions.create(
model="meta-llama/Llama-3.3-70B-Instruct",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'YOUR_HYPERBOLIC_API_KEY',
baseURL: 'https://api.hyperbolic.xyz/v1'
});
const response = await client.chat.completions.create({
model: 'meta-llama/Llama-3.3-70B-Instruct',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
]
});
console.log(response.choices[0].message.content);
Error Handling
| Error Code | Description | Solution |
|---|---|---|
| 401 | Invalid or missing API key | Check that your API key is correct and included in the Authorization header |
| 400 | Invalid request parameters | Verify the model ID, message format, and parameter values |
| 429 | Rate limit exceeded | Reduce request frequency or upgrade to the Pro tier |
| 500 | Server error | Retry the request; contact support if the problem persists |
Basic tier allows 60 requests/minute. Upgrade to Pro tier (minimum $5 deposit) for 600 requests/minute.
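The retry advice for 429 and 500 responses can be implemented as exponential backoff. The sketch below is generic over any callable returning an object with a status_code (for example, a wrapper around requests.post); the stub at the bottom stands in for the network call purely to demonstrate the flow.

```python
import time

def post_with_retries(send, max_attempts=4, base_delay=1.0):
    """Retry send() on 429/500 with exponential backoff; return the last response."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in (429, 500):
            return resp
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return resp

# Stub demonstrating the flow: fail twice with 429, then succeed.
class FakeResponse:
    def __init__(self, code):
        self.status_code = code

codes = iter([429, 429, 200])
resp = post_with_retries(lambda: FakeResponse(next(codes)), base_delay=0.001)
print(resp.status_code)
```

In production you would also honor a Retry-After header when the server sends one, rather than relying on the fixed backoff schedule alone.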
Next Steps