Text Generation APIs

Access 20+ large language models through Hyperbolic’s OpenAI-compatible chat completions API. Generate text, have conversations, write code, and more with state-of-the-art open-source models.

Chat Completions

The chat completions endpoint is the primary way to interact with text generation models.

Endpoint

POST https://api.hyperbolic.xyz/v1/chat/completions

Basic Request

import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 512,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["message"]["content"])

Request Parameters

Parameter    Type          Required  Description
model        string        Yes       Model ID (e.g., deepseek-ai/DeepSeek-R1)
messages     array         Yes       Array of message objects with role and content
max_tokens   integer       No        Maximum number of tokens to generate
temperature  float         No        Sampling temperature (0-2); higher values are more creative
top_p        float         No        Nucleus sampling threshold (0-1)
stream       boolean       No        Enable streaming responses (default: false)
stop         string/array  No        Stop sequence(s) to end generation
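As an illustration of how the optional sampling parameters fit together, a request payload might look like this (the model and prompt here are only examples):

```python
# Illustrative payload combining the optional sampling parameters.
data = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "List three uses of Python."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,   # lower temperature -> more deterministic output
    "top_p": 0.9,         # sample only from the top 90% of probability mass
    "stop": ["\n\n"],     # stop generating at the first blank line
}
```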

Messages Format

The messages field is an array of objects, each with a role and a content field:
{
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
    {"role": "assistant", "content": "Here's a function to reverse a string..."},
    {"role": "user", "content": "Can you make it more efficient?"}
  ]
}
Roles:
  • system: Sets the behavior and context for the assistant
  • user: Messages from the user
  • assistant: Previous responses from the model (for conversation history)

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "deepseek-ai/DeepSeek-R1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165
  }
}
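Pulling the reply text and token counts out of this structure is straightforward. Using the sample response above as a plain Python dict:

```python
# Sample response body from above, as a Python dict.
response_json = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "Quantum computing uses quantum mechanics..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165},
}

# Extract the assistant's reply and the token accounting.
reply = response_json["choices"][0]["message"]["content"]
usage = response_json["usage"]
print(reply)
print(f"{usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion "
      f"= {usage['total_tokens']} total tokens")
```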

Multi-turn Conversations

To maintain conversation context, include the full message history in each request:
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Maintain conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
]

response = requests.post(url, headers=headers, json={
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": messages,
    "max_tokens": 256
})

print(response.json()["choices"][0]["message"]["content"])
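To keep the conversation going, append the assistant message from each response back onto the history before sending the next request. A small helper makes the pattern explicit (the function name append_reply is ours, not part of the API):

```python
def append_reply(messages, response_json):
    """Append the assistant's reply to the running history so the
    next request keeps the full conversational context."""
    messages.append(response_json["choices"][0]["message"])
    return messages

# After each call, roughly:
#   append_reply(messages, response.json())
#   messages.append({"role": "user", "content": next_user_input})
```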

Streaming Responses

Enable real-time token streaming for responsive chat applications:
import json
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "Write a short story about a robot."}
    ],
    "max_tokens": 512,
    "stream": True
}

response = requests.post(url, headers=headers, json=data, stream=True)

# Responses arrive as server-sent events: lines of "data: {json}",
# terminated by "data: [DONE]".
for line in response.iter_lines():
    if not line:
        continue
    decoded = line.decode("utf-8")
    if decoded.startswith("data: ") and decoded != "data: [DONE]":
        chunk = json.loads(decoded[len("data: "):])
        content = chunk["choices"][0]["delta"].get("content", "")
        print(content, end="", flush=True)
For more advanced streaming patterns including function calling, see Advanced Features.

System Prompts

Use system prompts to set the model’s behavior, persona, or instructions:
messages = [
    {
        "role": "system",
        "content": """You are an expert Python developer. Follow these guidelines:
- Write clean, well-documented code
- Include type hints
- Add brief comments explaining complex logic
- Suggest improvements when relevant"""
    },
    {"role": "user", "content": "Write a function to find prime numbers."}
]

Text Completions (Base Models)

For raw text completion without chat formatting, use the completions endpoint with base models like Llama 3.1 405B BASE.

Endpoint

POST https://api.hyperbolic.xyz/v1/completions

Example

import requests

url = "https://api.hyperbolic.xyz/v1/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Meta-Llama-3.1-405B",
    "prompt": "The key principles of machine learning are",
    "max_tokens": 256,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["text"])
Base models are ideal for text completion, fill-in-the-middle tasks, and custom prompting strategies where chat formatting isn’t needed.

Available Models

Instruct Models (Chat Completions)

Model                     Model ID                                 Context  Price
DeepSeek-R1               deepseek-ai/DeepSeek-R1                  131K     $2.00/M tokens
DeepSeek-R1-0528          deepseek-ai/DeepSeek-R1-0528             164K     $3.00/M tokens
DeepSeek-V3               deepseek-ai/DeepSeek-V3                  131K     $0.25/M tokens
DeepSeek-V3-0324          deepseek-ai/DeepSeek-V3-0324             131K     $1.25/M tokens
GPT-OSS 120B              openai/gpt-oss-120b                      131K     $0.30/M tokens
GPT-OSS 20B               openai/gpt-oss-20b                       131K     $0.10/M tokens
Kimi-K2                   moonshotai/Kimi-K2-Instruct              131K     $2.00/M tokens
Llama 3 70B               meta-llama/Meta-Llama-3-70B-Instruct     131K     $0.40/M tokens
Llama 3.1 405B            meta-llama/Meta-Llama-3.1-405B-Instruct  131K     $4.00/M tokens
Llama 3.1 70B             meta-llama/Meta-Llama-3.1-70B-Instruct   131K     $0.40/M tokens
Llama 3.1 8B              meta-llama/Meta-Llama-3.1-8B-Instruct    131K     $0.10/M tokens
Llama 3.2 3B              meta-llama/Llama-3.2-3B-Instruct         131K     $0.10/M tokens
Llama 3.3 70B             meta-llama/Llama-3.3-70B-Instruct        131K     $0.40/M tokens
Qwen 2.5 72B              Qwen/Qwen2.5-72B-Instruct                131K     $0.40/M tokens
Qwen 2.5 Coder 32B        Qwen/Qwen2.5-Coder-32B-Instruct          131K     $0.20/M tokens
Qwen3-235B-A22B           Qwen/Qwen3-235B-A22B                     41K      $0.40/M tokens
Qwen3-235B Instruct 2507  Qwen/Qwen3-235B-A22B-Instruct-2507       262K     $2.00/M tokens
Qwen3-Coder 480B          Qwen/Qwen3-Coder-480B-A35B-Instruct      262K     $2.00/M tokens
Qwen3-Next 80B Instruct   Qwen/Qwen3-Next-80B-A3B-Instruct         262K     $0.30/M tokens
Qwen3-Next 80B Thinking   Qwen/Qwen3-Next-80B-A3B-Thinking         262K     $0.30/M tokens
QwQ-32B                   Qwen/QwQ-32B                             131K     $0.40/M tokens

Base Models (Text Completions)

Model                Model ID                        Context  Price
Llama 3.1 405B BASE  meta-llama/Meta-Llama-3.1-405B  131K     $4.00/M tokens
For vision-language models that can process images alongside text, see Vision Language Models.

OpenAI SDK Compatibility

The API is fully compatible with OpenAI’s SDK. Just change the base URL and API key:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
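The SDK also supports streaming: pass stream=True to client.chat.completions.create and iterate the returned chunks, each of which carries a delta fragment. A rough sketch of collecting those fragments — join_stream is our helper name, and the chunk objects in the test stand in for what the SDK yields:

```python
def join_stream(chunks):
    """Join streamed delta fragments into the full reply text.
    Works on any iterable of chunk-like objects exposing
    .choices[0].delta.content (None for non-content chunks)."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# With the client from above, something like:
#   stream = client.chat.completions.create(model=..., messages=..., stream=True)
#   print(join_stream(stream))
```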

Error Handling

Error Code  Description                 Solution
401         Invalid or missing API key  Check that your API key is correct and included in the Authorization header
400         Invalid request parameters  Verify the model ID, message format, and parameter values
429         Rate limit exceeded         Reduce request frequency or upgrade to the Pro tier
500         Server error                Retry the request; contact support if it persists
The Basic tier allows 60 requests/minute; the Pro tier (minimum $5 deposit) allows 600 requests/minute.
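For transient 429 and 5xx responses, retrying with exponential backoff is a common client-side pattern. A minimal sketch, with assumptions: post_with_retry is our helper (not part of the API), and it accepts any zero-argument function that performs the request and returns a requests-style response:

```python
import random
import time

def post_with_retry(send, max_retries=5, base_delay=1.0):
    """Call send() and retry on rate-limit or transient server errors,
    doubling the delay (plus jitter) after each failed attempt."""
    for attempt in range(max_retries):
        response = send()
        if response.status_code not in (429, 500, 502, 503):
            return response
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        time.sleep(delay)
    return response  # give up and return the last failing response

# Usage with requests, roughly:
#   response = post_with_retry(lambda: requests.post(url, headers=headers, json=data))
```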

Next Steps