Text Generation APIs

Access 20+ large language models through Hyperbolic’s OpenAI-compatible chat completions API. Generate text, have conversations, write code, and more with state-of-the-art open-source models.

Chat Completions

The chat completions endpoint is the primary way to interact with text generation models.

Endpoint

POST https://api.hyperbolic.xyz/v1/chat/completions

Basic Request

import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 512,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["message"]["content"])

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g., deepseek-ai/DeepSeek-R1) |
| messages | array | Yes | Array of message objects with role and content |
| max_tokens | integer | No | Maximum number of tokens to generate |
| temperature | float | No | Sampling temperature (0-2). Higher values produce more varied output |
| top_p | float | No | Nucleus sampling threshold (0-1) |
| stream | boolean | No | Enable streaming responses (default: false) |
| stop | string/array | No | Stop sequence(s) to end generation |
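The optional parameters above combine freely in one request body. As a minimal sketch, a small helper can assemble the body while rejecting typos in option names (the `build_payload` name is our own, not part of the API, and it whitelists only the parameters listed in the table):

```python
def build_payload(model, messages, **options):
    """Assemble a chat completions request body.

    Only the optional parameters from the table are passed through,
    so a misspelled option fails loudly instead of being ignored.
    """
    allowed = {"max_tokens", "temperature", "top_p", "stream", "stop"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    return {"model": model, "messages": messages, **options}

payload = build_payload(
    "deepseek-ai/DeepSeek-R1",
    [{"role": "user", "content": "Hello"}],
    max_tokens=256,
    temperature=0.7,
    stop=["\n\n"],
)
```

The resulting dict can be passed directly as the `json=` argument to `requests.post`.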

Messages Format

Messages are an array of objects with role and content fields:
{
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
    {"role": "assistant", "content": "Here's a function to reverse a string..."},
    {"role": "user", "content": "Can you make it more efficient?"}
  ]
}
Roles:
  • system: Sets the behavior and context for the assistant
  • user: Messages from the user
  • assistant: Previous responses from the model (for conversation history)

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "deepseek-ai/DeepSeek-R1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165
  }
}
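To pull the generated text and token counts out of this structure, a small helper can unpack the fields shown above (`extract_reply` is an illustrative name, not part of the API):

```python
def extract_reply(response_json):
    """Return (content, finish_reason, total_tokens) from a chat completion."""
    choice = response_json["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        response_json["usage"]["total_tokens"],
    )

# Sample response shaped like the one documented above
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Quantum computing uses quantum mechanics...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165},
}
content, reason, tokens = extract_reply(sample)
```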

Multi-turn Conversations

To maintain conversation context, include the full message history in each request:
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Maintain conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
]

response = requests.post(url, headers=headers, json={
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": messages,
    "max_tokens": 256
})

print(response.json()["choices"][0]["message"]["content"])
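This pattern generalizes to a loop: after each response, append the assistant's message to the history before adding the next user turn. A sketch of that bookkeeping (the `add_turn` name is our own):

```python
def add_turn(history, assistant_reply, next_user_message):
    """Append the model's reply and the next user turn to the history."""
    history.append({"role": "assistant", "content": assistant_reply})
    history.append({"role": "user", "content": next_user_message})
    return history

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
add_turn(history, "The capital of France is Paris.", "What is its population?")
```

Each request then sends the full `history` list; note that longer histories consume more prompt tokens per request.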

Streaming Responses

Enable real-time token streaming for responsive chat applications:
import json
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "Write a short story about a robot."}
    ],
    "max_tokens": 512,
    "stream": True
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: ') and line != 'data: [DONE]':
            chunk = json.loads(line[6:])
            content = chunk['choices'][0]['delta'].get('content', '')
            print(content, end='', flush=True)
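Streamed responses arrive as server-sent events: each chunk is a `data: {...}` line and the stream ends with `data: [DONE]`. The parsing step from the loop above can be isolated into a small helper (`parse_sse_line` is our own name), which keeps the network loop trivial:

```python
import json

def parse_sse_line(line: str):
    """Return the content delta from one SSE line, or None if there is none."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content", "")
```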
For tool calling (function calling) capabilities, see the Tool Calling section below.

System Prompts

Use system prompts to set the model’s behavior, persona, or instructions:
messages = [
    {
        "role": "system",
        "content": """You are an expert Python developer. Follow these guidelines:
- Write clean, well-documented code
- Include type hints
- Add brief comments explaining complex logic
- Suggest improvements when relevant"""
    },
    {"role": "user", "content": "Write a function to find prime numbers."}
]

Tool Calling

Tool calling (also known as function calling) allows models to invoke external tools and APIs. The model generates structured JSON arguments that your application can use to call functions, then the results can be passed back to continue the conversation.

Supported Models

The following models support tool calling:
  • deepseek-ai/DeepSeek-R1-0528
  • deepseek-ai/DeepSeek-R1
  • deepseek-ai/DeepSeek-V3
  • deepseek-ai/DeepSeek-V3-0324
  • meta-llama/Meta-Llama-3.1-405B-Instruct
  • meta-llama/Meta-Llama-3.1-70B-Instruct
  • meta-llama/Meta-Llama-3.1-8B-Instruct
  • meta-llama/Llama-3.2-3B-Instruct
  • meta-llama/Llama-3.3-70B-Instruct
  • moonshotai/Kimi-K2-Instruct
  • openai/gpt-oss-120b
  • openai/gpt-oss-20b
  • Qwen/Qwen2.5-72B-Instruct
  • Qwen/Qwen3-235B-A22B
  • Qwen/Qwen3-235B-A22B-Instruct-2507
  • Qwen/Qwen3-Coder-480B-A35B-Instruct
  • Qwen/Qwen3-Next-80B-A3B-Instruct
  • Qwen/Qwen3-Next-80B-A3B-Thinking

Example

import requests
import json

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

data = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    "tools": tools,
    "tool_choice": "auto"
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

# Check if the model wants to call a tool
message = result["choices"][0]["message"]
if message.get("tool_calls"):
    tool_call = message["tool_calls"][0]
    print(f"Function: {tool_call['function']['name']}")
    print(f"Arguments: {tool_call['function']['arguments']}")
The tool_choice parameter controls how the model uses tools: "auto" lets the model decide, "none" disables tool use, or specify a tool name to force its use.
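To complete the round trip, execute the requested function locally and send the result back as a `role: "tool"` message referencing the `tool_call_id` (this message shape follows the OpenAI convention; the `get_weather` body here is a stub for illustration):

```python
import json

def get_weather(location, unit="celsius"):
    # Stub: a real implementation would call a weather service.
    return {"location": location, "temperature": 18, "unit": unit}

def run_tool_call(tool_call):
    """Execute a tool call and build the follow-up message for the API."""
    args = json.loads(tool_call["function"]["arguments"])
    result = get_weather(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# A tool call as it would appear in message["tool_calls"]
tool_call = {
    "id": "call_1",
    "function": {
        "name": "get_weather",
        "arguments": '{"location": "San Francisco, CA"}',
    },
}
tool_message = run_tool_call(tool_call)
```

Appending `tool_message` to the conversation and sending another request lets the model incorporate the result into its final answer.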

Text Completions (Base Models)

For raw text completion without chat formatting, use the completions endpoint with base models like Llama 3.1 405B BASE.

Endpoint

POST https://api.hyperbolic.xyz/v1/completions

Example

import requests

url = "https://api.hyperbolic.xyz/v1/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Meta-Llama-3.1-405B",
    "prompt": "The key principles of machine learning are",
    "max_tokens": 256,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["text"])
Base models are ideal for text completion, fill-in-the-middle tasks, and custom prompting strategies where chat formatting isn’t needed.

Available Models

Instruct Models (Chat Completions)

| Model | Model ID | Context | Price | Tools |
| --- | --- | --- | --- | --- |
| DeepSeek-R1 | deepseek-ai/DeepSeek-R1 | 131K | $2.00/M tokens | Yes |
| DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 | 164K | $3.00/M tokens | Yes |
| DeepSeek-V3 | deepseek-ai/DeepSeek-V3 | 131K | $0.25/M tokens | Yes |
| DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 | 131K | $1.25/M tokens | Yes |
| GPT-OSS 120B | openai/gpt-oss-120b | 131K | $0.30/M tokens | Yes |
| GPT-OSS 20B | openai/gpt-oss-20b | 131K | $0.10/M tokens | Yes |
| Kimi-K2 | moonshotai/Kimi-K2-Instruct | 131K | $2.00/M tokens | Yes |
| Llama 3 70B | meta-llama/Meta-Llama-3-70B-Instruct | 131K | $0.40/M tokens | - |
| Llama 3.1 405B | meta-llama/Meta-Llama-3.1-405B-Instruct | 131K | $4.00/M tokens | Yes |
| Llama 3.1 70B | meta-llama/Meta-Llama-3.1-70B-Instruct | 131K | $0.40/M tokens | Yes |
| Llama 3.1 8B | meta-llama/Meta-Llama-3.1-8B-Instruct | 131K | $0.10/M tokens | Yes |
| Llama 3.2 3B | meta-llama/Llama-3.2-3B-Instruct | 131K | $0.10/M tokens | Yes |
| Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct | 131K | $0.40/M tokens | Yes |
| Qwen 2.5 72B | Qwen/Qwen2.5-72B-Instruct | 131K | $0.40/M tokens | Yes |
| Qwen 2.5 Coder 32B | Qwen/Qwen2.5-Coder-32B-Instruct | 131K | $0.20/M tokens | - |
| Qwen3-235B-A22B | Qwen/Qwen3-235B-A22B | 41K | $0.40/M tokens | Yes |
| Qwen3-235B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 262K | $0.25/M tokens | Yes |
| Qwen3-Coder 480B | Qwen/Qwen3-Coder-480B-A35B-Instruct | 262K | $0.40/M tokens | Yes |
| Qwen3-Next 80B Instruct | Qwen/Qwen3-Next-80B-A3B-Instruct | 262K | $0.30/M tokens | Yes |
| Qwen3-Next 80B Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking | 262K | $0.30/M tokens | Yes |
| QwQ-32B | Qwen/QwQ-32B | 131K | $0.25/M tokens | - |

Base Models (Text Completions)

| Model | Model ID | Context | Price |
| --- | --- | --- | --- |
| Llama 3.1 405B BASE | meta-llama/Meta-Llama-3.1-405B | 131K | $4.00/M tokens |
For vision-language models that can process images alongside text, see Vision Language Models.

OpenAI SDK Compatibility

The API is fully compatible with OpenAI’s SDK. Just change the base URL and API key:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Error Handling

| Error Code | Description | Solution |
| --- | --- | --- |
| 401 | Invalid or missing API key | Check that your API key is correct and included in the Authorization header |
| 400 | Invalid request parameters | Verify model ID, message format, and parameter values |
| 429 | Rate limit exceeded | Reduce request frequency or upgrade to the Pro tier |
| 500 | Server error | Retry the request; contact support if it persists |
Basic tier allows 60 requests/minute. Upgrade to Pro tier (minimum $5 deposit) for 600 requests/minute.
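For 429 and 500 responses, retrying with exponential backoff is the usual remedy. A minimal sketch, with the HTTP call injected as a callable so the retry logic stays testable (the `post_with_retry` name is illustrative, not part of any SDK):

```python
import time

def post_with_retry(send, max_retries=3, base_delay=1.0):
    """Call send() until it returns a non-retryable status or retries run out.

    send() should return an object with a .status_code attribute, e.g.
    a functools.partial wrapping requests.post with your URL and payload.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in (429, 500):
            return response
        if attempt < max_retries:
            # Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
            time.sleep(base_delay * 2 ** attempt)
    return response
```

A production version might also honor a `Retry-After` header when the server provides one.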

Next Steps