Text Generation APIs
Access 20+ large language models through Hyperbolic’s OpenAI-compatible chat completions API. Generate text, have conversations, write code, and more with state-of-the-art open-source models.
Chat Completions
The chat completions endpoint is the primary way to interact with text generation models.
Endpoint
POST https://api.hyperbolic.xyz/v1/chat/completions
Basic Request
```python
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 512,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["message"]["content"])
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
```
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., deepseek-ai/DeepSeek-R1) |
| messages | array | Yes | Array of message objects with role and content |
| max_tokens | integer | No | Maximum number of tokens to generate |
| temperature | float | No | Sampling temperature (0-2). Higher = more creative |
| top_p | float | No | Nucleus sampling threshold (0-1) |
| stream | boolean | No | Enable streaming responses (default: false) |
| stop | string/array | No | Stop sequence(s) to end generation |
Messages are an array of objects with role and content fields:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
    {"role": "assistant", "content": "Here's a function to reverse a string..."},
    {"role": "user", "content": "Can you make it more efficient?"}
  ]
}
```
Roles:
- system: Sets the behavior and context for the assistant
- user: Messages from the user
- assistant: Previous responses from the model (for conversation history)
Response Format
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "deepseek-ai/DeepSeek-R1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165
  }
}
```
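The fields above are easy to pull out with a small helper. The function below is a hypothetical convenience for your own code, not part of the API:

```python
def summarize_response(resp: dict) -> dict:
    """Extract the reply text, finish reason, and token usage
    from a chat completion response body."""
    choice = resp["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
    }

# The sample body from the response above
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant",
                    "content": "Quantum computing uses quantum mechanics..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165},
}
print(summarize_response(sample)["finish_reason"])  # stop
```

Checking finish_reason is worthwhile in production: a value of "length" means the reply was cut off by max_tokens.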
Multi-turn Conversations
To maintain conversation context, include the full message history in each request:
```python
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Maintain conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
]

response = requests.post(url, headers=headers, json={
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": messages,
    "max_tokens": 256
})
print(response.json()["choices"][0]["message"]["content"])
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."},
      {"role": "user", "content": "What is its population?"}
    ],
    "max_tokens": 256
  }'
```
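Because the full history is resent on every turn, long conversations eventually exceed the model's context window. One common approach is to trim old turns client-side before each request; `trim_history` below is a hypothetical helper using a rough character budget as a proxy for tokens:

```python
def trim_history(messages: list[dict], max_chars: int = 8000) -> list[dict]:
    """Keep the system prompt plus the most recent turns whose combined
    content length fits under a character budget (a crude token proxy)."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    kept, used = [], 0
    for msg in reversed(turns):  # walk from newest to oldest
        used += len(msg["content"])
        if used > max_chars:
            break
        kept.append(msg)
    return system + list(reversed(kept))
```

You would then send `trim_history(messages)` instead of `messages`. For precise budgeting you would count tokens with the model's tokenizer rather than characters.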
Streaming Responses
Enable real-time token streaming for responsive chat applications:
```python
import json
import requests

url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "Write a short story about a robot."}
    ],
    "max_tokens": 512,
    "stream": True
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: ') and line != 'data: [DONE]':
            chunk = json.loads(line[6:])
            content = chunk['choices'][0]['delta'].get('content', '')
            print(content, end='', flush=True)
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "user", "content": "Write a short story about a robot."}
    ],
    "max_tokens": 512,
    "stream": true
  }'
```
For more advanced streaming patterns including function calling, see Advanced Features.
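The inline SSE parsing shown above can be factored into a reusable generator. `iter_content` is a hypothetical helper name, not part of the API:

```python
import json

def iter_content(lines):
    """Yield content deltas from a server-sent-events stream of chat
    completion chunks. `lines` is any iterable of raw lines, e.g.
    response.iter_lines()."""
    for raw in lines:
        if not raw:
            continue
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]
```

With this helper, collecting the full reply is one line: `full_text = "".join(iter_content(response.iter_lines()))`.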
System Prompts
Use system prompts to set the model’s behavior, persona, or instructions:
```python
messages = [
    {
        "role": "system",
        "content": """You are an expert Python developer. Follow these guidelines:
- Write clean, well-documented code
- Include type hints
- Add brief comments explaining complex logic
- Suggest improvements when relevant"""
    },
    {"role": "user", "content": "Write a function to find prime numbers."}
]
```
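If your application swaps personas at runtime, it helps to set the system prompt programmatically rather than rebuilding the message list by hand. `with_system_prompt` below is a hypothetical helper:

```python
def with_system_prompt(messages: list[dict], prompt: str) -> list[dict]:
    """Return a copy of `messages` whose first entry is the given system
    prompt, replacing any existing system message."""
    rest = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": prompt}] + rest

chat = [{"role": "user", "content": "Write a function to find prime numbers."}]
messages = with_system_prompt(chat, "You are an expert Python developer.")
```

Returning a copy (rather than mutating in place) keeps the original history reusable with a different persona.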
Text Completions (Base Models)
For raw text completion without chat formatting, use the completions endpoint with base models like Llama 3.1 405B BASE.
Endpoint
POST https://api.hyperbolic.xyz/v1/completions
Example
```python
import requests

url = "https://api.hyperbolic.xyz/v1/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "model": "meta-llama/Meta-Llama-3.1-405B",
    "prompt": "The key principles of machine learning are",
    "max_tokens": 256,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["choices"][0]["text"])
```
```bash
curl -X POST "https://api.hyperbolic.xyz/v1/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-405B",
    "prompt": "The key principles of machine learning are",
    "max_tokens": 256,
    "temperature": 0.7
  }'
```
Base models are ideal for text completion, fill-in-the-middle tasks, and custom prompting strategies where chat formatting isn’t needed.
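One such custom prompting strategy is few-shot prompting: showing the base model a handful of input/output pairs so it continues the pattern. The formatting helper below is a hypothetical sketch, not an API feature:

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format input/output pairs into a plain-text few-shot prompt
    for a base (non-chat) model, ending where the model should continue."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    [("cheval", "horse"), ("chien", "dog")],
    "chat",
)
# Send as the "prompt" field of a /v1/completions request
```

Ending the prompt at "Output:" invites the model to complete the final pair; a stop sequence such as "\nInput:" keeps it from generating further examples.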
Available Models
Instruct Models (Chat Completions)
| Model | Model ID | Context | Price |
|---|---|---|---|
| DeepSeek-R1 | deepseek-ai/DeepSeek-R1 | 131K | $2.00/M tokens |
| DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 | 164K | $3.00/M tokens |
| DeepSeek-V3 | deepseek-ai/DeepSeek-V3 | 131K | $0.25/M tokens |
| DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 | 131K | $1.25/M tokens |
| GPT-OSS 120B | openai/gpt-oss-120b | 131K | $0.30/M tokens |
| GPT-OSS 20B | openai/gpt-oss-20b | 131K | $0.10/M tokens |
| Kimi-K2 | moonshotai/Kimi-K2-Instruct | 131K | $2.00/M tokens |
| Llama 3 70B | meta-llama/Meta-Llama-3-70B-Instruct | 131K | $0.40/M tokens |
| Llama 3.1 405B | meta-llama/Meta-Llama-3.1-405B-Instruct | 131K | $4.00/M tokens |
| Llama 3.1 70B | meta-llama/Meta-Llama-3.1-70B-Instruct | 131K | $0.40/M tokens |
| Llama 3.1 8B | meta-llama/Meta-Llama-3.1-8B-Instruct | 131K | $0.10/M tokens |
| Llama 3.2 3B | meta-llama/Llama-3.2-3B-Instruct | 131K | $0.10/M tokens |
| Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct | 131K | $0.40/M tokens |
| Qwen 2.5 72B | Qwen/Qwen2.5-72B-Instruct | 131K | $0.40/M tokens |
| Qwen 2.5 Coder 32B | Qwen/Qwen2.5-Coder-32B-Instruct | 131K | $0.20/M tokens |
| Qwen3-235B-A22B | Qwen/Qwen3-235B-A22B | 41K | $0.40/M tokens |
| Qwen3-235B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 262K | $2.00/M tokens |
| Qwen3-Coder 480B | Qwen/Qwen3-Coder-480B-A35B-Instruct | 262K | $2.00/M tokens |
| Qwen3-Next 80B Instruct | Qwen/Qwen3-Next-80B-A3B-Instruct | 262K | $0.30/M tokens |
| Qwen3-Next 80B Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking | 262K | $0.30/M tokens |
| QwQ-32B | Qwen/QwQ-32B | 131K | $0.40/M tokens |
Base Models (Text Completions)
| Model | Model ID | Context | Price |
|---|---|---|---|
| Llama 3.1 405B BASE | meta-llama/Meta-Llama-3.1-405B | 131K | $4.00/M tokens |
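Prices in the tables above are per million tokens. A quick per-request cost estimate from the response's usage block might look like the sketch below, which assumes a single blended rate for prompt and completion tokens as listed:

```python
def estimate_cost_usd(total_tokens: int, price_per_million: float) -> float:
    """Estimate request cost from total token usage, assuming one
    blended per-million-token rate as shown in the pricing tables."""
    return total_tokens / 1_000_000 * price_per_million

# 165 total tokens on Llama 3.3 70B at $0.40/M tokens
cost = estimate_cost_usd(165, 0.40)
```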
OpenAI SDK Compatibility
The API is fully compatible with OpenAI’s SDK. Just change the base URL and API key:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_HYPERBOLIC_API_KEY",
    base_url="https://api.hyperbolic.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_HYPERBOLIC_API_KEY',
  baseURL: 'https://api.hyperbolic.xyz/v1'
});

const response = await client.chat.completions.create({
  model: 'meta-llama/Llama-3.3-70B-Instruct',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ]
});
console.log(response.choices[0].message.content);
```
Error Handling
| Error Code | Description | Solution |
|---|---|---|
| 401 | Invalid or missing API key | Check your API key is correct and included in the Authorization header |
| 400 | Invalid request parameters | Verify model ID, message format, and parameter values |
| 429 | Rate limit exceeded | Reduce request frequency or upgrade to Pro tier |
| 500 | Server error | Retry the request; contact support if persistent |
Basic tier allows 60 requests/minute. Upgrade to Pro tier (minimum $5 deposit) for 600 requests/minute.
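Transient 429 and 5xx responses are commonly handled with exponential backoff. `with_retries` below is a hypothetical client-side sketch (the status codes retried here are an assumption, not a documented guarantee):

```python
import time

def with_retries(send, max_attempts: int = 5, base_delay: float = 1.0):
    """Call `send()` until it returns a response whose status code is not
    retryable (429 or common 5xx), sleeping with exponential backoff
    between attempts. Returns the last response either way."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in (429, 500, 502, 503):
            return resp
        time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return resp
```

Usage with the examples above: `resp = with_retries(lambda: requests.post(url, headers=headers, json=data))`.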
Next Steps