Audio APIs

Convert text to natural-sounding speech using Melo TTS. Generate high-quality audio in multiple languages with customizable speed and speaker options.

Text-to-Speech

Endpoint

POST https://api.hyperbolic.xyz/v1/audio/generation

Basic Example

Python

import requests
import base64

url = "https://api.hyperbolic.xyz/v1/audio/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "text": "Welcome to Hyperbolic! This is an example of text-to-speech generation.",
    "speed": 1
}

# Client-side timeout in seconds (adjust as needed)
response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()  # Fail early on HTTP errors instead of a KeyError below
result = response.json()

# Decode the base64 audio and save it as an MP3 file
audio_data = base64.b64decode(result["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio_data)

cURL

curl -X POST "https://api.hyperbolic.xyz/v1/audio/generation" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "Welcome to Hyperbolic! This is an example of text-to-speech generation.",
    "speed": 1
  }' | jq -r ".audio" | base64 -d > output.mp3

Request Parameters

Required Parameters

Parameter	Type	Description
`text`	string	The text to convert to speech

Optional Parameters

Parameter	Type	Default	Description
`language`	string	`EN`	Language code for speech generation
`speaker`	string	-	Speaker variant (language-specific)
`speed`	float	1	Speech speed multiplier (0.1-5)
`sdp_ratio`	float	-	Stochastic duration predictor ratio, controls prosody variation (0-1)
`noise_scale`	float	-	Controls randomness in speech generation for more natural variation (0-1)
`noise_scale_w`	float	-	Controls randomness in duration prediction for timing variation (0-1)
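The ranges in the table above can be checked on the client before a request is sent. The sketch below is a hypothetical helper, `build_tts_payload`, not part of any Hyperbolic SDK. It assumes the documented ranges are inclusive and leaves out any optional field you do not set, so the server applies its own default.

```python
from typing import Any, Optional

# Documented ranges from the parameter table (assumed to be inclusive).
_RANGES = {
    "speed": (0.1, 5.0),
    "sdp_ratio": (0.0, 1.0),
    "noise_scale": (0.0, 1.0),
    "noise_scale_w": (0.0, 1.0),
}


def build_tts_payload(
    text: str,
    language: Optional[str] = None,
    speaker: Optional[str] = None,
    speed: Optional[float] = None,
    sdp_ratio: Optional[float] = None,
    noise_scale: Optional[float] = None,
    noise_scale_w: Optional[float] = None,
) -> dict[str, Any]:
    """Build a request body, dropping unset optional fields (hypothetical helper)."""
    if not text:
        raise ValueError("text is required and must be non-empty")
    payload: dict[str, Any] = {"text": text}
    optional = {
        "language": language,
        "speaker": speaker,
        "speed": speed,
        "sdp_ratio": sdp_ratio,
        "noise_scale": noise_scale,
        "noise_scale_w": noise_scale_w,
    }
    for name, value in optional.items():
        if value is None:
            continue  # Omit the field so the server uses its default
        if name in _RANGES:
            low, high = _RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        payload[name] = value
    return payload
```

The returned dict can be passed as `json=` to `requests.post`, exactly like `data` in the basic example.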

Response Format

The API returns a JSON object containing base64-encoded MP3 audio:

{
  "audio": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2..."
}

Decoding the Response

import base64

def save_audio(response_json, filename="output.mp3"):
    """Decode and save the generated audio."""
    audio_data = base64.b64decode(response_json["audio"])
    with open(filename, "wb") as f:
        f.write(audio_data)
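The sample response above is base64 for bytes that begin with `ID3`, the header of an ID3v2 tag that commonly prefixes MP3 files. As an optional safeguard, the decode step can check for that header (or an MPEG frame sync) before writing the file. This is a heuristic sketch, not full validation of the audio stream; `looks_like_mp3` and `save_audio_checked` are hypothetical names.

```python
import base64


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: an ID3v2 tag header or an MPEG audio frame sync at offset 0."""
    if data[:3] == b"ID3":
        return True
    # An MPEG audio frame header starts with 11 set bits: 0xFF, then the top 3 bits of the next byte.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_audio_checked(response_json: dict, filename: str = "output.mp3") -> int:
    """Decode, sanity-check, and save the audio; returns the number of bytes written."""
    # validate=True rejects characters outside the base64 alphabet instead of skipping them
    audio_data = base64.b64decode(response_json["audio"], validate=True)
    if not looks_like_mp3(audio_data):
        raise ValueError("decoded payload does not look like MP3 data")
    with open(filename, "wb") as f:
        return f.write(audio_data)
```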

Supported Languages

Melo TTS supports six languages. English has four speaker variants; each of the other languages has a single speaker:

Language	Code	Available Speakers
English	`EN`	`EN-US`, `EN-BR`, `EN-INDIA`, `EN-AU`
Spanish	`ES`	`ES`
French	`FR`	`FR`
Chinese	`ZH`	`ZH`
Japanese	`JP`	`JP`
Korean	`KR`	`KR`

English has multiple speaker variants for different accents: US (American), BR (British), INDIA (Indian), and AU (Australian).
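To catch a mismatched language and speaker before sending a request, the table can be mirrored as a lookup. The `check_voice` helper below is hypothetical. It only knows the combinations listed in the table, so the mapping must be updated by hand if the service adds languages or speakers.

```python
from typing import Optional

# Language codes and speakers copied from the Supported Languages table.
SPEAKERS: dict[str, tuple[str, ...]] = {
    "EN": ("EN-US", "EN-BR", "EN-INDIA", "EN-AU"),
    "ES": ("ES",),
    "FR": ("FR",),
    "ZH": ("ZH",),
    "JP": ("JP",),
    "KR": ("KR",),
}


def check_voice(language: str, speaker: Optional[str] = None) -> None:
    """Raise ValueError if the language/speaker pair is not in the table."""
    if language not in SPEAKERS:
        raise ValueError(
            f"unsupported language {language!r}; expected one of {sorted(SPEAKERS)}"
        )
    if speaker is not None and speaker not in SPEAKERS[language]:
        raise ValueError(
            f"speaker {speaker!r} is not available for {language}; "
            f"choose from {SPEAKERS[language]}"
        )
```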

Using Different Languages and Speakers

Python
cURL

import requests
import base64

url = "https://api.hyperbolic.xyz/v1/audio/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Generate speech with Australian English accent
data = {
    "text": "G'day! Welcome to our application.",
    "language": "EN",
    "speaker": "EN-AU",
    "speed": 1
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

audio_data = base64.b64decode(result["audio"])
with open("australian_greeting.mp3", "wb") as f:
    f.write(audio_data)

curl -X POST "https://api.hyperbolic.xyz/v1/audio/generation" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "G'\''day! Welcome to our application.",
    "language": "EN",
    "speaker": "EN-AU",
    "speed": 1
  }' | jq -r ".audio" | base64 -d > australian_greeting.mp3

Multilingual Example

# French
data = {
    "text": "Bonjour! Bienvenue sur notre plateforme.",
    "language": "FR",
    "speaker": "FR",
    "speed": 1
}

# Japanese
data = {
    "text": "こんにちは！ようこそ。",
    "language": "JP",
    "speaker": "JP",
    "speed": 1
}

# Chinese
data = {
    "text": "你好！欢迎使用我们的平台。",
    "language": "ZH",
    "speaker": "ZH",
    "speed": 1
}
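Because these payloads differ only in their fields, they can be sent in a loop instead of reassigning `data` each time. The sketch below is illustrative. The output file names and the `generate_all` function are hypothetical, and the HTTP function is passed in as a parameter (defaulting to `requests.post`) so the loop can be exercised without network access.

```python
import base64
from typing import Any, Callable, Optional

API_URL = "https://api.hyperbolic.xyz/v1/audio/generation"

# Payloads from the multilingual example, keyed by a hypothetical output file name.
PAYLOADS: dict[str, dict[str, Any]] = {
    "french.mp3": {
        "text": "Bonjour! Bienvenue sur notre plateforme.",
        "language": "FR", "speaker": "FR", "speed": 1,
    },
    "japanese.mp3": {
        "text": "こんにちは！ようこそ。",
        "language": "JP", "speaker": "JP", "speed": 1,
    },
    "chinese.mp3": {
        "text": "你好！欢迎使用我们的平台。",
        "language": "ZH", "speaker": "ZH", "speed": 1,
    },
}


def generate_all(api_key: str, post: Optional[Callable[..., Any]] = None) -> dict[str, bytes]:
    """Send each payload and return the decoded MP3 bytes keyed by file name."""
    if post is None:
        import requests  # Imported lazily so `post` can be swapped out in tests
        post = requests.post
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    results: dict[str, bytes] = {}
    for filename, payload in PAYLOADS.items():
        response = post(API_URL, headers=headers, json=payload, timeout=60)
        response.raise_for_status()  # Stop on the first HTTP error
        results[filename] = base64.b64decode(response.json()["audio"])
    return results
```

Each entry of the returned dict can then be written to disk with `open(filename, "wb")`, as in the single-language example.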

Speed Control

Adjust the `speed` parameter to control how fast the generated speech is spoken:

Speed	Effect
`0.5`	Slow, deliberate speech
`0.75`	Slightly slower than normal
`1.0`	Normal speed (default)
`1.5`	Faster speech
`2.0`	Very fast

Example

# Slow narration for accessibility
data = {
    "text": "Please listen carefully to the following instructions.",
    "speed": 0.7
}

# Fast playback for quick review
data = {
    "text": "This is a quick summary of the key points.",
    "speed": 1.5
}

Use slower speeds (0.5-0.8) for instructional content or accessibility needs. Use faster speeds (1.2-1.5) for content review or when listeners prefer quicker playback.

Pricing

Rate: $5.00 per 1 million characters

Text Length	Approximate Cost
1,000 characters	$0.005
10,000 characters	$0.05
100,000 characters	$0.50
1,000,000 characters	$5.00

There are no character limits per request. You are billed based on the total characters processed.
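At $5.00 per 1 million characters, cost scales linearly: cost = characters × 5.00 / 1,000,000, so 10,000 characters cost $0.05. Below is a minimal estimator. It assumes the billed character count equals Python's `len()` of the text; the pricing table does not say how non-ASCII characters or whitespace are counted. `estimate_cost` is a hypothetical helper, and `Decimal` avoids floating-point rounding on small amounts.

```python
from decimal import Decimal

RATE_PER_MILLION_CHARS = Decimal("5.00")  # USD, from the pricing table


def estimate_cost(text: str) -> Decimal:
    """Approximate cost in USD, counting characters as Python str length (assumption)."""
    return RATE_PER_MILLION_CHARS * len(text) / Decimal(1_000_000)
```

For example, the 22-character string "Welcome to Hyperbolic!" comes to about $0.00011.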

Use Cases

Voice assistants: Add natural speech to chatbots and virtual assistants
Audiobook generation: Convert written content to audio format
Accessibility: Make content accessible for visually impaired users
Video narration: Generate voiceovers for videos and presentations
Language learning: Create pronunciation examples in multiple languages
Notification systems: Generate audio alerts and announcements

Next Steps

Text APIs

Generate text with large language models

Vision Language Models

Analyze images with multimodal AI

Image APIs

Generate images from text prompts
