Audio APIs

Convert text to natural-sounding speech using Melo TTS. Generate high-quality audio in multiple languages with customizable speed and speaker options.

Text-to-Speech

Endpoint

POST https://api.hyperbolic.xyz/v1/audio/generation

Basic Example

import requests
import base64

url = "https://api.hyperbolic.xyz/v1/audio/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "text": "Welcome to Hyperbolic! This is an example of text-to-speech generation.",
    "speed": 1
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # stop on an HTTP error instead of decoding an error body
result = response.json()

# Decode and save the audio file
audio_data = base64.b64decode(result["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio_data)

Request Parameters

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| text | string | The text to convert to speech |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| language | string | EN | Language code for speech generation |
| speaker | string | - | Speaker variant (language-specific) |
| speed | float | 1 | Speech speed multiplier (0.1-5) |
| sdp_ratio | float | - | Stochastic duration predictor ratio, controls prosody variation (0-1) |
| noise_scale | float | - | Controls randomness in speech generation for more natural variation (0-1) |
| noise_scale_w | float | - | Controls randomness in duration prediction for timing variation (0-1) |
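
The documented ranges can be checked on the client before a request is sent. The sketch below is one way to do that. `build_tts_payload` is a hypothetical helper, not part of the API, and it assumes that the ranges are inclusive and that an empty `text` is not useful to send. Neither assumption is confirmed by this page.

```python
def build_tts_payload(text, language="EN", speaker=None, speed=1.0,
                      sdp_ratio=None, noise_scale=None, noise_scale_w=None):
    """Build a request body, checking the documented ranges client-side."""
    # Assumption: an empty string is rejected here; the API's behavior is not documented.
    if not isinstance(text, str) or not text:
        raise ValueError("text must be a non-empty string")
    # Assumption: the 0.1-5 range is inclusive at both ends.
    if not 0.1 <= speed <= 5:
        raise ValueError("speed must be between 0.1 and 5")

    payload = {"text": text, "language": language, "speed": speed}
    if speaker is not None:
        payload["speaker"] = speaker

    # The three tuning parameters share the documented 0-1 range.
    # Unset values are left out so the server applies its own defaults.
    tuning = {
        "sdp_ratio": sdp_ratio,
        "noise_scale": noise_scale,
        "noise_scale_w": noise_scale_w,
    }
    for name, value in tuning.items():
        if value is None:
            continue
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
        payload[name] = value
    return payload
```

The returned dict can be passed as the `json=` argument of `requests.post`, in the same way as `data` in the basic example.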

Response Format

The API returns a JSON object containing base64-encoded MP3 audio:
{
  "audio": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2..."
}

Decoding the Response

import base64

def save_audio(response_json, filename="output.mp3"):
    """Decode and save the generated audio."""
    audio_data = base64.b64decode(response_json["audio"])
    with open(filename, "wb") as f:
        f.write(audio_data)
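
This page does not describe what the response body looks like when a request fails, so a defensive decoder may be useful. The sketch below assumes only that a successful response has a base64 string under `audio`. `decode_audio` is a hypothetical name.

```python
import base64
import binascii


def decode_audio(response_json):
    """Return decoded audio bytes, or raise ValueError with a readable message."""
    audio_b64 = response_json.get("audio") if isinstance(response_json, dict) else None
    if not isinstance(audio_b64, str) or not audio_b64:
        # Include the body in the message, since it may carry an error description.
        raise ValueError(f"response has no 'audio' field: {response_json!r}")
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(audio_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("'audio' field is not valid base64") from exc
```

The returned bytes can be written to a file opened in `"wb"` mode, as `save_audio` does above.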

Supported Languages

Melo TTS supports six languages with various speaker options:

| Language | Code | Available Speakers |
|---|---|---|
| English | EN | EN-US, EN-BR, EN-INDIA, EN-AU |
| Spanish | ES | ES |
| French | FR | FR |
| Chinese | ZH | ZH |
| Japanese | JP | JP |
| Korean | KR | KR |

English has multiple speaker variants for different accents: US (American), BR (British), INDIA (Indian), and AU (Australian).
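
The language and speaker codes listed above can be kept in a lookup table so that a mismatched pair is caught before a request is made. This is a minimal sketch. `SPEAKERS` and `resolve_speaker` are hypothetical names, and defaulting to the first listed speaker is this helper's choice, not documented API behavior.

```python
# Language code -> speaker codes, copied from the table above.
SPEAKERS = {
    "EN": ["EN-US", "EN-BR", "EN-INDIA", "EN-AU"],
    "ES": ["ES"],
    "FR": ["FR"],
    "ZH": ["ZH"],
    "JP": ["JP"],
    "KR": ["KR"],
}


def resolve_speaker(language, speaker=None):
    """Return a speaker valid for `language`, or raise ValueError."""
    if language not in SPEAKERS:
        raise ValueError(f"unsupported language {language!r}")
    options = SPEAKERS[language]
    if speaker is None:
        # Assumption: fall back to the first speaker listed for the language.
        return options[0]
    if speaker not in options:
        raise ValueError(f"speaker {speaker!r} is not available for {language!r}")
    return speaker
```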

Using Different Languages and Speakers

import requests
import base64

url = "https://api.hyperbolic.xyz/v1/audio/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Generate speech with Australian English accent
data = {
    "text": "G'day! Welcome to our application.",
    "language": "EN",
    "speaker": "EN-AU",
    "speed": 1
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # stop on an HTTP error instead of decoding an error body
result = response.json()

audio_data = base64.b64decode(result["audio"])
with open("australian_greeting.mp3", "wb") as f:
    f.write(audio_data)

Multilingual Example

# Each payload is sent with the same url and headers as in the previous example.

# French
data = {
    "text": "Bonjour! Bienvenue sur notre plateforme.",
    "language": "FR",
    "speaker": "FR",
    "speed": 1
}

# Japanese
data = {
    "text": "こんにちは!ようこそ。",
    "language": "JP",
    "speaker": "JP",
    "speed": 1
}

# Chinese
data = {
    "text": "你好!欢迎使用我们的平台。",
    "language": "ZH",
    "speaker": "ZH",
    "speed": 1
}
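
The three payloads above can also be sent in one pass. The sketch below is only an illustration: `PAYLOADS`, `synthesize_all`, and `post_tts` are hypothetical names, the 60-second timeout is an arbitrary choice, and the sender is passed in as an argument so that the loop can be tried without network access.

```python
import base64

PAYLOADS = {
    "french": {"text": "Bonjour! Bienvenue sur notre plateforme.",
               "language": "FR", "speaker": "FR", "speed": 1},
    "japanese": {"text": "こんにちは!ようこそ。",
                 "language": "JP", "speaker": "JP", "speed": 1},
    "chinese": {"text": "你好!欢迎使用我们的平台。",
                "language": "ZH", "speaker": "ZH", "speed": 1},
}


def synthesize_all(payloads, send):
    """Call send(payload) for each entry and return {name: audio_bytes}."""
    results = {}
    for name, payload in payloads.items():
        response_json = send(payload)
        results[name] = base64.b64decode(response_json["audio"])
    return results


def post_tts(payload, api_key="YOUR_API_KEY"):
    """Send one payload to the endpoint and return the parsed JSON body."""
    import requests  # imported here so synthesize_all can be used without it

    response = requests.post(
        "https://api.hyperbolic.xyz/v1/audio/generation",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=60,  # arbitrary value chosen for this sketch
    )
    response.raise_for_status()
    return response.json()
```

Calling `synthesize_all(PAYLOADS, post_tts)` returns the decoded audio for each language, which can then be written to files as in the earlier examples.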

Speed Control

Adjust the speed parameter to control the speaking rate of the generated audio:

| Speed | Effect |
|---|---|
| 0.5 | Slow, deliberate speech |
| 0.75 | Slightly slower than normal |
| 1.0 | Normal speed (default) |
| 1.5 | Faster speech |
| 2.0 | Very fast |

Example

# Slow narration for accessibility
data = {
    "text": "Please listen carefully to the following instructions.",
    "speed": 0.7
}

# Fast playback for quick review
data = {
    "text": "This is a quick summary of the key points.",
    "speed": 1.5
}

Use slower speeds (0.5-0.8) for instructional content or accessibility needs. Use faster speeds (1.2-1.5) for content review or when listeners prefer quicker playback.

Pricing

Rate: $5.00 per 1 million characters

| Text Length | Approximate Cost |
|---|---|
| 1,000 characters | $0.005 |
| 10,000 characters | $0.05 |
| 100,000 characters | $0.50 |
| 1,000,000 characters | $5.00 |

There are no character limits per request. You are billed based on the total characters processed.
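
The rate above can be turned into a quick estimate before a large job is submitted. The sketch below uses `Decimal` to avoid floating-point rounding in small amounts. `estimate_cost` is a hypothetical helper, and it assumes that billing counts characters the way Python's `len()` does (Unicode code points), which this page does not confirm.

```python
from decimal import Decimal

RATE_PER_MILLION = Decimal("5.00")  # USD per 1 million characters, from the rate above


def estimate_cost(text_or_count):
    """Estimate the USD cost for a string or for a character count."""
    if isinstance(text_or_count, str):
        # Assumption: billed characters match len(), i.e. Unicode code points.
        count = len(text_or_count)
    else:
        count = int(text_or_count)
    if count < 0:
        raise ValueError("character count cannot be negative")
    return RATE_PER_MILLION * count / Decimal(1_000_000)
```

For example, `estimate_cost(100_000)` reproduces the $0.50 row of the table.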

Use Cases

  • Voice assistants: Add natural speech to chatbots and virtual assistants
  • Audiobook generation: Convert written content to audio format
  • Accessibility: Make content accessible for visually impaired users
  • Video narration: Generate voiceovers for videos and presentations
  • Language learning: Create pronunciation examples in multiple languages
  • Notification systems: Generate audio alerts and announcements
