Audio APIs
Convert text to natural-sounding speech using Melo TTS. Generate high-quality audio in multiple languages with customizable speed and speaker options.
Text-to-Speech
Endpoint
POST https://api.hyperbolic.xyz/v1/audio/generation
Basic Example
import requests
import base64

url = "https://api.hyperbolic.xyz/v1/audio/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "text": "Welcome to Hyperbolic! This is an example of text-to-speech generation.",
    "speed": 1
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

# Decode and save the audio file
audio_data = base64.b64decode(result["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio_data)

curl -X POST "https://api.hyperbolic.xyz/v1/audio/generation" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "Welcome to Hyperbolic! This is an example of text-to-speech generation.",
    "speed": 1
  }' | jq -r ".audio" | base64 -d > output.mp3
Request Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | The text to convert to speech |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | string | EN | Language code for speech generation |
| speaker | string | - | Speaker variant (language-specific) |
| speed | float | 1 | Speech speed multiplier (0.1-5) |
| sdp_ratio | float | - | Stochastic duration predictor ratio, controls prosody variation (0-1) |
| noise_scale | float | - | Controls randomness in speech generation for more natural variation (0-1) |
| noise_scale_w | float | - | Controls randomness in duration prediction for timing variation (0-1) |
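The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_tts_payload` is a hypothetical helper, and it assumes the range bounds are inclusive and that an empty `text` should be rejected, neither of which the table above states.

```python
def build_tts_payload(text, language="EN", speaker=None, speed=1.0,
                      sdp_ratio=None, noise_scale=None, noise_scale_w=None):
    """Build a request body, rejecting values outside the documented ranges.

    Hypothetical helper: bounds are treated as inclusive (an assumption).
    """
    if not isinstance(text, str) or not text:
        raise ValueError("text must be a non-empty string")
    if not 0.1 <= speed <= 5:
        raise ValueError("speed must be between 0.1 and 5")

    payload = {"text": text, "language": language, "speed": speed}
    if speaker is not None:
        payload["speaker"] = speaker

    # The three sampling controls are documented as 0-1. Unset ones are
    # left out of the body so the service applies its own defaults.
    for name, value in (("sdp_ratio", sdp_ratio),
                        ("noise_scale", noise_scale),
                        ("noise_scale_w", noise_scale_w)):
        if value is None:
            continue
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
        payload[name] = value
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the `data` dictionary in the Basic Example.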
The API returns a JSON object containing base64-encoded MP3 audio:
{
  "audio": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2..."
}
Decoding the Response
import base64

def save_audio(response_json, filename="output.mp3"):
    """Decode and save the generated audio."""
    audio_data = base64.b64decode(response_json["audio"])
    with open(filename, "wb") as f:
        f.write(audio_data)
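A failed request will not necessarily contain an `audio` field, so it can help to validate the response before writing a file. The sketch below is one defensive way to decode it. The helper names `decode_audio` and `looks_like_mp3` are hypothetical, and the shape of an error response is not documented here, so the code only checks that `audio` is present and is valid base64. The MP3 check is a heuristic based on the file format (an ID3 tag or an MPEG frame sync at the start), not something the API guarantees.

```python
import base64
import binascii


def decode_audio(response_json):
    """Return the decoded audio bytes, or raise ValueError with a readable reason."""
    encoded = response_json.get("audio") if isinstance(response_json, dict) else None
    if not isinstance(encoded, str) or not encoded:
        raise ValueError(f"no 'audio' field in response: {response_json!r}")
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("'audio' is not valid base64") from exc


def looks_like_mp3(audio):
    """Heuristic: MP3 data starts with an ID3 tag or an MPEG frame sync."""
    if audio[:3] == b"ID3":
        return True
    return len(audio) >= 2 and audio[0] == 0xFF and (audio[1] & 0xE0) == 0xE0
```

The sample response shown earlier begins with `SUQz`, which is the base64 encoding of the bytes `ID3`, so it passes the first check.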
Supported Languages
Melo TTS supports 6 languages with various speaker options:
| Language | Code | Available Speakers |
|---|---|---|
| English | EN | EN-US, EN-BR, EN-INDIA, EN-AU |
| Spanish | ES | ES |
| French | FR | FR |
| Chinese | ZH | ZH |
| Japanese | JP | JP |
| Korean | KR | KR |
English has multiple speaker variants for different accents: US (American), BR (British), INDIA (Indian), and AU (Australian).
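Because speakers are language-specific, a mismatched pair such as `FR` with `EN-AU` is an easy mistake to make. The table above can be mirrored in a small lookup that catches it before a request is sent. This is an illustrative sketch: `SPEAKERS` and `check_speaker` are hypothetical names, and how the API itself responds to a mismatch is not documented here.

```python
# Language code -> speakers, copied from the Supported Languages table.
SPEAKERS = {
    "EN": ("EN-US", "EN-BR", "EN-INDIA", "EN-AU"),
    "ES": ("ES",),
    "FR": ("FR",),
    "ZH": ("ZH",),
    "JP": ("JP",),
    "KR": ("KR",),
}


def check_speaker(language, speaker):
    """Return `speaker` if it is listed for `language`, else raise ValueError."""
    try:
        allowed = SPEAKERS[language]
    except KeyError:
        raise ValueError(f"unsupported language code: {language!r}") from None
    if speaker not in allowed:
        raise ValueError(
            f"{speaker!r} is not a {language} speaker; choose from {', '.join(allowed)}"
        )
    return speaker
```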
Using Different Languages and Speakers
import requests
import base64

url = "https://api.hyperbolic.xyz/v1/audio/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

# Generate speech with Australian English accent
data = {
    "text": "G'day! Welcome to our application.",
    "language": "EN",
    "speaker": "EN-AU",
    "speed": 1
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

audio_data = base64.b64decode(result["audio"])
with open("australian_greeting.mp3", "wb") as f:
    f.write(audio_data)

curl -X POST "https://api.hyperbolic.xyz/v1/audio/generation" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "G'\''day! Welcome to our application.",
    "language": "EN",
    "speaker": "EN-AU",
    "speed": 1
  }' | jq -r ".audio" | base64 -d > australian_greeting.mp3
Multilingual Example
# French
data = {
    "text": "Bonjour! Bienvenue sur notre plateforme.",
    "language": "FR",
    "speaker": "FR",
    "speed": 1
}

# Japanese
data = {
    "text": "こんにちは!ようこそ。",
    "language": "JP",
    "speaker": "JP",
    "speed": 1
}

# Chinese
data = {
    "text": "你好!欢迎使用我们的平台。",
    "language": "ZH",
    "speaker": "ZH",
    "speed": 1
}
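When the same kind of request is needed for several languages, the payloads can be generated in a loop instead of written out by hand. The sketch below only builds the request bodies and output filenames. `build_jobs` and the `greeting_<code>.mp3` naming scheme are illustrative assumptions, and reusing the language code as the speaker only works for the single-speaker languages (every language listed except English).

```python
SAMPLES = [
    ("FR", "Bonjour! Bienvenue sur notre plateforme."),
    ("JP", "こんにちは!ようこそ。"),
    ("ZH", "你好!欢迎使用我们的平台。"),
]


def build_jobs(samples, speed=1):
    """Return a list of (filename, payload) pairs, one per (language, text) sample."""
    jobs = []
    for language, text in samples:
        payload = {
            "text": text,
            "language": language,
            # Single-speaker languages use the language code as the speaker.
            "speaker": language,
            "speed": speed,
        }
        jobs.append((f"greeting_{language.lower()}.mp3", payload))
    return jobs
```

Each payload can then be posted with `requests.post(url, headers=headers, json=payload)` and the decoded audio written to the matching filename, as in the earlier examples.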
Speed Control
Adjust the speed parameter to control how fast the generated speech is spoken. The value is a multiplier of the normal speaking rate:
| Speed | Effect |
|---|---|
| 0.5 | Slow, deliberate speech |
| 0.75 | Slightly slower than normal |
| 1.0 | Normal speed (default) |
| 1.5 | Faster speech |
| 2.0 | Very fast |
Example
# Slow narration for accessibility
data = {
    "text": "Please listen carefully to the following instructions.",
    "speed": 0.7
}

# Fast playback for quick review
data = {
    "text": "This is a quick summary of the key points.",
    "speed": 1.5
}
Use slower speeds (0.5-0.8) for instructional content or accessibility needs. Use faster speeds (1.2-1.5) for content review or when listeners prefer quicker playback.
Pricing
Rate: $5.00 per 1 million characters
| Text Length | Approximate Cost |
|---|---|
| 1,000 characters | $0.005 |
| 10,000 characters | $0.05 |
| 100,000 characters | $0.50 |
| 1,000,000 characters | $5.00 |
There are no character limits per request. You are billed based on the total characters processed.
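The table follows directly from the rate: cost = characters × $5.00 / 1,000,000. A small estimator makes this easy to apply to arbitrary text. This is a sketch under one assumption: that a "character" is counted the way Python's `len()` counts it (one per Unicode code point), which the pricing text does not specify. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

RATE_PER_MILLION = Decimal("5.00")  # USD per 1,000,000 characters


def estimate_cost(num_characters):
    """Return the approximate cost in USD for the given character count."""
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return RATE_PER_MILLION * num_characters / Decimal(1_000_000)
```

For example, `estimate_cost(len(text))` gives the estimate for a single request, and summing it over a batch gives the total for a job.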
Use Cases
- Voice assistants: Add natural speech to chatbots and virtual assistants
- Audiobook generation: Convert written content to audio format
- Accessibility: Make content accessible for visually impaired users
- Video narration: Generate voiceovers for videos and presentations
- Language learning: Create pronunciation examples in multiple languages
- Notification systems: Generate audio alerts and announcements
Next Steps