Master the full lifecycle of GPU instance management on Hyperbolic - from creation to termination, including monitoring, scaling, and troubleshooting.
This guide covers managing instances through the Hyperbolic Web UI. For programmatic access and API support, please contact our enterprise sales team.

Creating Instances

Web UI Method

1. Navigate to the On-Demand GPU platform

Go to app.hyperbolic.ai/gpus and browse available GPUs.
2. Select GPU Configuration

  • Choose GPU type (H100 80GB, H200 141GB)
  • Select quantity
  • Pick region for optimal latency
  • Choose InfiniBand if needed for multi-GPU
3. Configure Instance

  • Storage: Configure the amount of disk space your workload requires
  • Label: Name your instance for easy identification
  • SSH Keys (Optional): Add SSH keys for secure access to your instance
4. Launch

Review pricing and click “Start Building”. The instance is typically ready within a few minutes, though provisioning can take up to 25 minutes depending on configuration and region.

Connecting to Instances

SSH Connection

Basic SSH connection:
# Standard connection
ssh ubuntu@<instance-ip>

# With specific key
ssh -i ~/.ssh/hyperbolic_key ubuntu@<instance-ip>
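
If you have not added a key yet, you can generate one locally and upload the public key through the Web UI. A host entry in ~/.ssh/config also saves retyping the IP and key path; the host alias, key file name, and placeholder IP below are examples, not fixed names:
# Generate a key pair (upload ~/.ssh/hyperbolic_key.pub in the Web UI)
ssh-keygen -t ed25519 -f ~/.ssh/hyperbolic_key

# ~/.ssh/config entry for convenient access
Host my-h100
    HostName <instance-ip>
    User ubuntu
    IdentityFile ~/.ssh/hyperbolic_key

# Then connect with:
ssh my-h100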

File Transfer

# Upload file
scp model.pth ubuntu@<instance-ip>:/home/ubuntu/

# Download file
scp ubuntu@<instance-ip>:/home/ubuntu/results.csv ./

# Upload directory
scp -r dataset/ ubuntu@<instance-ip>:/home/ubuntu/
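
For large datasets or interrupted transfers, rsync over SSH is often more robust than scp because it resumes and only copies changed files (paths below are examples):
# Incremental upload with progress
rsync -avP dataset/ ubuntu@<instance-ip>:/home/ubuntu/dataset/

# Pull results back to your local machine
rsync -avP ubuntu@<instance-ip>:/home/ubuntu/results/ ./results/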

Instance Lifecycle Management

Terminating Instances

Termination is permanent. All data on the instance will be lost. Always back up important data before terminating (a minimal backup example follows the steps below).
To terminate an instance:
  1. Go to “My Instances” in the Web UI
  2. Click “Terminate” in the instance actions
  3. Confirm the deletion
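
A minimal pre-termination backup, assuming you only need a few directories copied to your local machine (the paths are examples):
# Archive results on the instance
ssh ubuntu@<instance-ip> "tar czf results.tar.gz /home/ubuntu/results"

# Copy the archive locally before terminating
scp ubuntu@<instance-ip>:/home/ubuntu/results.tar.gz ./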

Monitoring and Metrics

Real-Time GPU Monitoring

SSH into your instance and use these commands:
# Basic GPU status
nvidia-smi

# Continuous monitoring (updates every 1 second)
watch -n 1 nvidia-smi

# Detailed GPU metrics
nvidia-smi -q

# Show running processes
nvidia-smi pmon

# GPU utilization over time
nvidia-smi dmon
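
To keep a record of utilization over time (for example, to review after a training run), nvidia-smi can append selected fields to a CSV file; the interval and file path below are examples:
# Log GPU utilization and memory every 60 seconds
nvidia-smi --query-gpu=timestamp,name,utilization.gpu,memory.used \
  --format=csv -l 60 >> /home/ubuntu/gpu_usage.csv &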

System Monitoring

# System resources
htop

# Disk usage
df -h

# Memory usage
free -h

# Network statistics
nethogs  # Install with: sudo apt install nethogs

# Process monitoring
ps aux | grep python

Setting Up Custom Monitoring

# Install DCGM exporter for GPU metrics
docker run -d --gpus all --rm -p 9400:9400 nvidia/dcgm-exporter:latest

# Install Prometheus
docker run -d -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

# Install Grafana
docker run -d -p 3000:3000 grafana/grafana
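
The Prometheus container above expects a prometheus.yml in your working directory. A minimal scrape configuration pointing at the DCGM exporter might look like this (the job name and scrape interval are just examples):
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: dcgm
    static_configs:
      - targets: ['<instance-ip>:9400']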

Managing Multiple Instances

Viewing All Instances

Go to app.hyperbolic.ai/instances to view all your instances with:
  • Instance status
  • GPU type and configuration
  • Region
  • Running time and costs
  • Quick action buttons

Batch Operations

Parallel SSH Commands:
# Run command on multiple instances
for ip in 192.168.1.10 192.168.1.11 192.168.1.12; do
    ssh ubuntu@$ip "nvidia-smi" &
done
wait

# Using GNU parallel
parallel -j 4 ssh ubuntu@{} "python train.py" ::: \
  instance1.hyperbolic.ai \
  instance2.hyperbolic.ai \
  instance3.hyperbolic.ai
Using Ansible:
# inventory.yml
all:
  hosts:
    h100-1:
      ansible_host: 192.168.1.10
    h100-2:
      ansible_host: 192.168.1.11
    h100-3:
      ansible_host: 192.168.1.12
  vars:
    ansible_user: ubuntu
    ansible_ssh_private_key_file: ~/.ssh/hyperbolic_key

# Run command on all instances
ansible all -i inventory.yml -m shell -a "nvidia-smi"
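
For anything beyond one-off commands, a small playbook is easier to maintain than ad-hoc shell invocations. A minimal sketch using the inventory above (the script path and log file name are examples):
# train.yml
- hosts: all
  tasks:
    - name: Launch training in the background
      shell: nohup python3 train.py > train.log 2>&1 &
      args:
        chdir: /home/ubuntu

# Run the playbook
ansible-playbook -i inventory.yml train.yml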

Instance States and Troubleshooting

Instance States

State         Description                           Billing
Pending       Instance is being provisioned         No
Running       Instance is active and accessible     Yes
Starting      Instance is booting up                No
Terminating   Instance is being deleted             No
Failed        Instance failed to start              No

Common Issues and Solutions

Instance stuck in “Pending”

Solution:
  • Wait up to 25 minutes for provisioning
  • Check region availability in the dashboard
  • Contact support if the instance has been pending for more than 30 minutes
Check the instance status in your dashboard for updates.
Cannot connect via SSH

Solution:
  • Verify the instance is in the “Running” state
  • Check that your SSH key is authorized
  • Confirm the IP address is correct
  • Test network connectivity
# Debug SSH connection
ssh -vvv ubuntu@<instance-ip>
GPU not detected

Solution:
  • Verify the GPU driver is loaded
  • Check the Docker runtime configuration
  • Restart the instance if needed
# Check GPU availability
nvidia-smi

# Check driver
nvidia-smi -q | grep "Driver Version"

# For Docker containers
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Instance terminated unexpectedly

Possible causes:
  • Insufficient account balance
  • Violation of the terms of service
  • Hardware failure (rare)
Solution:
  • Check the billing status in your dashboard
  • Review instance logs
  • Contact support for clarification
Still having issues? Contact [email protected] or use the Intercom chat widget for immediate assistance.

Best Practices

Use Instance Labels

Name your instances clearly to track different projects and experiments.

Implement Auto-shutdown

Set up scripts to automatically terminate idle instances to avoid unnecessary charges.

Regular Backups

Schedule regular backups of important data to external storage (S3, GCS, etc.).
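
A simple way to automate this is a cron job that syncs to object storage on a schedule. The sketch below assumes the AWS CLI is already configured; the bucket name, paths, and schedule are examples:
# Add to crontab (crontab -e): sync checkpoints to S3 every hour
0 * * * * aws s3 sync /home/ubuntu/models/ s3://my-bucket/models/ >> /home/ubuntu/backup.log 2>&1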

Monitor Costs

Set up billing alerts and regularly review instance usage to optimize costs.

Auto-shutdown Script Example

#!/usr/bin/env python3
import subprocess
import time
import sys

def get_gpu_utilization():
    """Get current GPU utilization percentage"""
    result = subprocess.run(
        ['nvidia-smi', '--query-gpu=utilization.gpu', '--format=csv,noheader,nounits'],
        capture_output=True,
        text=True
    )
    return int(result.stdout.strip())

def auto_shutdown(idle_minutes=30, threshold=5):
    """Shutdown instance if GPU idle for specified minutes"""
    idle_count = 0
    check_interval = 60  # Check every minute

    while True:
        util = get_gpu_utilization()

        if util < threshold:
            idle_count += 1
            print(f"GPU idle ({util}%). Idle count: {idle_count}/{idle_minutes}")

            if idle_count >= idle_minutes:
                print("Shutting down due to inactivity...")
                subprocess.run(['sudo', 'shutdown', '-h', 'now'])
                sys.exit(0)
        else:
            idle_count = 0
            print(f"GPU active ({util}%). Reset idle counter.")

        time.sleep(check_interval)

if __name__ == "__main__":
    auto_shutdown(idle_minutes=30, threshold=5)
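
To keep the script running after you disconnect, launch it in the background with nohup (the file and log names are examples):
nohup python3 auto_shutdown.py > auto_shutdown.log 2>&1 &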

Advanced Configuration

Custom Startup Scripts

Create a startup script to run when the instance boots (see the crontab sketch after the script):
#!/bin/bash
# /home/ubuntu/startup.sh

# Mount additional storage
sudo mount /dev/nvme1n1 /data

# Start Jupyter
jupyter notebook --no-browser --port=8888 &

# Start TensorBoard
tensorboard --logdir=/home/ubuntu/logs --port=6006 &

# Start monitoring
python /home/ubuntu/monitor_gpu.py &
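
One simple way to run this script at boot is a @reboot crontab entry; the sketch below assumes the script lives at the path shown above:
# Make the script executable
chmod +x /home/ubuntu/startup.sh

# Add to crontab (crontab -e)
@reboot /home/ubuntu/startup.sh >> /home/ubuntu/startup.log 2>&1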

Data Persistence Strategies

# Install AWS CLI
sudo apt-get update
sudo apt-get install awscli -y

# Configure AWS credentials
aws configure

# Sync data to S3
aws s3 sync /home/ubuntu/models/ s3://my-bucket/models/

# Download from S3
aws s3 sync s3://my-bucket/dataset/ /home/ubuntu/dataset/

Next Steps