Storage and Port Configuration

Learn how to configure storage volumes and network ports for your H100 and H200 GPU instances to support machine learning workloads, model training, and inference deployments.

Storage Options

Every GPU instance comes with onboard storage included, plus the option to attach persistent storage volumes.

Storage Types Overview

Onboard Storage (Included with Every Instance)
  • Cost: Free (included with instance pricing)
  • Provisioning: Automatically available when the instance starts
  • Persistence: Erased when the instance is terminated
  • Performance: High-speed NVMe SSD (up to 7,000 MB/s)
  • Use Cases: Active training, temporary data, cache, scratch space
Onboard Storage by Configuration:
GPU Type   Region           Onboard Storage Size
H100       us-central-1     18TB
H100       eu-north-4       10TB
H100       uk-southeast-3   24TB
H200       uk-central-3     24TB
Onboard storage is erased when the instance is terminated. Always save important data to persistent storage or external services before terminating an instance.

Working with Storage

Using Onboard Storage

Onboard storage is automatically mounted and ready to use when your instance starts:
# View all available storage
df -h

# Onboard storage is typically mounted at:
# /home/ubuntu (root volume for OS and user files)
# /mnt or /data (additional onboard storage space)

# Check onboard storage usage
du -sh /home/ubuntu/*
du -sh /mnt/*
The exact mount points may vary by instance configuration. Use df -h or lsblk to see all available storage.

Creating and Attaching Persistent Storage

Step 1: Create Persistent Volume

In the Hyperbolic web console:
  1. Navigate to the Storage section
  2. Click “Create Persistent Volume”
  3. Specify size (100GB - 10TB)
  4. Select region (currently us-central-1 only)
  5. Name your volume for easy identification
Persistent storage incurs additional hourly charges. Check current pricing in the console.

Step 2: Attach to Instance

After creating the volume:
  1. Go to your running instance details
  2. Click “Attach Storage”
  3. Select your persistent volume from the list
  4. The volume will be attached as a block device (e.g., /dev/vdb)

Step 3: Mount and Use

SSH into your instance and mount the volume:
# Check if volume is attached
lsblk

# Format if new volume (only do this once!)
sudo mkfs.ext4 /dev/vdb

# Create mount point
sudo mkdir -p /mnt/persistent

# Mount the volume
sudo mount /dev/vdb /mnt/persistent

# Set permissions
sudo chown -R $USER:$USER /mnt/persistent

# Make mount persistent across reboots
echo "/dev/vdb /mnt/persistent ext4 defaults 0 2" | sudo tee -a /etc/fstab

Storage Configuration

Storage Planning by Workload

When launching an instance, plan your storage strategy based on your workload and the options available in your region.
Training Workloads:
  • Use onboard storage for active training data and scratch space
  • If available (us-central-1), attach persistent storage for:
    • Model checkpoints
    • Final trained models
    • Datasets you want to reuse
  • For regions without persistent storage, implement regular backups to S3/GCS/Azure
Inference Workloads:
  • Load models into onboard storage for fastest performance
  • Use persistent storage (if available) for model library
  • Cache frequently accessed data on onboard storage (see the staging sketch below)
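For example, staging a model from the persistent library onto onboard NVMe before serving (a sketch; the model path is illustrative):
# Copy a model from the slower persistent volume to fast onboard NVMe
rsync -aP /mnt/persistent/model-library/my-model/ /home/ubuntu/models/my-model/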
Development/Experimentation:
  • Use onboard storage for active development
  • Save important results to persistent storage or external services
  • Implement git hooks to back up code changes (a sketch follows)
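As a minimal sketch of such a hook (assuming an off-instance remote named backup has been added):
#!/bin/bash
# Save as .git/hooks/post-commit and make it executable:
#   chmod +x .git/hooks/post-commit
# Mirrors every commit to the off-instance remote "backup"
git push --quiet backup HEAD &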
Storage Recommendations by Configuration:
GPU    Region           Onboard   Strategy
H100   us-central-1     18TB      Use onboard for active work + persistent volumes for long-term storage
H100   eu-north-4       10TB      Abundant onboard space, but back up critical data before termination
H100   uk-southeast-3   24TB      Abundant onboard space, but back up critical data before termination
H200   uk-central-3     24TB      High-performance onboard storage; export models before termination

Managing Storage Volumes

Step 1: Check Your Onboard Storage

# View all storage available on your instance
df -h

# Check disk usage by directory
du -sh /*

# Monitor I/O performance
iostat -x 1
Your onboard storage is automatically available and includes:
  • System root volume (OS and applications)
  • Additional data volume (varies by configuration: 2TB - 24TB)

Step 2: Manage Persistent Storage Volumes

If you’ve created persistent storage (us-central-1 only):
# List block devices to find your persistent volume
lsblk

# Persistent volumes appear as /dev/vd* devices
# Mount your persistent volume
sudo mkdir -p /mnt/persistent
sudo mount /dev/vdb /mnt/persistent

# Make mount persistent across reboots
echo "/dev/vdb /mnt/persistent ext4 defaults 0 2" | sudo tee -a /etc/fstab

Step 3: Transfer Data Before Termination

Remember: Onboard storage is erased when the instance is terminated!
Before terminating an instance:
# Option 1: Copy to persistent storage (if available)
rsync -avP /home/ubuntu/important-data/ /mnt/persistent/backup/

# Option 2: Upload to S3
aws s3 sync /home/ubuntu/models/ s3://my-bucket/models/

# Option 3: Upload to Google Cloud Storage
gsutil -m cp -r /home/ubuntu/checkpoints/ gs://my-bucket/checkpoints/

# Option 4: Create tar archive and upload
tar -czf models.tar.gz /home/ubuntu/models/
curl -T models.tar.gz https://transfer.sh/models.tar.gz
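
Before terminating, verify that the upload actually completed; a quick size comparison (a sketch, using the S3 option above):
# Compare what landed in S3 against the local copy
aws s3 ls s3://my-bucket/models/ --recursive --summarize | tail -2
du -sb /home/ubuntu/models/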

Data Management Best Practices

Organizing Your Storage

Using Onboard Storage (All Instances):
# Onboard storage structure (size varies: 2TB - 24TB)
/home/ubuntu/           # User home directory
├── code/              # Your application code
├── data/              # Active datasets
├── models/            # Working models
└── outputs/           # Results and logs

/mnt/data/             # Additional onboard space (if available)
├── cache/             # Temporary files
├── checkpoints/       # Training checkpoints
└── scratch/           # Experimental work
Using Persistent Storage (When Available):
# Persistent volume (created separately, attached to instance)
/mnt/persistent/        # Survives instance termination
├── datasets/          # Reusable datasets
├── model-library/     # Trained models collection
├── checkpoints/       # Important checkpoints
└── shared-resources/  # Team shared data
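Both layouts can be created in one step with brace expansion (a sketch matching the trees above):
# Onboard working directories
mkdir -p /home/ubuntu/{code,data,models,outputs}

# Persistent-volume directories (if a volume is mounted)
sudo mkdir -p /mnt/persistent/{datasets,model-library,checkpoints,shared-resources}
sudo chown -R $USER:$USER /mnt/persistent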

Backup Strategies

Since onboard storage is erased on termination, implement an appropriate backup strategy.
For Instances with Persistent Storage (us-central-1):
# Automated backup from onboard to persistent storage
# Add to crontab: crontab -e
0 */2 * * * rsync -avP /home/ubuntu/models/ /mnt/persistent/models/
0 */4 * * * rsync -avP /home/ubuntu/checkpoints/ /mnt/persistent/checkpoints/
For Instances without Persistent Storage:
# Option 1: Backup to S3
aws s3 sync /home/ubuntu/models/ s3://my-bucket/models/ --delete

# Option 2: Backup to Google Cloud Storage
gsutil -m rsync -r /home/ubuntu/models/ gs://my-bucket/models/

# Option 3: Backup to Azure Blob Storage
az storage blob sync -s /home/ubuntu/models/ -c mycontainer

# Automate with cron (every 6 hours)
0 */6 * * * aws s3 sync /home/ubuntu/important/ s3://my-bucket/backup/
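
If a sync takes longer than the cron interval, runs can overlap; wrapping the job in flock (a sketch) ensures only one copy runs at a time:
# Overlapping runs exit immediately instead of competing for bandwidth
0 */6 * * * flock -n /tmp/backup.lock aws s3 sync /home/ubuntu/important/ s3://my-bucket/backup/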

Optimizing Storage Performance

Maximize Onboard Storage Performance:
  • Onboard NVMe provides up to 7,000 MB/s throughput
  • Use for active datasets and model training
  • Keep frequently accessed files on onboard storage
  • Clean temporary files regularly to maintain performance
Persistent Storage Optimization (if available):
  • Network-attached with up to 1,000 MB/s throughput
  • Best for long-term storage, not active training
  • Use for model archives and dataset libraries
  • Consider compression for infrequently accessed data
Managing Limited Storage (2TB configurations):
# Monitor disk usage closely
watch -n 60 'df -h | grep -v tmpfs'

# Clean package caches
pip cache purge
conda clean --all -y
sudo apt-get clean

# Remove old Docker images if using containers
docker system prune -a -f

# Stream large datasets instead of downloading
# Example with TensorFlow:
dataset = tf.data.TFRecordDataset(["s3://bucket/data.tfrecord"])
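The same streaming idea works at the shell level: pipe data from object storage straight into a consumer so the archive never touches the disk (a sketch, assuming an S3 bucket):
# Stream a tar archive from S3 and unpack it without storing the archive itself
aws s3 cp s3://my-bucket/dataset.tar.gz - | tar -xz -C /home/ubuntu/data/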
Data Lifecycle Management:
# Set up automated cleanup for temporary files
find /home/ubuntu/cache -type f -mtime +1 -delete
find /tmp -type f -mtime +1 -delete

# Compress old checkpoints
find /home/ubuntu/checkpoints -name "*.ckpt" -mtime +7 -exec gzip {} \;

# Archive completed experiments
tar -czf experiment-$(date +%Y%m%d).tar.gz /home/ubuntu/experiments/completed/

Port Configuration

Configure network ports to enable access to services running on your GPU instances.

Exposing Services

SSH Port Forwarding

The most secure method for accessing services:
# Local machine: Create SSH tunnel
ssh -L 8888:localhost:8888 ubuntu@[instance-ip] -i ~/.ssh/hyperbolic_key.pem

# On instance: Launch Jupyter
jupyter notebook --no-browser --port=8888

# Access at: http://localhost:8888

Multiple Port Forwarding

# Forward multiple ports simultaneously
ssh -L 8888:localhost:8888 \
    -L 6006:localhost:6006 \
    -L 5000:localhost:5000 \
    ubuntu@[instance-ip] -i ~/.ssh/hyperbolic_key.pem
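
If you forward the same ports regularly, the flags can live in your SSH config instead (a sketch; the host alias is a placeholder):
# ~/.ssh/config
Host hyperbolic-gpu
    HostName [instance-ip]
    User ubuntu
    IdentityFile ~/.ssh/hyperbolic_key.pem
    LocalForward 8888 localhost:8888
    LocalForward 6006 localhost:6006

# Connect (and forward all listed ports) with: ssh hyperbolic-gpu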

Advanced Networking

SOCKS Proxy Configuration

For full network access through your instance:
# Create SOCKS proxy
ssh -D 8080 ubuntu@[instance-ip] -i ~/.ssh/hyperbolic_key.pem

# Configure applications to use SOCKS proxy at localhost:8080

Persistent Tunnels

Use autossh for maintaining persistent connections:
# Install autossh
sudo apt-get install autossh

# Create persistent tunnel with auto-reconnect
autossh -M 0 -f -N \
  -o "ServerAliveInterval 30" \
  -o "ServerAliveCountMax 3" \
  -L 8888:localhost:8888 \
  ubuntu@[instance-ip] -i ~/.ssh/hyperbolic_key.pem

Security Considerations

Never expose services directly to the internet without proper authentication and encryption. Always use SSH tunnels for development and testing.

Best Practices

  1. Use SSH tunnels for all development services
  2. Implement authentication before exposing any service
  3. Enable HTTPS for production deployments
  4. Monitor access logs regularly
  5. Rotate SSH keys periodically
# Monitor active connections
netstat -tulpn | grep LISTEN

# Check SSH connection attempts
sudo tail -f /var/log/auth.log | grep sshd

# List established connections
ss -tunap | grep ESTABLISHED

Storage and Port Automation

Monitoring and Alerts

Set up monitoring for both storage types:
#!/bin/bash
# storage-monitor.sh

echo "=== Storage Health Check ==="

# Check onboard storage
ONBOARD_USAGE=$(df -h /home/ubuntu | tail -1 | awk '{print $5}' | sed 's/%//')
ONBOARD_SIZE=$(df -h /home/ubuntu | tail -1 | awk '{print $2}')
echo "Onboard Storage: $ONBOARD_SIZE (${ONBOARD_USAGE}% used)"

# Determine alert threshold based on size
if [[ "$ONBOARD_SIZE" == *"2T"* ]]; then
    THRESHOLD=70  # Lower threshold for 2TB configs
else
    THRESHOLD=85  # Standard threshold for larger configs
fi

# Alert if over threshold
if [ $ONBOARD_USAGE -gt $THRESHOLD ]; then
    echo "⚠️  WARNING: Onboard storage ${ONBOARD_USAGE}% full (threshold: ${THRESHOLD}%)"
    echo "   → Clean temporary files: find /tmp -type f -mtime +1 -delete"
    echo "   → Clear package cache: pip cache purge && conda clean --all"
fi

# Check for persistent storage
if mountpoint -q /mnt/persistent 2>/dev/null; then
    PERSISTENT_USAGE=$(df -h /mnt/persistent | tail -1 | awk '{print $5}' | sed 's/%//')
    PERSISTENT_SIZE=$(df -h /mnt/persistent | tail -1 | awk '{print $2}')
    echo "Persistent Storage: $PERSISTENT_SIZE (${PERSISTENT_USAGE}% used)"
    echo "✓ Data on persistent storage survives termination"
else
    echo "⚠️  No persistent storage attached"
    echo "⚠️  ALL DATA WILL BE LOST ON INSTANCE TERMINATION!"
fi

# I/O performance tracking
echo -e "\n=== Storage Performance ==="
iostat -x 1 3 | tail -4 | head -3

# Backup status check
echo -e "\n=== Backup Status ==="
if crontab -l 2>/dev/null | grep -q rsync; then
    echo "✓ Automated backups are configured"
    crontab -l | grep rsync
else
    echo "⚠️  No automated backups configured"
    echo "   → Set up backups to persistent storage or external services"
fi
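
To run the check automatically (a sketch), make the script executable and schedule it with cron, appending output to a log:
chmod +x storage-monitor.sh

# crontab -e, then add an hourly run:
0 * * * * /home/ubuntu/storage-monitor.sh >> /home/ubuntu/storage-monitor.log 2>&1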

Troubleshooting

Common Storage Issues

Symptoms: Persistent volume not visible, or mount fails
Solutions:
# 1. Check if persistent volume is attached
lsblk
# Look for /dev/vdb or similar

# 2. Check if it has a filesystem
sudo file -s /dev/vdb

# 3. If "data" (no filesystem), format it (ONLY for new volumes!)
sudo mkfs.ext4 /dev/vdb

# 4. Create mount point and mount
sudo mkdir -p /mnt/persistent
sudo mount /dev/vdb /mnt/persistent

# 5. Fix permissions
sudo chown -R $USER:$USER /mnt/persistent

# 6. Make persistent across reboots
echo "/dev/vdb /mnt/persistent ext4 defaults 0 2" | sudo tee -a /etc/fstab
Note: Persistent storage must be created in the web console first, then attached to your instance.
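If the device never appears in lsblk, the kernel log can confirm whether the block device was presented to the instance at all (a quick check):
# Look for recent virtio block device events
sudo dmesg | grep -iE "vd[b-z]|virtio" | tail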
Symptoms: Training fails, services crash, unable to save checkpoints
Solutions:
# 1. Check what's using space
du -sh /* 2>/dev/null | sort -rh | head -20
df -h

# 2. Clean temporary files and caches
find /tmp -type f -mtime +1 -delete
find ~/cache -type f -mtime +7 -delete
pip cache purge
conda clean --all -y
sudo apt-get clean

# 3. Compress old checkpoints
find ~/checkpoints -name "*.ckpt" -mtime +3 -exec gzip {} \;

# 4. If you have persistent storage, move data there
if mountpoint -q /mnt/persistent; then
    rsync -avP ~/models/ /mnt/persistent/models/
    rm -rf ~/models/old_versions/
fi

# 5. For limited storage (2TB), use external storage
# Upload to S3, then remove the local copies once the sync succeeds
aws s3 sync ~/outputs/ s3://my-bucket/outputs/ && rm -rf ~/outputs/

# 6. Remove Docker images if using containers
docker image prune -a -f
docker system prune -a -f --volumes
Prevention Tips:
  • Set up automated cleanup in cron
  • Use persistent storage for long-term data (if available)
  • Stream large datasets instead of downloading
  • Implement regular backups to external storage

Common Port Issues

Symptoms: Service fails to start on the specified port
Solutions:
# Find process using port
sudo lsof -i :8888

# Kill process if needed
sudo kill -9 [PID]

# Or use different port
jupyter notebook --port=8889
Symptoms: Service running but not accessible
Solutions:
# Verify service is listening
netstat -tulpn | grep [PORT]

# Check SSH tunnel is active
ps aux | grep ssh

# Restart SSH tunnel
ssh -L [PORT]:localhost:[PORT] ubuntu@[instance-ip] -i ~/.ssh/key.pem
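
If the tunnel still fails, run the SSH client verbosely to see where the forwarding breaks (a sketch; use -vv or -vvv for more detail):
# -v prints channel and port-forwarding setup details
ssh -v -L [PORT]:localhost:[PORT] ubuntu@[instance-ip] -i ~/.ssh/key.pem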

Getting Help

If you encounter issues with storage or port configuration:
  1. Check the instance logs in the web console
  2. Review the troubleshooting section above
  3. Use the Intercom widget in the console for immediate assistance
  4. Contact [email protected] with:
    • Instance ID
    • Error messages
    • Steps to reproduce the issue

Next Steps

Troubleshooting Guide

Find solutions to common issues and error messages