Cluster Configuration

Configure your dedicated cluster with the exact specifications your workloads require. Our team works with you to design the optimal setup for your use case.

GPU Selection

Choose from the latest NVIDIA hardware:
GPU Model        | Memory      | Best For
Blackwell (B200) | 192GB HBM3e | Cutting-edge training and inference
H200             | 141GB HBM3e | Next-gen LLM training and inference
H100             | 80GB HBM3   | Industry standard for large-scale training

Custom Mix: You can combine different GPU types in a single cluster for specialized workloads.
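
As a rough sizing aid, the per-GPU memory figures in this section can be multiplied out to get the aggregate GPU memory of a candidate cluster, including a mixed one. The following is a minimal Python sketch; `GPU_MEMORY_GB` and `total_gpu_memory_gb` are illustrative names, not part of any provisioning API.

```python
# Per-GPU memory in GB, taken from the GPU selection figures in this section.
GPU_MEMORY_GB = {"B200": 192, "H200": 141, "H100": 80}


def total_gpu_memory_gb(gpu_counts: dict[str, int]) -> int:
    """Return aggregate GPU memory (GB) for a mapping of GPU model -> count."""
    total = 0
    for model, count in gpu_counts.items():
        if model not in GPU_MEMORY_GB:
            raise ValueError(f"unknown GPU model: {model}")
        if count < 0:
            raise ValueError(f"negative GPU count for {model}")
        total += GPU_MEMORY_GB[model] * count
    return total


if __name__ == "__main__":
    # A homogeneous cluster of 64 H100s.
    print(total_gpu_memory_gb({"H100": 64}))             # 5120
    # A hypothetical custom mix of 16 H100s and 8 H200s.
    print(total_gpu_memory_gb({"H100": 16, "H200": 8}))  # 2408
```

Aggregate memory is only an upper bound on usable capacity; how much of it a workload can actually use depends on the parallelism strategy and framework overhead.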

Networking Options

High-speed interconnects are critical for distributed training. Choose the right networking for your needs:

InfiniBand (400Gb/s)

Ultra-low latency networking for distributed training at scale.
  • Best for: Multi-node training with 32+ GPUs
  • Latency: Sub-microsecond
  • Topology: Fat-tree or custom

RoCE (200Gb/s)

High-speed Ethernet alternative with RDMA support.
  • Best for: Mixed training/inference workloads
  • Latency: Low microsecond
  • Easier integration with existing infrastructure

NVLink

Direct GPU-to-GPU communication within a node.
  • Best for: Intra-node communication
  • Bandwidth: Up to 900GB/s
  • Included with multi-GPU nodes
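
Note that the interconnect figures mix units: InfiniBand and RoCE are quoted in gigabits per second (Gb/s), while the intra-node GPU-to-GPU figure is quoted in gigabytes per second (GB/s). The sketch below converts everything to GB/s and estimates a best-case transfer time for a payload. It assumes the quoted peak rate per link with no protocol overhead or contention, so real transfers will be slower; the function and dictionary names are illustrative.

```python
# Peak link rates from the networking options in this section.
# InfiniBand and RoCE are quoted in gigabits per second (Gb/s);
# the intra-node GPU-to-GPU link is quoted in gigabytes per second (GB/s).
LINK_RATE_GBIT_PER_S = {"infiniband": 400, "roce": 200}
NVLINK_RATE_GBYTE_PER_S = 900


def link_rate_gbytes_per_s(link: str) -> float:
    """Return the quoted peak rate of a link in GB/s (8 bits per byte)."""
    if link == "nvlink":
        return float(NVLINK_RATE_GBYTE_PER_S)
    return LINK_RATE_GBIT_PER_S[link] / 8


def ideal_transfer_seconds(payload_gb: float, link: str) -> float:
    """Best-case transfer time: payload size divided by the peak link rate."""
    return payload_gb / link_rate_gbytes_per_s(link)


if __name__ == "__main__":
    for link in ("infiniband", "roce", "nvlink"):
        rate = link_rate_gbytes_per_s(link)
        seconds = ideal_transfer_seconds(100, link)
        print(f"{link}: {rate:.0f} GB/s, 100 GB in {seconds:.2f} s (best case)")
```

Under these assumptions, 400Gb/s works out to 50GB/s and 200Gb/s to 25GB/s, which is why intra-node traffic is best kept on the 900GB/s GPU-to-GPU link where possible.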

Custom Topology

Need something specific? We can design custom network architectures for your requirements.

Compute Specifications

Configure the CPU and memory for your nodes:
Component          | Options
CPU                | Up to 256 cores per node
RAM                | Up to 2TB per node
Internet Bandwidth | 10-100 Gbps connectivity

Storage Solutions

Local Storage

Fast NVMe storage attached directly to your nodes.
  • Capacity: Up to 30TB per node
  • Performance: Up to 7GB/s read/write
  • Best for: Training checkpoints, scratch space
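
To see what these figures mean for checkpointing, the sketch below estimates how long one checkpoint takes to write at the quoted peak throughput and how many checkpoints fit on a node at the maximum capacity. It assumes decimal units (1TB = 1000GB) and sustained peak throughput, which real workloads rarely reach; the helper names and the 350GB checkpoint size are hypothetical.

```python
LOCAL_NVME_PEAK_GB_PER_S = 7.0  # "up to" throughput quoted in this section
LOCAL_NVME_MAX_TB = 30          # per-node capacity ceiling quoted in this section


def checkpoint_write_seconds(checkpoint_gb: float,
                             throughput_gb_per_s: float = LOCAL_NVME_PEAK_GB_PER_S) -> float:
    """Best-case time to write one checkpoint at the given throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return checkpoint_gb / throughput_gb_per_s


def checkpoints_that_fit(checkpoint_gb: float,
                         capacity_tb: float = LOCAL_NVME_MAX_TB) -> int:
    """Number of whole checkpoints that fit, assuming 1TB = 1000GB."""
    if checkpoint_gb <= 0:
        raise ValueError("checkpoint size must be positive")
    return int(capacity_tb * 1000 // checkpoint_gb)


if __name__ == "__main__":
    size_gb = 350  # hypothetical checkpoint size
    print(checkpoint_write_seconds(size_gb))  # 50.0
    print(checkpoints_that_fit(size_gb))      # 85
```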

Shared Storage

Network-attached storage accessible from all nodes.
Type   | Capacity  | Best For
Lustre | Up to 1PB | High-performance parallel I/O
NFS    | Up to 1PB | General-purpose shared storage

Object Storage

S3-compatible storage for datasets and artifacts.
  • Capacity: Unlimited
  • Best for: Large datasets, model artifacts, backups

Backup Options

  • Automated snapshots
  • Point-in-time recovery
  • Cross-region replication (optional)

Example Configurations

LLM Training Cluster

Optimized for training large language models:
Component  | Specification
GPUs       | 64x H100 80GB
Networking | InfiniBand 400Gb/s
CPU        | 128 cores per node
RAM        | 1TB per node
Storage    | 15TB NVMe local + 500TB Lustre shared

Contact sales for pricing based on your specific configuration and commitment terms.

Inference Cluster

Optimized for high-throughput inference:
Component  | Specification
GPUs       | 16x H100 80GB
Networking | RoCE 200Gb/s
CPU        | 64 cores per node
RAM        | 512GB per node
Storage    | 8TB NVMe local

Research Cluster

Flexible configuration for R&D teams:
Component  | Specification
GPUs       | 32x H200 141GB
Networking | NVLink + RoCE
CPU        | 128 cores per node
RAM        | 1TB per node
Storage    | 10TB NVMe local + 200TB NFS shared
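
When drafting your own configuration, it can help to check it against the per-node ceilings stated earlier (256 CPU cores, 2TB RAM, 30TB local NVMe). The sketch below does this for the three example configurations. It assumes 1TB of RAM means 1024GB and that the local NVMe figures in the examples are per node, which the examples do not state explicitly; `NodeSpec` and `limit_violations` are illustrative names, not a real API.

```python
from dataclasses import dataclass

# Per-node ceilings from the Compute Specifications and Local Storage sections.
MAX_CPU_CORES = 256
MAX_RAM_GB = 2048       # "2TB", assuming 1TB = 1024GB
MAX_LOCAL_NVME_TB = 30


@dataclass(frozen=True)
class NodeSpec:
    name: str
    cpu_cores: int
    ram_gb: int
    local_nvme_tb: float  # assumed to be a per-node figure


def limit_violations(spec: NodeSpec) -> list[str]:
    """Return a human-readable message for each ceiling the spec exceeds."""
    problems = []
    if spec.cpu_cores > MAX_CPU_CORES:
        problems.append(f"{spec.cpu_cores} CPU cores exceeds {MAX_CPU_CORES}")
    if spec.ram_gb > MAX_RAM_GB:
        problems.append(f"{spec.ram_gb}GB RAM exceeds {MAX_RAM_GB}GB")
    if spec.local_nvme_tb > MAX_LOCAL_NVME_TB:
        problems.append(f"{spec.local_nvme_tb}TB NVMe exceeds {MAX_LOCAL_NVME_TB}TB")
    return problems


# The three example configurations from this section.
EXAMPLES = [
    NodeSpec("LLM Training", cpu_cores=128, ram_gb=1024, local_nvme_tb=15),
    NodeSpec("Inference", cpu_cores=64, ram_gb=512, local_nvme_tb=8),
    NodeSpec("Research", cpu_cores=128, ram_gb=1024, local_nvme_tb=10),
]

if __name__ == "__main__":
    for spec in EXAMPLES:
        problems = limit_violations(spec)
        print(spec.name, "OK" if not problems else problems)
```

All three examples sit well inside the stated ceilings, leaving headroom to scale CPU, RAM, or local storage per node.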

Security Options

  • VPN Access: Secure connectivity to your cluster
  • Private Networking: Isolated network environment
  • Custom Firewall Rules: Control inbound/outbound traffic
  • Compliance Configurations: SOC 2 and HIPAA-ready setups available

Next Steps