Multi-instance type

The cluster can be configured to provision different instance types depending on the resources that your APIs request. A multi-instance type cluster has the following advantages over a single-instance type cluster:

  • Lower costs: Reduced overall compute costs by using the most economical instance for the given workloads.

  • Simpler logistics: Managing multiple clusters on your own is no longer required.

  • Multi-purpose cluster: A single cluster can serve any mix of workloads. Add the node groups you need to the cluster config, and you’re set.
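For context, the cluster matches each API to a node group based on the API's compute requests. The sketch below shows how a GPU request might look in an API spec; the field names (`compute`, `cpu`, `gpu`, `mem`) are assumed from the Realtime API Configuration docs, and the API name and predictor path are illustrative:

```yaml
# api.yaml (illustrative sketch)
- name: classifier
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
  compute:
    cpu: 1
    gpu: 1   # this request can only be satisfied by a GPU node group
    mem: 4G
```

An API like this would be scheduled onto a node group whose instance type provides a GPU (e.g. `g4dn.xlarge`), while CPU-only APIs land on the cheaper CPU node groups.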

Best practices

When specifying the node groups in your cluster.yaml config, keep in mind that node groups with lower indexes take priority over those with higher indexes. With that in mind, the resulting best practices are:

  1. Node groups with smaller instances should have higher priority.

  2. Node groups with CPU-only instances should come before the node groups equipped with GPU/Inferentia instances.

  3. Spot node groups should come before on-demand node groups.
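Putting the three practices together, a node group ordering might look like the sketch below (the instance types are illustrative, not prescriptive):

```yaml
# cluster.yaml (illustrative sketch)

node_groups:
  # spot before on-demand, smaller instances first
  - name: cpu-small-spot
    instance_type: m5.large
    spot: true
  - name: cpu-small
    instance_type: m5.large
  # CPU-only node groups before GPU node groups
  - name: gpu
    instance_type: g4dn.xlarge
```

With this ordering, workloads that fit on a small spot CPU instance are placed there first, falling back to on-demand CPU and then GPU capacity only when required.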

Example node groups

CPU spot/on-demand with GPU on-demand

```yaml
# cluster.yaml

node_groups:
  - name: cpu-spot
    instance_type: m5.large
    spot: true
  - name: cpu
    instance_type: m5.large
  - name: gpu
    instance_type: g4dn.xlarge
```

CPU on-demand, GPU on-demand and Inferentia on-demand

```yaml
# cluster.yaml

node_groups:
  - name: cpu
    instance_type: m5.large
  - name: gpu
    instance_type: g4dn.xlarge
  - name: inferentia
    instance_type: inf1.xlarge
```

3 spot CPU node groups with 1 on-demand CPU

```yaml
# cluster.yaml

node_groups:
  - name: cpu-0
    instance_type: t3.medium
    spot: true
  - name: cpu-1
    instance_type: m5.2xlarge
    spot: true
  - name: cpu-2
    instance_type: m5.8xlarge
    spot: true
  - name: cpu-3
    instance_type: m5.24xlarge
```

The above can also be achieved with the following config.

```yaml
# cluster.yaml

node_groups:
  - name: cpu-0
    instance_type: t3.medium
    spot: true
    spot_config:
      instance_distribution: [m5.2xlarge, m5.8xlarge]
      max_price: 3.27
  - name: cpu-1
    instance_type: m5.24xlarge
```

Last updated 4 years ago