Amazon SageMaker HyperPod now supports flexible instance groups

Amazon SageMaker HyperPod now supports flexible instance groups, enabling customers to specify multiple instance types and multiple subnets within a single instance group. Customers running training and inference workloads on HyperPod often need to span multiple instance types and availability zones for capacity resilience, cost optimization, and subnet utilization. Previously, this meant creating and managing a separate instance group for every combination of instance type and availability zone, which added operational overhead across cluster configuration, scaling, patching, and monitoring.

With flexible instance groups, you can define an ordered list of instance types using the new InstanceRequirements parameter and provide multiple subnets across availability zones in a single instance group. HyperPod provisions instances using the highest-priority type first and automatically falls back to lower-priority types when capacity is unavailable, so you no longer need to retry manually across individual instance groups.

Training customers benefit from multi-subnet distribution within an availability zone, which helps avoid subnet exhaustion. Inference customers who scale manually get automatic priority-based fallback across instance types without retrying each instance group individually. Inference customers who use Karpenter autoscaling can reference a single flexible instance group: Karpenter automatically detects the supported instance types from that group and provisions the optimal type and availability zone based on pod requirements. You can create flexible instance groups using the CreateCluster and UpdateCluster APIs, the AWS CLI, or the AWS Management Console.
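The priority-based fallback described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not HyperPod's implementation and not the shape of the InstanceRequirements parameter. The `FlexibleGroup` class, the `provision` function, the `capacity` map, and the subnet names are all hypothetical. The order in which subnets are filled is also an assumption, since the announcement does not describe how instances are distributed across subnets.

```python
from dataclasses import dataclass


@dataclass
class FlexibleGroup:
    """Hypothetical stand-in for a flexible instance group (not the real API shape)."""
    name: str
    instance_types: list[str]  # ordered, highest priority first
    subnets: list[str]


def provision(group, count, capacity):
    """Place up to `count` instances, preferring earlier instance types.

    `capacity` maps (instance_type, subnet) -> free slots. It is a made-up
    stand-in for whatever capacity the service actually observes.
    Returns a list of (instance_type, subnet) placements, which may be
    shorter than `count` if every pool is exhausted.
    """
    remaining = dict(capacity)  # work on a copy so the caller's map is untouched
    placements = []
    for instance_type in group.instance_types:
        # Assumption: subnets are tried in listed order within each type.
        for subnet in group.subnets:
            free = remaining.get((instance_type, subnet), 0)
            take = min(free, count - len(placements))
            placements.extend([(instance_type, subnet)] * take)
            remaining[(instance_type, subnet)] = free - take
            if len(placements) == count:
                return placements
    return placements


group = FlexibleGroup(
    name="inference-workers",
    instance_types=["ml.p5.48xlarge", "ml.p4d.24xlarge"],
    subnets=["subnet-a", "subnet-b"],  # placeholder subnet names
)
capacity = {
    ("ml.p5.48xlarge", "subnet-a"): 1,
    ("ml.p5.48xlarge", "subnet-b"): 1,
    ("ml.p4d.24xlarge", "subnet-a"): 4,
}
for placement in provision(group, 4, capacity):
    print(placement)
```

In this example the preferred type has only two free slots across both subnets, so the remaining two instances fall back to the second type in the list. With one instance group per type and availability zone, the same outcome would take a separate request against each group.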

Flexible instance groups are available for SageMaker HyperPod clusters using the EKS orchestrator in all AWS Regions where SageMaker HyperPod is supported. To learn more, see Flexible instance groups.

Source: Amazon Web Services


