Amazon SageMaker HyperPod now supports EFA-only network interfaces for cluster instance groups, so you can configure dedicated Elastic Fabric Adapter (EFA) devices without the Elastic Network Adapter (ENA) device that is normally attached alongside them for IP networking. SageMaker HyperPod is purpose-built infrastructure for AI/ML model development that provides a resilient, high-performance environment with built-in fault tolerance and automated cluster recovery. With EFA-only interfaces, you can scale AI/ML clusters further without risking IP address exhaustion in your VPC.
When running large-scale distributed training workloads, inter-node communication bandwidth is critical to training performance. SageMaker HyperPod cluster instances support multiple EFA-capable network interfaces, but configuring them with the standard efa interface type attaches both an EFA device and an ENA device (for IP networking) to each interface, even when IP networking is only needed on a subset of interfaces within a node. Each ENA device attached this way consumes an IP address in your subnet, which can lead to IP address exhaustion and limit the number of nodes you can deploy within a single subnet.
With this launch, you can set efa-only when configuring network interfaces for your HyperPod cluster instance groups. This option allocates the network interface exclusively for EFA traffic without attaching an ENA device, so you can maximize the number of EFA interfaces dedicated to low-latency, high-throughput inter-node communication. Because EFA-only interfaces do not require IP addresses, you can scale to larger clusters within the same subnets without running out of addresses. This configuration is particularly beneficial for large-scale distributed training jobs that do not need dedicated IP networking on every interface.
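The effect on subnet capacity can be illustrated with some rough arithmetic. The sketch below is not part of the announcement: the /20 subnet, the figure of 32 interfaces per node, and the helper name `max_nodes` are illustrative assumptions, and it ignores any other resources that draw addresses from the same subnet. It relies only on the fact that AWS reserves five addresses in every subnet.

```python
import ipaddress

# AWS reserves 5 IP addresses in every subnet (first four and the last).
AWS_RESERVED_PER_SUBNET = 5


def max_nodes(subnet_cidr: str, ip_interfaces_per_node: int) -> int:
    """Upper bound on nodes in a subnet, given how many interfaces
    per node consume an IP address (hypothetical helper)."""
    if ip_interfaces_per_node < 1:
        raise ValueError("each node needs at least one IP-bearing interface")
    total = ipaddress.ip_network(subnet_cidr).num_addresses
    usable = max(total - AWS_RESERVED_PER_SUBNET, 0)
    return usable // ip_interfaces_per_node


# Assumed example: 32 network interfaces per node in a /20 subnet.
all_efa = max_nodes("10.0.0.0/20", 32)  # every interface is type efa (EFA + ENA)
mostly_efa_only = max_nodes("10.0.0.0/20", 1)  # one efa, the rest efa-only

print(all_efa)          # 127
print(mostly_efa_only)  # 4091
```

Under these assumptions, keeping IP networking on a single interface per node raises the ceiling from 127 nodes to 4,091 in the same subnet.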
To enable EFA-only, specify efa-only in the ClusterNetworkInterface configuration when you create or update your HyperPod cluster through the CreateCluster or UpdateCluster API. EFA-only is available in all AWS Regions where Amazon SageMaker HyperPod is supported. To learn more, see ClusterNetworkInterface in the Amazon SageMaker API Reference.
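As a rough sketch of what such a request might look like, the Python below assembles one instance group entry as a plain dictionary. The announcement does not give the request shape, so the `NetworkInterface` key and its `InterfaceType` field are assumptions (named by analogy with the efa and efa-only interface types mentioned above), and `build_instance_group` is a hypothetical helper. Check the ClusterNetworkInterface entry in the API Reference for the actual field names before using it.

```python
# Interface types named in the announcement.
VALID_INTERFACE_TYPES = {"efa", "efa-only"}


def build_instance_group(name, instance_type, count, execution_role_arn,
                         lifecycle_s3_uri, on_create_script,
                         interface_type="efa-only"):
    """Build one instance group entry for a HyperPod cluster request
    (hypothetical helper; does not call AWS)."""
    if interface_type not in VALID_INTERFACE_TYPES:
        raise ValueError(f"unsupported interface type: {interface_type!r}")
    return {
        "InstanceGroupName": name,
        "InstanceType": instance_type,
        "InstanceCount": count,
        "ExecutionRole": execution_role_arn,
        "LifeCycleConfig": {
            "SourceS3Uri": lifecycle_s3_uri,
            "OnCreate": on_create_script,
        },
        # ASSUMPTION: key and field names below are illustrative only;
        # the real shape is defined by ClusterNetworkInterface.
        "NetworkInterface": {"InterfaceType": interface_type},
    }


# Placeholder values throughout; none come from the announcement.
group = build_instance_group(
    name="training-workers",
    instance_type="ml.p5.48xlarge",
    count=16,
    execution_role_arn="arn:aws:iam::123456789012:role/ExampleHyperPodRole",
    lifecycle_s3_uri="s3://example-bucket/lifecycle/",
    on_create_script="on_create.sh",
)
print(group["NetworkInterface"])
```

A dictionary like this would typically be passed in the `InstanceGroups` list of a boto3 `create_cluster` call on the SageMaker client, once the network interface fields have been matched to the documented schema.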
Categories: marketing:marchitecture/artificial-intelligence
Source: Amazon Web Services