SageMaker HyperPod task governance now supports Topology Aware Scheduling (TAS), enabling data scientists to schedule large language model (LLM) tasks on a network topology that minimizes communication between instances and improves training efficiency.
LLM training and fine-tuning tasks distributed across multiple accelerated compute instances frequently exchange large volumes of data. Each additional network hop between instances adds communication latency, which slows the task. With this launch, data scientists can state topology preferences when they submit a task, and SageMaker HyperPod task governance uses the cluster's network topology information to schedule the task automatically in an optimal location, reducing instance-to-instance communication and improving training efficiency.
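The placement idea can be illustrated with a minimal sketch. It is not HyperPod's scheduler: the instance IDs, the network-node names, the three-layer hierarchy, and the `pick_instances` helper are all hypothetical, chosen only to show how preferring instances under a shared network node keeps the hop count low.

```python
from collections import defaultdict

# Hypothetical topology (an assumption for illustration): each instance maps to
# its network-node path, from the outermost layer to the layer closest to it.
TOPOLOGY = {
    "i-a": ("nn-1", "nn-10", "nn-100"),
    "i-b": ("nn-1", "nn-10", "nn-100"),
    "i-c": ("nn-1", "nn-10", "nn-101"),
    "i-d": ("nn-1", "nn-11", "nn-110"),
    "i-e": ("nn-2", "nn-20", "nn-200"),
}


def pick_instances(topology, count):
    """Return `count` instances that share the deepest possible network node.

    Returns None when the cluster has fewer than `count` instances.
    """
    if not topology or len(topology) < count:
        return None
    depth = len(next(iter(topology.values())))
    # Try the layer closest to the instances first, then widen layer by layer.
    for layer in range(depth, 0, -1):
        groups = defaultdict(list)
        for instance, path in sorted(topology.items()):
            groups[path[:layer]].append(instance)
        for prefix in sorted(groups):
            if len(groups[prefix]) >= count:
                return groups[prefix][:count]
    # No single network node holds enough instances: span the whole cluster.
    return sorted(topology)[:count]


print(pick_instances(TOPOLOGY, 2))
print(pick_instances(TOPOLOGY, 3))
```

In this sketch a two-instance task fits under a single innermost network node, while a three-instance task has to widen to the next layer up. A real scheduler would also weigh quotas, priorities, and free capacity.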
SageMaker HyperPod task governance is available in all AWS Regions where HyperPod is available: US West (N. California), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Europe (Frankfurt), Europe (Ireland), Europe (Stockholm).
To learn more, visit the SageMaker HyperPod webpage and the SageMaker HyperPod task governance documentation.
Source: Amazon Web Services