SageMaker HyperPod now supports Topology Aware Scheduling of LLM tasks

SageMaker HyperPod task governance now supports Topology Aware Scheduling (TAS), which lets data scientists schedule their large language model (LLM) tasks on a network topology that minimizes communication between instances and improves training efficiency.

LLM training and fine-tuning tasks that are distributed across multiple accelerated compute instances frequently exchange large volumes of data. Each additional network hop between instances adds communication latency, which degrades LLM task performance. SageMaker HyperPod task governance now lets data scientists use network topology information and state specific topology preferences when scheduling tasks. Based on the network topology of the HyperPod cluster, task governance automatically schedules tasks in optimal locations, reducing instance-to-instance communication and improving training efficiency.
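The idea behind topology-aware placement can be illustrated with a small sketch. It is not HyperPod's scheduler or API: the instance names, the three-layer network paths, and the functions `hops`, `placement_cost`, and `best_placement` are all hypothetical, and the brute-force search is only practical for a handful of instances. It assumes each instance is described by the chain of network nodes above it, and treats the number of unshared layers as the hop distance between two instances.

```python
from itertools import combinations

# Hypothetical topology (an assumption for illustration): each instance maps
# to its path of network nodes, from the outermost layer to the innermost.
TOPOLOGY = {
    "i-a": ("nn-1", "nn-10", "nn-100"),
    "i-b": ("nn-1", "nn-10", "nn-100"),
    "i-c": ("nn-1", "nn-10", "nn-101"),
    "i-d": ("nn-1", "nn-11", "nn-110"),
    "i-e": ("nn-2", "nn-20", "nn-200"),
}


def hops(a, b):
    """Number of layers two instances do not share (0 = same innermost node)."""
    path_a, path_b = TOPOLOGY[a], TOPOLOGY[b]
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    return len(path_a) - shared


def placement_cost(instances):
    """Sum of pairwise hop distances for a candidate set of instances."""
    return sum(hops(a, b) for a, b in combinations(instances, 2))


def best_placement(candidates, size):
    """Brute-force search for the lowest-cost group of `size` instances."""
    return min(combinations(sorted(candidates), size), key=placement_cost)


if __name__ == "__main__":
    chosen = best_placement(TOPOLOGY, 3)
    print(chosen, placement_cost(chosen))
```

With this sample data, a three-instance task lands on the instances that share the most network layers, while a placement that spans two top-level network nodes costs more. A production scheduler would also have to account for capacity, quotas, and priorities, which this sketch ignores.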

SageMaker HyperPod task governance is available in all AWS Regions where HyperPod is available: US West (N. California), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

To learn more, visit the SageMaker HyperPod webpage and the SageMaker HyperPod task governance documentation.

Source: Amazon Web Services


