Amazon SageMaker HyperPod now lets you manage individual cluster nodes directly from the AWS Management Console. Operators of HyperPod clusters running large-scale AI/ML workloads often need to connect to nodes for troubleshooting, reboot unresponsive instances, or replace degraded nodes. Previously, connecting to a node required manually constructing an SSM connection string, and node recovery actions such as reboot and replace required CLI commands. The console now provides a single interface for all of these node actions.
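For context, the manual step that the console now removes can be sketched as follows. This is a minimal illustration, assuming the HyperPod SSM target format `sagemaker-cluster:<cluster-id>_<instance-group-name>-<instance-id>`, where the cluster ID is the last segment of the cluster ARN. The function names and the sample ARN, instance group, and instance ID are hypothetical; check the format against the HyperPod documentation before relying on it.

```python
import shlex
from typing import Optional


def cluster_id_from_arn(cluster_arn: str) -> str:
    """Return the last path segment of a cluster ARN (assumed to be the cluster ID)."""
    return cluster_arn.rsplit("/", 1)[-1]


def ssm_target(cluster_id: str, instance_group: str, instance_id: str) -> str:
    """Build the SSM target string for one node (format is an assumption, see above)."""
    return f"sagemaker-cluster:{cluster_id}_{instance_group}-{instance_id}"


def ssm_start_session_command(
    cluster_arn: str,
    instance_group: str,
    instance_id: str,
    region: Optional[str] = None,
) -> str:
    """Return a shell-safe `aws ssm start-session` command for one node."""
    target = ssm_target(cluster_id_from_arn(cluster_arn), instance_group, instance_id)
    parts = ["aws", "ssm", "start-session", "--target", target]
    if region:
        parts += ["--region", region]
    # shlex.join quotes any argument that needs it, so the result can be pasted into a shell.
    return shlex.join(parts)
```

Calling `ssm_start_session_command` with a cluster ARN, an instance group name, and an instance ID yields a single command string, which is roughly what the console now pre-populates for you.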
With node actions in the console, you can connect to any node through AWS Systems Manager (SSM). The console provides pre-populated SSM CLI commands with copy-to-clipboard support, and it can launch an SSM session directly in the console. SageMaker HyperPod clusters already support automatic replacement and reboot of unhealthy instances, but some scenarios, such as memory overruns or hardware degradation that goes undetected, may require manual intervention. Node actions in the console give you a consistent way to manually reboot nodes to recover from transient issues, delete unhealthy nodes, and replace nodes. Batch operations let you apply an action to multiple nodes at once, so you can resolve node issues in minutes. This capability is especially valuable for time-sensitive AI training and inference workloads, where minimizing downtime is essential.
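For teams that also script these recovery actions, a batch reboot might look roughly like the sketch below. It assumes the boto3 SageMaker client exposes `batch_reboot_cluster_nodes` with `ClusterName` and `NodeIds` parameters, and that a per-request limit of 25 node IDs applies. Both are assumptions to verify against the current API reference. The helper names `chunked` and `reboot_nodes` are hypothetical.

```python
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def reboot_nodes(
    client: Any,
    cluster_name: str,
    node_ids: Sequence[str],
    batch_size: int = 25,  # assumed per-request limit; confirm in the API reference
) -> List[Any]:
    """Request a reboot for every node ID, one API call per batch.

    `client` is expected to behave like boto3.client("sagemaker"); it is
    passed in so the function can be exercised with a stub.
    """
    responses = []
    for batch in chunked(node_ids, batch_size):
        # Assumed call shape: ClusterName plus a list of instance IDs.
        responses.append(
            client.batch_reboot_cluster_nodes(ClusterName=cluster_name, NodeIds=batch)
        )
    return responses
```

Passing the client in as an argument keeps the helper easy to dry-run with a stub before pointing it at a real cluster. Each response should still be inspected for nodes that the service reports as failed.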
This feature is available in all AWS Regions where Amazon SageMaker HyperPod is supported. You can perform all of these node actions from the HyperPod cluster management page in the console. To learn more, see the HyperPod documentation on replacing and rebooting nodes and on connecting to a node.
Categories: marketing:marchitecture/artificial-intelligence
Source: Amazon Web Services