AWS Parallel Computing Service (PCS) now lets you reboot compute nodes using Slurm commands without triggering instance replacement. You can reboot nodes for operational reasons such as troubleshooting, resource cleanup, and recovery from degraded states before resorting to full node replacement, which helps you maintain cluster health at lower cost.
This feature is available in all AWS Regions where PCS is available. You can use the ‘scontrol reboot’ command with options to schedule immediate or deferred reboots. Reboots initiated through other methods continue to trigger instance replacement. To learn more, refer to Rebooting compute nodes with Slurm in AWS PCS.
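As a rough illustration of the options mentioned above, the sketch below assembles an ‘scontrol reboot’ command line in Python. It assumes the general syntax from the upstream Slurm documentation (an optional ASAP keyword, optional nextstate= and reason= settings, then a node list). Whether PCS accepts every one of these options is not stated in this announcement, so check the PCS documentation before relying on them. The helper name build_reboot_command and the node names are hypothetical.

```python
import shlex
from typing import List, Optional

# Values that upstream Slurm documents for nextstate= (assumption: PCS
# accepts the same values; the announcement does not say).
VALID_NEXT_STATES = ("RESUME", "DOWN")


def build_reboot_command(
    nodes: str,
    asap: bool = False,
    next_state: Optional[str] = None,
    reason: Optional[str] = None,
) -> List[str]:
    """Return an argv list for `scontrol reboot` (hypothetical helper).

    nodes      -- Slurm node list expression, e.g. "compute-[1-4]"
    asap       -- add the ASAP keyword so the nodes are drained and reboot
                  once running jobs finish, instead of waiting until idle
    next_state -- state the node should enter after the reboot
    reason     -- free-text reason recorded for the reboot
    """
    if not nodes:
        raise ValueError("a node list is required")
    cmd = ["scontrol", "reboot"]
    if asap:
        cmd.append("ASAP")
    if next_state is not None:
        state = next_state.upper()
        if state not in VALID_NEXT_STATES:
            raise ValueError(f"unsupported nextstate: {next_state}")
        cmd.append(f"nextstate={state}")
    if reason:
        cmd.append(f"reason={reason}")
    cmd.append(nodes)
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command for review; nothing is executed here.
    argv = build_reboot_command(
        "compute-[1-4]", asap=True, next_state="RESUME", reason="resource cleanup"
    )
    print(shlex.join(argv))
```

The helper only builds the argument list. To run it on a cluster, you could pass the list to subprocess.run from a host where scontrol is installed and you have the required Slurm privileges.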
PCS is a managed service that simplifies running and scaling high performance computing (HPC) workloads on AWS using Slurm. To learn more about PCS, refer to the service documentation.
Source: Amazon Web Services