AWS Parallel Computing Service (PCS) now supports node reboot via Slurm

AWS Parallel Computing Service (PCS) now lets you reboot compute nodes with Slurm commands without triggering instance replacement. You can reboot nodes for operational reasons such as troubleshooting, resource cleanup, or recovering from a degraded state, and reserve full node replacement for cases where a reboot is not enough. This helps you maintain cluster health efficiently and at lower cost.

This feature is available in all AWS Regions where PCS is available. You can use the 'scontrol reboot' command, with options to schedule either immediate or deferred reboots. Reboots initiated through any other method continue to trigger instance replacement. To learn more, refer to Rebooting compute nodes with Slurm in AWS PCS.
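As a rough illustration, the Python sketch below assembles a 'scontrol reboot' invocation using the option syntax documented for upstream Slurm. There, ASAP drains the nodes so no new jobs start and reboots them once running jobs finish, while omitting it reboots the nodes whenever they next become idle. The optional nextstate (RESUME or DOWN) and reason arguments set the node state after the reboot and record why it happened. How these options map onto the "immediate" and "deferred" reboots described for PCS is an assumption here, so confirm it against the PCS guide. The helper names build_reboot_command and reboot_nodes and the node names are hypothetical.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional, Sequence

VALID_NEXT_STATES = {"RESUME", "DOWN"}


def build_reboot_command(
    nodes: str,
    asap: bool = False,
    next_state: Optional[str] = None,
    reason: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `scontrol reboot` (upstream Slurm syntax).

    nodes      -- a Slurm hostlist expression such as "compute-[1-4]", or "ALL".
    asap       -- add ASAP: drain the nodes so no new jobs start, then reboot
                  once running jobs finish. Without it, nodes reboot when idle.
    next_state -- optional node state after the reboot: "RESUME" or "DOWN".
    reason     -- optional free-text reason recorded on the nodes.
    """
    if not nodes:
        raise ValueError("a node list (or ALL) is required")
    cmd = ["scontrol", "reboot"]
    if asap:
        cmd.append("ASAP")
    if next_state is not None:
        state = next_state.upper()
        if state not in VALID_NEXT_STATES:
            raise ValueError(f"next_state must be one of {sorted(VALID_NEXT_STATES)}")
        cmd.append(f"nextstate={state}")
    if reason:
        # Passed as a single argv element, so spaces need no shell quoting.
        cmd.append(f"reason={reason}")
    cmd.append(nodes)  # the node list comes last in scontrol's syntax
    return cmd


def reboot_nodes(cmd: Sequence[str], dry_run: bool = True) -> None:
    """Print the command; execute it only if dry_run is False and scontrol is on PATH."""
    print(shlex.join(cmd))
    if not dry_run and shutil.which("scontrol"):
        subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical node names; substitute hostnames reported by `sinfo` on your cluster.
    reboot_nodes(build_reboot_command("compute-[1-4]", asap=True,
                                      next_state="resume", reason="GPU hung"))
    reboot_nodes(build_reboot_command("compute-7"))
```

Building an argv list instead of a shell string keeps free-text reasons safe from shell interpretation, and the dry-run default means the script only prints the commands until you opt in to running them on a cluster node with Slurm installed.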

PCS is a managed service that simplifies running and scaling high performance computing (HPC) workloads on AWS using Slurm. To learn more about PCS, refer to the service documentation.

Source: Amazon Web Services