Amazon CloudWatch Container Insights now supports Neuron UltraServers on Amazon EKS, providing enhanced observability for customers running large-scale, high-performance machine learning workloads on multi-instance nodes. This new capability enables data scientists and ML engineers to efficiently monitor and troubleshoot their containerized ML applications, offering aggregated metrics and simplified management across Neuron UltraServer groups.
Neuron UltraServers combine multiple EC2 instances into a single logical server unit, optimized for machine learning workloads using AWS Trainium and AWS Inferentia accelerators. Container Insights, a monitoring and diagnostics feature in Amazon CloudWatch, automatically collects metrics from containerized applications. With this launch, Container Insights introduces a new filter specifically for UltraServers in EKS environments. You can now select an UltraServer ID to view aggregate metrics across all instances within that UltraServer, so you no longer need to monitor each instance separately. Per-instance metrics remain available alongside the consolidated performance data for the entire UltraServer group, which streamlines monitoring of ML workloads running on AWS Neuron.
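To make the idea of group-level aggregation concrete, here is a minimal Python sketch that rolls per-instance samples up by UltraServer ID. It is an illustration only, not how Container Insights computes its metrics: the field names (`ultraserver_id`, `instance_id`, `neuroncore_utilization`), the IDs, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_ultraserver(samples):
    """Group per-instance samples by UltraServer ID and summarize each group.

    All field names here are hypothetical and chosen for illustration only.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["ultraserver_id"]].append(sample)

    summary = {}
    for ultraserver_id, rows in groups.items():
        values = [row["neuroncore_utilization"] for row in rows]
        summary[ultraserver_id] = {
            # Count distinct instances that reported into this group.
            "instance_count": len({row["instance_id"] for row in rows}),
            "avg_utilization": mean(values),
            "max_utilization": max(values),
        }
    return summary


# Made-up sample data: two instances in one UltraServer, one in another.
samples = [
    {"ultraserver_id": "us-a", "instance_id": "i-001", "neuroncore_utilization": 80.0},
    {"ultraserver_id": "us-a", "instance_id": "i-002", "neuroncore_utilization": 60.0},
    {"ultraserver_id": "us-b", "instance_id": "i-003", "neuroncore_utilization": 40.0},
]

summary = aggregate_by_ultraserver(samples)
for ultraserver_id, stats in sorted(summary.items()):
    print(ultraserver_id, stats)
```

Filtering by a single UltraServer ID then amounts to reading one entry from `summary`, which gives one consolidated view instead of one view per instance.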
Amazon CloudWatch Container Insights is available in all commercial AWS Regions and the AWS GovCloud (US) Regions.
To get started, see AWS Neuron metrics for AWS Trainium and AWS Inferentia in the Amazon CloudWatch User Guide.
Source: Amazon Web Services