Amazon SageMaker Inference now supports container image caching, enabling up to 2x faster end-to-end scaling for generative AI models during scale-out events. When your endpoint scales out, the service pre-caches your container image so new instances can start serving traffic faster, without waiting for large container images to be pulled from Amazon ECR.
Generative AI workloads typically use large container images (10 GB or more) for deep learning frameworks and model serving. Previously, every new instance launched during scale-out had to pull the full image from Amazon ECR, adding several minutes of cold-start latency. Container image caching eliminates this bottleneck by pre-pulling the image so new instances launch with the container already available locally. You don't need to make any changes: the service automatically caches whatever image URI is specified in your endpoint or inference component configuration. This capability supports accelerator instance types, single-model endpoints, and inference component-based endpoints.
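As a rough illustration of where that image URI lives, the sketch below builds the request parameters for the SageMaker `CreateModel` and `CreateInferenceComponent` APIs. The account ID, role, image, and resource names are hypothetical placeholders, and the resource requirements are arbitrary example values. Nothing in it enables caching; per the announcement, the service picks up the URI automatically.

```python
from typing import Any, Dict

# Hypothetical placeholders (assumptions): substitute your own account, Region, and names.
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-llm-serving:latest"
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"


def model_request(image_uri: str) -> Dict[str, Any]:
    """Build keyword arguments for CreateModel (single-model endpoint)."""
    return {
        "ModelName": "my-llm-model",
        "ExecutionRoleArn": ROLE_ARN,
        # The container image URI is set here.
        "PrimaryContainer": {"Image": image_uri},
    }


def inference_component_request(image_uri: str) -> Dict[str, Any]:
    """Build keyword arguments for CreateInferenceComponent."""
    return {
        "InferenceComponentName": "my-llm-component",
        "EndpointName": "my-llm-endpoint",
        "VariantName": "AllTraffic",
        "Specification": {
            # For inference components, the image URI is set on the container spec.
            "Container": {"Image": image_uri},
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": 1,  # example value
                "MinMemoryRequiredInMb": 1024,  # example value
            },
        },
        "RuntimeConfig": {"CopyCount": 1},
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed as `boto3.client("sagemaker").create_model(**model_request(IMAGE_URI))` and `create_inference_component(**inference_component_request(IMAGE_URI))`.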
With this launch, SageMaker Inference now offers a comprehensive scaling optimization suite for generative AI: sub-minute concurrency metrics for up to 6x faster load detection, instance-store container caching for faster scaling on existing instances, and container image caching for up to 2x faster scaling on new instances.
Container image caching is available in all AWS commercial regions where SageMaker Inference is supported. To learn more, visit the launch blog.
Source: Amazon Web Services