Amazon SageMaker AI’s Flexible Training Plans (FTP) now support inference endpoints, giving customers guaranteed GPU capacity for planned evaluations and production peaks. Customers can now reserve the exact instance types they need and rely on SageMaker AI to bring up the inference endpoint automatically, without managing any infrastructure themselves.
As customers plan their ML development cycles, they need confidence that the GPUs required for model evaluation and pre-production testing will be available on the exact dates they need them. FTP makes it easy to access GPU capacity for ML workloads. With FTP support for inference endpoints, you choose your preferred instance types, compute requirements, reservation length, and start date for your inference workload. When creating the endpoint, you simply reference the reservation ARN, and SageMaker AI automatically provisions and runs the endpoint on that guaranteed capacity for the entire plan duration. This removes weeks of infrastructure management and scheduling effort, letting you run inference predictably while focusing your time on improving model performance.
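As a rough sketch of the flow above, the snippet below builds a `CreateEndpointConfig` request that pins a production variant to reserved capacity via the reservation ARN. The `CapacityReservationConfig` field and its key names are assumptions based on this announcement, and the names, ARN, and instance type are hypothetical; check the current SageMaker API reference for the exact shape.

```python
def build_endpoint_config_request(config_name, model_name, reservation_arn):
    """Build a CreateEndpointConfig request that runs the variant on FTP capacity.

    The CapacityReservationConfig block is an assumed shape for referencing a
    Flexible Training Plan reservation; verify field names against the
    SageMaker API reference before use.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "primary",
                "ModelName": model_name,
                # Instance type must match what the reservation guarantees.
                "InstanceType": "ml.p5.48xlarge",
                "InitialInstanceCount": 1,
                # Assumed: point SageMaker at the FTP reservation so the
                # endpoint is provisioned on the guaranteed capacity.
                "CapacityReservationConfig": {
                    "CapacityReservationPreference": "capacity-reservations-only",
                    "MlReservationArn": reservation_arn,
                },
            }
        ],
    }


# Usage (requires AWS credentials and a real reservation ARN):
#   import boto3
#   sm = boto3.client("sagemaker", region_name="us-east-1")
#   sm.create_endpoint_config(**build_endpoint_config_request(
#       "my-ftp-endpoint-config", "my-model",
#       "arn:aws:sagemaker:us-east-1:123456789012:training-plan/my-plan"))
#   sm.create_endpoint(EndpointName="my-ftp-endpoint",
#                      EndpointConfigName="my-ftp-endpoint-config")
```

Once the endpoint is created against the reservation, SageMaker AI keeps it on the reserved capacity for the plan's duration, so no separate scheduling step is needed.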
Flexible Training Plans support for SageMaker AI Inference is available in the following AWS Regions: US East (N. Virginia), US West (Oregon), and US East (Ohio).
To learn more about using FTP reservations for inference endpoints, visit the SageMaker AI Inference API reference.
Source: Amazon Web Services