Amazon SageMaker AI now supports inference recommendations, a new capability that eliminates manual optimization and benchmarking to deliver optimal inference performance. By delivering validated, optimal deployment configurations with performance metrics, SageMaker AI accelerates the path to production and keeps your model developers focused on building accurate models, not managing infrastructure.
Customers bring their own generative AI models, define expected traffic patterns, and specify a performance goal (optimize for cost, minimize latency, or maximize throughput). SageMaker AI then analyzes the model’s architecture and applies optimizations aligned to that goal across multiple instance types, benchmarking each configuration on real GPU infrastructure using NVIDIA AIPerf. Because every configuration is benchmarked across multiple instance types, customers can select the most price-performant option for their workload. The result is a set of deployment-ready configurations with validated metrics, including time to first token, inter-token latency, request latency percentiles, throughput, and cost projections.
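To make the goal-driven selection concrete, here is a minimal Python sketch of how a set of benchmark results could be ranked against a performance goal. It does not use the SageMaker AI API or reflect how the service ranks configurations internally. The `BenchmarkResult` type, the `select_configuration` function, the instance names, and every number are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """One benchmarked configuration (hypothetical structure, not an AWS type)."""
    instance_type: str
    time_to_first_token_ms: float
    p99_latency_ms: float
    throughput_tokens_per_s: float
    hourly_cost_usd: float

    @property
    def cost_per_million_tokens(self) -> float:
        # Projected cost assuming the instance runs at the measured throughput.
        tokens_per_hour = self.throughput_tokens_per_s * 3600
        return self.hourly_cost_usd / tokens_per_hour * 1_000_000


def select_configuration(results, goal):
    """Pick the best result for a goal: 'cost', 'latency', or 'throughput'."""
    if not results:
        raise ValueError("no benchmark results to choose from")
    if goal == "cost":
        return min(results, key=lambda r: r.cost_per_million_tokens)
    if goal == "latency":
        return min(results, key=lambda r: r.p99_latency_ms)
    if goal == "throughput":
        return max(results, key=lambda r: r.throughput_tokens_per_s)
    raise ValueError(f"unknown goal: {goal!r}")


# Made-up numbers for three placeholder instance types.
RESULTS = [
    BenchmarkResult("instance-a", 120.0, 900.0, 1000.0, 2.0),
    BenchmarkResult("instance-b", 60.0, 500.0, 2500.0, 6.0),
    BenchmarkResult("instance-c", 80.0, 650.0, 4000.0, 12.0),
]

if __name__ == "__main__":
    for goal in ("cost", "latency", "throughput"):
        best = select_configuration(RESULTS, goal)
        print(goal, best.instance_type, round(best.cost_per_million_tokens, 2))
```

With these placeholder numbers, each goal selects a different instance type, which is why comparing several instance types against a stated goal matters: the cheapest option per token is not necessarily the fastest or the highest-throughput one.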
The capability is available today in seven AWS Regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Tokyo), Europe (Ireland), Asia Pacific (Singapore), and Europe (Frankfurt). To learn more, visit the SageMaker AI documentation.
Source: Amazon Web Services