Amazon SageMaker AI now supports inference recommendations, a new capability that removes the need for manual optimization and benchmarking to reach optimal inference performance. By delivering validated deployment configurations with performance metrics, SageMaker AI accelerates the path to production and keeps your model developers focused on building accurate models, not managing infrastructure.
Customers bring their own generative AI models, define expected traffic patterns, and specify a performance goal (optimize for cost, minimize latency, or maximize throughput). SageMaker AI then analyzes the model’s architecture and applies optimizations aligned to that goal across multiple instance types, benchmarking each configuration on real GPU infrastructure using NVIDIA AIPerf. By evaluating multiple instance types, customers can select the most price-performant option for their workload. The result is deployment-ready configurations with validated metrics including time to first token, inter-token latency, request latency percentiles, throughput, and cost projections.
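To make the goal-based selection concrete, here is a minimal sketch of how benchmark results for several instance types could be ranked against a performance goal. It is not the SageMaker AI API: the `BenchmarkResult` class, the `pick_configuration` function, the instance labels, and every number below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """One benchmarked configuration. All fields are illustrative assumptions."""
    instance_type: str               # hypothetical label, not a real instance name
    hourly_cost_usd: float           # made-up price
    throughput_tokens_per_s: float   # made-up sustained output throughput
    p99_latency_ms: float            # made-up 99th percentile request latency

    @property
    def cost_per_million_tokens(self) -> float:
        # Convert hourly price and throughput into a price-performance figure.
        tokens_per_hour = self.throughput_tokens_per_s * 3600
        return self.hourly_cost_usd / tokens_per_hour * 1_000_000


def pick_configuration(results, goal):
    """Return the best result for a goal: 'cost', 'latency', or 'throughput'."""
    if goal == "cost":
        return min(results, key=lambda r: r.cost_per_million_tokens)
    if goal == "latency":
        return min(results, key=lambda r: r.p99_latency_ms)
    if goal == "throughput":
        return max(results, key=lambda r: r.throughput_tokens_per_s)
    raise ValueError(f"unknown goal: {goal!r}")


# Hypothetical benchmark output for three placeholder instance types.
RESULTS = [
    BenchmarkResult("instance-a", 1.00, 500.0, 900.0),
    BenchmarkResult("instance-b", 4.00, 1600.0, 400.0),
    BenchmarkResult("instance-c", 12.00, 5000.0, 550.0),
]

if __name__ == "__main__":
    for goal in ("cost", "latency", "throughput"):
        best = pick_configuration(RESULTS, goal)
        print(f"{goal}: {best.instance_type} "
              f"(${best.cost_per_million_tokens:.2f} per million tokens)")
```

With these made-up numbers, each goal selects a different configuration, which is why evaluating several instance types matters: the cheapest per token is not necessarily the fastest or the highest-throughput option.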
The capability is available today in seven AWS Regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Tokyo), Europe (Ireland), Asia Pacific (Singapore), and Europe (Frankfurt). To learn more, visit the SageMaker AI documentation.
Source: Amazon Web Services