Amazon SageMaker AI now supports inference recommendations, a new capability that removes the need for manual optimization and benchmarking to reach optimal inference performance. By delivering validated deployment configurations together with their performance metrics, SageMaker AI accelerates the path to production and keeps your model developers focused on building accurate models, not managing infrastructure.
Customers bring their own generative AI models, define expected traffic patterns, and specify a performance goal (optimize for cost, minimize latency, or maximize throughput). SageMaker AI then analyzes the model’s architecture and applies optimizations aligned to that goal across multiple instance types, benchmarking each configuration on real GPU infrastructure using NVIDIA AIPerf. By evaluating multiple instance types, customers can select the most price-performant option for their workload. The result is deployment-ready configurations with validated metrics including time to first token, inter-token latency, request latency percentiles, throughput, and cost projections.
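To make the selection step concrete, the sketch below shows one way to pick a configuration from a set of benchmark results once a performance goal is chosen. It is an illustration only, not the SageMaker AI API: the `BenchmarkResult` class, the `select_configuration` function, the instance names, and every number are hypothetical placeholders, and the cost metric (dollars per million generated tokens) is an assumed way of expressing price-performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical record of one benchmarked configuration."""
    instance_type: str
    time_to_first_token_ms: float
    inter_token_latency_ms: float
    p99_request_latency_ms: float
    throughput_tokens_per_s: float
    hourly_cost_usd: float

    @property
    def cost_per_million_tokens_usd(self) -> float:
        # Assumed price-performance metric: hourly price divided by
        # tokens generated per hour, scaled to one million tokens.
        tokens_per_hour = self.throughput_tokens_per_s * 3600
        return self.hourly_cost_usd / tokens_per_hour * 1_000_000


def select_configuration(results, goal):
    """Return the best result for a goal of 'cost', 'latency', or 'throughput'."""
    if not results:
        raise ValueError("no benchmark results to choose from")
    if goal == "cost":
        return min(results, key=lambda r: r.cost_per_million_tokens_usd)
    if goal == "latency":
        return min(results, key=lambda r: r.p99_request_latency_ms)
    if goal == "throughput":
        return max(results, key=lambda r: r.throughput_tokens_per_s)
    raise ValueError(f"unknown goal: {goal!r}")


# Placeholder data, invented for illustration; not real benchmark output.
results = [
    BenchmarkResult("instance-a", 120.0, 20.0, 2500.0, 1000.0, 2.0),
    BenchmarkResult("instance-b", 60.0, 10.0, 1200.0, 2500.0, 8.0),
    BenchmarkResult("instance-c", 80.0, 12.0, 1500.0, 4000.0, 16.0),
]

for goal in ("cost", "latency", "throughput"):
    best = select_configuration(results, goal)
    print(f"{goal}: {best.instance_type}")
```

With this placeholder data, each goal selects a different instance, which is why comparing several instance types against an explicit goal matters: the cheapest option per token is not necessarily the fastest or the highest-throughput one.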
The capability is available today in seven AWS Regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Tokyo), Europe (Ireland), Asia Pacific (Singapore), and Europe (Frankfurt). To learn more, visit the SageMaker AI documentation.
Source: Amazon Web Services