Amazon SageMaker AI now supports EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) speculative decoding to improve large language model inference throughput by up to 2.5x. This capability enables models to predict and validate multiple tokens simultaneously rather than one at a time, improving response times for AI applications.
As customers deploy AI applications to production, they need to serve models with low latency and high throughput to deliver responsive user experiences. Data scientists and ML engineers often lack efficient ways to accelerate token generation without sacrificing output quality or re-architecting the model, which makes it hard to meet performance expectations under real-world traffic. As a result, teams spend significant time optimizing infrastructure rather than improving their AI applications.

With EAGLE speculative decoding, SageMaker AI lets models generate and verify multiple tokens in parallel rather than one at a time, which increases inference throughput while maintaining the same output quality. SageMaker AI automatically selects between EAGLE 2 and EAGLE 3 based on your model architecture, and provides built-in optimization jobs that train specialized prediction heads on either curated datasets or your own application data. You can then deploy the optimized model through your existing SageMaker AI inference workflow without infrastructure changes.
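The draft-and-verify idea behind speculative decoding can be illustrated with a small, self-contained sketch. This is not EAGLE and not the SageMaker AI API: EAGLE uses a trained prediction head as the drafter, whereas here `target_next` and `draft_next` are hypothetical toy functions standing in for a large model and a cheap drafter, and greedy decoding is assumed. The sketch shows only why accepted draft tokens leave the output unchanged while reducing the number of verification passes over the large model.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: a deterministic toy rule over the context."""
    return (ctx[-1] * 3 + len(ctx)) % 17


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical cheap drafter: matches the target except when len(ctx) % 4 == 0."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 17


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. A real system does this
        #    in one batched forward pass, so it is counted as a single pass here.
        passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
        else:
            # All k drafts accepted: the same pass also yields one extra token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:goal], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_next, prompt, 20)
    tokens, passes = speculative_decode(target_next, draft_next, prompt, 20)
    print("same output:", tokens == baseline)
    print("verification passes:", passes, "vs. baseline model calls:", 20)
```

Because every accepted token is checked against what the target model would have produced, the speculative output matches the baseline exactly. The gain comes from needing fewer passes over the large model, and it depends on how often the drafter's proposals are accepted.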
You can use EAGLE speculative decoding in the following AWS Regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Tokyo), Europe (Ireland), Asia Pacific (Singapore), and Europe (Frankfurt).
To learn more about EAGLE speculative decoding, see the AWS News Blog and the SageMaker AI documentation.
Source: Amazon Web Services