Amazon SageMaker lakehouse architecture now automates optimization configuration of Apache Iceberg tables

The Amazon SageMaker lakehouse architecture now automates optimization of Apache Iceberg tables stored in Amazon S3 through catalog-level configuration, reducing metadata overhead and improving query performance. Previously, optimizing Iceberg tables in AWS Glue Data Catalog required updating the configuration of each table individually. Now, you can enable automatic optimization for new Iceberg tables with a one-time Data Catalog configuration. Once enabled, Data Catalog continuously optimizes any new or updated table by compacting small files and removing snapshots and unreferenced files that are no longer needed, which keeps storage costs under control and speeds up queries.

You can get started by selecting the default catalog in the AWS Lake Formation console and enabling optimizations in the table optimizations configuration tab. For more granular control, you can also configure individual tables, including the compaction strategy (sort or z-order), the number of small files that triggers compaction, the interval between consecutive snapshot expirations, and the cleanup of unreferenced data.
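The table-level settings described above can be sketched as request payloads for the AWS Glue `CreateTableOptimizer` API, which takes one request per optimizer type. This is a minimal illustration and not the announcement's own example: the helper name `build_optimizer_requests` is hypothetical, the default thresholds are placeholders, and the field names are written from memory of the Glue API, so verify them against the current Data Catalog documentation before use. The sketch only builds the payloads and does not call AWS.

```python
from typing import Any, Dict


def build_optimizer_requests(
    database: str,
    table: str,
    role_arn: str,
    catalog_id: str,
    min_input_files: int = 100,        # placeholder: small-file count that triggers compaction
    snapshot_retention_days: int = 5,  # placeholder: how long snapshots are kept
    orphan_retention_days: int = 3,    # placeholder: age before unreferenced files are removed
) -> Dict[str, Dict[str, Any]]:
    """Return one CreateTableOptimizer-style payload per optimizer type.

    Hypothetical helper: field names are assumed from the Glue API and
    should be checked against the official reference.
    """
    common = {"CatalogId": catalog_id, "DatabaseName": database, "TableName": table}

    # Settings specific to each optimizer type (assumed field names).
    per_type = {
        "compaction": {
            "compactionConfiguration": {
                "icebergConfiguration": {
                    "strategy": "sort",
                    "minInputFiles": min_input_files,
                }
            }
        },
        "retention": {
            "retentionConfiguration": {
                "icebergConfiguration": {
                    "snapshotRetentionPeriodInDays": snapshot_retention_days,
                    "cleanExpiredFiles": True,
                }
            }
        },
        "orphan_file_deletion": {
            "orphanFileDeletionConfiguration": {
                "icebergConfiguration": {
                    "orphanFileRetentionPeriodInDays": orphan_retention_days,
                }
            }
        },
    }

    requests: Dict[str, Dict[str, Any]] = {}
    for optimizer_type, extra in per_type.items():
        # Every optimizer needs an IAM role and an enabled flag.
        config: Dict[str, Any] = {"roleArn": role_arn, "enabled": True}
        config.update(extra)
        requests[optimizer_type] = dict(
            common, Type=optimizer_type, TableOptimizerConfiguration=config
        )
    return requests
```

With boto3 installed and credentials configured, each payload could then be passed as `boto3.client("glue").create_table_optimizer(**payload)`. Building the payloads separately from the call keeps the settings easy to review and test without touching an AWS account.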

This feature is available through the AWS Management Console, AWS CLI, and AWS SDKs in 15 AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Canada (Central), Europe (Ireland, London, Frankfurt, Stockholm), Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney, Jakarta), and South America (São Paulo). To learn more, read the blog and visit the Data Catalog documentation.

Source: Amazon Web Services


