AWS Glue now provides the ability to rewind job bookmarks for your Spark ETL jobs

Posted on: Oct 22, 2019

Starting today, you can rewind your job bookmarks for your Glue Spark ETL jobs to any previous job run. AWS Glue tracks data that has been processed during a previous run of an ETL job by storing state information from the job run. This persisted state information is called a job bookmark.

Previously, you could only reset your job bookmarks, which caused the subsequent job run to reprocess all of the data processed by previous job runs. You can now better support data backfilling scenarios by rewinding your job bookmarks to any previous job run, so that the subsequent job run reprocesses only the data processed after the bookmarked job run.
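As an illustration, the following is a minimal sketch of a rewind in Python. It assumes the boto3 Glue client, where reset_job_bookmark accepts an optional RunId (omitting it performs a full reset, and the equivalent CLI command is aws glue reset-job-bookmark --job-name ... --run-id ...). The helper names find_last_succeeded_run and rewind_job_bookmark are hypothetical. The policy of choosing "the latest successful run before a cutoff" is also an assumption made for this example, not part of the feature.

```python
from datetime import datetime, timezone
from typing import Optional


def find_last_succeeded_run(glue_client, job_name: str,
                            before: Optional[datetime] = None) -> Optional[str]:
    """Return the Id of the latest SUCCEEDED run of job_name started before `before`.

    Hypothetical helper. `before` must be timezone-aware, because boto3 returns
    StartedOn as an aware datetime; it defaults to the current UTC time.
    """
    if before is None:
        before = datetime.now(timezone.utc)
    best_id, best_started = None, None
    kwargs = {"JobName": job_name}
    while True:
        page = glue_client.get_job_runs(**kwargs)
        for run in page.get("JobRuns", []):
            started = run.get("StartedOn")
            # Skip failed/running attempts and runs with no start time.
            if run.get("JobRunState") != "SUCCEEDED" or started is None:
                continue
            if started < before and (best_started is None or started > best_started):
                best_id, best_started = run["Id"], started
        token = page.get("NextToken")
        if not token:  # last page reached
            return best_id
        kwargs["NextToken"] = token  # only send NextToken when one exists


def rewind_job_bookmark(glue_client, job_name: str, run_id: str) -> dict:
    """Rewind job_name's bookmark to the state saved by run_id (hypothetical helper).

    Passing RunId rewinds to that run; calling reset_job_bookmark
    without RunId would reset the bookmark completely instead.
    """
    response = glue_client.reset_job_bookmark(JobName=job_name, RunId=run_id)
    return response["JobBookmarkEntry"]
```

With real credentials, a call might look like glue = boto3.client("glue"), then run_id = find_last_succeeded_run(glue, "my-etl-job"), then rewind_job_bookmark(glue, "my-etl-job", run_id), where "my-etl-job" is a placeholder job name. The client is passed in rather than created inside the helpers so they can be exercised without AWS access.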

This feature is now available in all AWS Regions where AWS Glue is available, except AWS GovCloud (US-East) and AWS GovCloud (US-West).

To learn more about this feature, please visit the AWS Glue documentation.