Introducing the AWS Step Functions Data Science SDK for Amazon SageMaker

Posted on: Nov 7, 2019

The AWS Step Functions Data Science Software Development Kit (SDK) is an open-source library that allows you to easily create workflows that pre-process data and then train and publish machine learning models using Amazon SageMaker and AWS Step Functions. You can create machine learning workflows in Python that orchestrate AWS infrastructure at scale, without having to provision and integrate the AWS services separately.

AWS Step Functions is a serverless orchestration service that allows you to build resilient workflows using AWS services such as Amazon SageMaker, AWS Glue, and AWS Lambda. Amazon SageMaker enables you to build, train, and deploy machine learning models quickly. Now with the new Data Science SDK, you can easily build workflows, also known as pipelines, on AWS infrastructure using the preferred tools of data scientists: Python and Jupyter notebooks.

You can use the Data Science SDK to create and visualize end-to-end data science workflows that perform tasks such as data pre-processing on AWS Glue and model training, hyperparameter tuning, and endpoint creation on Amazon SageMaker. You can reuse the workflows in production by exporting AWS CloudFormation templates.
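To make the shape of such a workflow concrete, the following is a minimal sketch of the kind of state machine a pre-process, train, and deploy pipeline maps to. It is not the Data Science SDK's own API: it uses only plain Python dictionaries and the standard library to assemble an Amazon States Language definition, then wraps that definition in a CloudFormation resource. The job name, role ARN, and state names are hypothetical placeholders, and the task parameters are deliberately incomplete (a real training job, for example, also needs an algorithm, a role, and data and resource configuration).

```python
import json

# Hypothetical placeholders, for illustration only.
GLUE_JOB_NAME = "example-preprocess-job"
WORKFLOW_ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleStepFunctionsRole"


def task(resource, parameters, next_state=None):
    """Build one Amazon States Language Task state as a dict."""
    state = {"Type": "Task", "Resource": resource, "Parameters": parameters}
    if next_state is None:
        state["End"] = True  # last state in the chain
    else:
        state["Next"] = next_state
    return state


def chain(steps):
    """Link (name, resource, parameters) tuples into a linear state machine."""
    states = {}
    for i, (name, resource, parameters) in enumerate(steps):
        next_name = steps[i + 1][0] if i + 1 < len(steps) else None
        states[name] = task(resource, parameters, next_name)
    return {"StartAt": steps[0][0], "States": states}


# Names are read from the execution input via the context object ($$),
# so each run can supply its own job, model, and endpoint names.
steps = [
    ("Preprocess", "arn:aws:states:::glue:startJobRun.sync",
     {"JobName": GLUE_JOB_NAME}),
    ("Train", "arn:aws:states:::sagemaker:createTrainingJob.sync",
     {"TrainingJobName.$": "$$.Execution.Input.JobName"}),
    ("Model", "arn:aws:states:::sagemaker:createModel",
     {"ModelName.$": "$$.Execution.Input.ModelName"}),
    ("EndpointConfig", "arn:aws:states:::sagemaker:createEndpointConfig",
     {"EndpointConfigName.$": "$$.Execution.Input.ModelName"}),
    ("Endpoint", "arn:aws:states:::sagemaker:createEndpoint",
     {"EndpointName.$": "$$.Execution.Input.EndpointName",
      "EndpointConfigName.$": "$$.Execution.Input.ModelName"}),
]

definition = chain(steps)

# Wrap the definition in a CloudFormation template so the same workflow
# can be deployed repeatably outside the notebook.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "ExampleWorkflow": {
            "Type": "AWS::StepFunctions::StateMachine",
            "Properties": {
                "DefinitionString": json.dumps(definition),
                "RoleArn": WORKFLOW_ROLE_ARN,
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

The SDK is intended to spare you from writing this JSON by hand; the sketch only shows what a linear chain of Glue and SageMaker tasks looks like once it is expressed as a state machine and packaged for deployment.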

The Data Science SDK is included in AWS Step Functions pricing at no additional cost and is available in all regions where both AWS Step Functions and Amazon SageMaker are offered. The SDK can be used in conjunction with other services such as AWS Glue and AWS Lambda in their supported regions. For a complete list of regions and service offerings, see AWS Regions.

To get started with the AWS Step Functions Data Science SDK, download the Hello World notebook from GitHub, or open it from a notebook instance on Amazon SageMaker.

To learn more: