Amazon AWS Official Blog

A Tag-Driven CloudFormation Template for Optimizing EBS Volume Types

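Background

A database server used for enterprise reporting needs heavy disk I/O only at a certain time each day, for a few hours. If the EBS volume type were chosen for that peak, it would have to be an io1 volume with 10,000 IOPS. For the rest of the day, however, a gp2 volume is enough to meet the disk I/O demand. To make full use of resources and save cost, the ideal arrangement is to use io1 during the I/O peak and gp2 the rest of the time.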

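Solution Overview

To keep day-to-day operation simple, the solution uses a Tag defined on the EBS volume to trigger a Lambda function, which in turn changes the volume type on a schedule.

First, enable CloudTrail and integrate it with CloudWatch. This article assumes the CloudWatch Log Group is named CloudTrail/DefaultLogGroup.

CloudTrail captures the operations users perform on EBS volumes. When a user creates, changes, or deletes a Tag named ChangeEBSType, a Lambda function is invoked. The trigger condition is defined by a Filter on the Log Group, as follows: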

  "SubscriptionFilter": {
    "Type": "AWS::Logs::SubscriptionFilter",
    "Properties": {
      "LogGroupName": {
        "Ref": "CloudTrailLogGroup"
      },
      "FilterPattern": "{($.requestParameters.tagSet.items[0].key = \"ChangeEBSType\") && ($.eventName = *Tags) && ($.requestParameters.tagSet.items[0].value!= \"\" ) && ($.requestParameters.resourcesSet.items[0].resourceId = \"vol*\")}",
      "DestinationArn": {
        "Fn::GetAtt": ["EBSChangeScheduler", "Arn"]
      }
    }
  }
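To make the four conditions in the filter pattern concrete, here is a minimal Python sketch that applies roughly the same checks to an already parsed CloudTrail record. It only approximates how CloudWatch Logs evaluates the pattern; the function name `matches_filter` and the sample record are hypothetical and not part of the original template.

```python
def matches_filter(record):
    """Approximate the four conditions of the subscription filter pattern."""
    params = record.get("requestParameters") or {}
    tags = (params.get("tagSet") or {}).get("items") or []
    resources = (params.get("resourcesSet") or {}).get("items") or []
    if not tags or not resources:
        return False
    first_tag, first_resource = tags[0], resources[0]
    return (
        first_tag.get("key") == "ChangeEBSType"                     # items[0].key
        and record.get("eventName", "").endswith("Tags")            # eventName = *Tags
        and first_tag.get("value", "") != ""                        # value != ""
        and first_resource.get("resourceId", "").startswith("vol")  # resourceId = "vol*"
    )


# Hypothetical, heavily trimmed CloudTrail record for a CreateTags call.
sample = {
    "eventName": "CreateTags",
    "requestParameters": {
        "resourcesSet": {"items": [{"resourceId": "vol-0e3625c0f2f14e30d"}]},
        "tagSet": {"items": [{
            "key": "ChangeEBSType",
            "value": "io1:30000:(0 12 * * ? *),gp2:100:(0 19 * * ? *)",
        }]},
    },
}

print(matches_filter(sample))
```

Note that the pattern inspects only the first tag and the first resource of the request, so a request that carries ChangeEBSType in a later position would presumably not match.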

 

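When the Filter conditions are met, the Lambda function named EBSChangeScheduler is triggered. Based on the Tag value, this function calls another CloudFormation template to deploy the corresponding CloudWatch Event Rules (which define when the EBS volume type is changed) and a Lambda function (change-ebs-type, which performs the type change).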

 

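The overall flow of the solution is as follows:

Sample code for the Lambda function (EBSChangeScheduler.py), which calls the existing CloudFormation template and creates the corresponding Stack, is shown below: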

import base64
import json
import zlib

import boto3
import botocore


TemplateURL = ""


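def get_cloudtrail_event(event):
    # CloudWatch Logs delivers the payload base64-encoded and gzip-compressed.
    data = base64.b64decode(event['awslogs']['data'])
    # 16 + MAX_WBITS tells zlib to expect a gzip header.
    data = zlib.decompress(data, 16 + zlib.MAX_WBITS)
    cloudtrail_event = json.loads(data)
    return cloudtrail_event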


def get_message_from_cloudtrail_event(log_event):
    # Unescape embedded quotes, then parse the CloudTrail record as JSON.
    old_str = '\\"'
    new_str = '"'
    message = log_event['message']
    message = message.replace(old_str, new_str)
    return json.loads(message)


def create_cloudformation(stack_name, parameter1, parameter2, volume_id, client):
    print("Create cloudformation stack: %s" % stack_name)
    try:
        client.create_stack(
            StackName=stack_name,
            TemplateURL=TemplateURL,
            Parameters=[
                {'ParameterKey': 'TargetEBSVolumeInfo', 'ParameterValue': parameter1},
                {'ParameterKey': 'ScheduleExpression', 'ParameterValue': parameter2},
            ],
            Capabilities=['CAPABILITY_IAM'])
    except Exception as ex:
        # Log the failure and continue so the remaining schedules are still processed.
        print(ex)


def update_cloudformation(stack_name, parameter1, parameter2, volume_id, client):
    print("Update cloudformation stack: %s" % stack_name)
    try:
        client.update_stack(
            StackName=stack_name,
            UsePreviousTemplate=True,
            Parameters=[
                {'ParameterKey': 'TargetEBSVolumeInfo', 'ParameterValue': parameter1},
                {'ParameterKey': 'ScheduleExpression', 'ParameterValue': parameter2},
            ],
            Capabilities=['CAPABILITY_IAM'])
    except botocore.exceptions.ClientError as ex:
        error_message = ex.response['Error']['Message']
        # CloudFormation reports an unchanged stack as an error; treat it as a no-op.
        if error_message == 'No updates are to be performed.':
            print("No changes")
        else:
            raise


def check_valid_stack(stack_name, client):
    # Return True if a stack with this exact name currently exists.
    try:
        response = client.describe_stacks()
    except Exception as ex:
        print(ex)
        return False
    for stack in response['Stacks']:
        if stack['StackName'] == stack_name:
            return True
    return False


def build_ebs_volume_change_schedule(stack_name, target_schedule, volume_id, client):
    # target_schedule has the form "type:iops:(cron fields)".
    target_type = target_schedule.strip().split(':')
    parameter1 = volume_id + ":" + target_type[0] + ":" + target_type[1]
    parameter2 = "cron" + target_type[2]
    print("CloudFormation template parameters: {}, {}".format(
        parameter1, parameter2))
    print("Volume %s will be changed to %s, IOPS is %s" %
          (volume_id, target_type[0], target_type[1]))
    print("This task will be executed based on %s" % target_type[2])
    if check_valid_stack(stack_name, client):
        try:
            stack = boto3.resource('cloudformation').Stack(stack_name)
            stack_status = stack.stack_status
            print("Stack (%s) status: %s" % (stack_name, stack_status))
            if stack_status in ("ROLLBACK_COMPLETE", "ROLLBACK_FAILED", "DELETE_FAILED"):
                # A stack in a failed state cannot be updated, so delete it first.
                client.delete_stack(StackName=stack_name)
                client.get_waiter('stack_delete_complete').wait(StackName=stack_name)
            elif stack_status == "CREATE_IN_PROGRESS":
                client.get_waiter('stack_create_complete').wait(StackName=stack_name)
            elif stack_status == "DELETE_IN_PROGRESS":
                client.get_waiter('stack_delete_complete').wait(StackName=stack_name)
            elif stack_status == "UPDATE_IN_PROGRESS":
                client.get_waiter('stack_update_complete').wait(StackName=stack_name)
        except Exception as ex:
            print(ex)
    # Check again, because the stack may have been deleted above.
    if check_valid_stack(stack_name, client):
        update_cloudformation(stack_name, parameter1, parameter2,
                              volume_id, client)
    else:
        create_cloudformation(stack_name, parameter1, parameter2,
                              volume_id, client)


def delete_ebs_volume_change_schedule(volume_id, client):
    # Delete every stack whose name contains this volume ID.
    response = client.describe_stacks()
    for stack in response['Stacks']:
        if volume_id in stack['StackName']:
            try:
                print("Delete cloudformation stack: %s" % stack['StackName'])
                client.delete_stack(StackName=stack['StackName'])
            except Exception as ex:
                print(ex)


def lambda_handler(event, context):
    global TemplateURL
    client = boto3.client('cloudformation')
    print(event)
    # Read the template URL from the CloudFormation export named CFUrl.
    export = client.list_exports()
    for item in export['Exports']:
        if item['Name'] == 'CFUrl':
            TemplateURL = item['Value']
    print("CF URL: %s" % TemplateURL)
    cloudtrail_event = get_cloudtrail_event(event)
    # Handle every log event in the batch, not only the last one.
    for log_event in cloudtrail_event['logEvents']:
        trail_message = get_message_from_cloudtrail_event(log_event)
        request = trail_message['requestParameters']
        volume_id = request['resourcesSet']['items'][0]['resourceId']
        if trail_message['eventName'] == "CreateTags":
            cf_parameter = None
            for item in request['tagSet']['items']:
                if item['key'] == 'ChangeEBSType':
                    cf_parameter = item['value']
                    break
            if cf_parameter is None:
                continue
            # One stack per schedule entry: change-ebs-type-<index>-<volume id>.
            for i, schedule in enumerate(cf_parameter.split(',')):
                stack_name = "change-ebs-type-" + str(i) + "-" + volume_id
                build_ebs_volume_change_schedule(
                    stack_name, schedule, volume_id, client)
        elif trail_message['eventName'] == "DeleteTags":
            delete_ebs_volume_change_schedule(volume_id, client)
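When testing the handler above locally, it can help to build an event in the same shape that CloudWatch Logs delivers to a subscribed Lambda function: a JSON document that is gzip-compressed and then base64-encoded under `awslogs.data`. The sketch below is an illustration under that assumption and uses Python 3; `encode_awslogs_payload` and the trimmed sample record are hypothetical helpers, not part of the original program.

```python
import base64
import gzip
import json
import zlib


def encode_awslogs_payload(log_data):
    """Build a test event shaped like a CloudWatch Logs subscription delivery."""
    compressed = gzip.compress(json.dumps(log_data).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(compressed).decode("ascii")}}


def decode_awslogs_payload(event):
    """Reverse the encoding: base64-decode, gunzip, then parse the JSON."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(zlib.decompress(compressed, 16 + zlib.MAX_WBITS))


# Hypothetical, heavily trimmed CloudTrail record carried as one log event.
record = {
    "eventName": "CreateTags",
    "requestParameters": {
        "resourcesSet": {"items": [{"resourceId": "vol-0e3625c0f2f14e30d"}]},
        "tagSet": {"items": [{"key": "ChangeEBSType",
                              "value": "io1:30000:(0 12 * * ? *)"}]},
    },
}
log_data = {"logEvents": [{"message": json.dumps(record)}]}

event = encode_awslogs_payload(log_data)
decoded = decode_awslogs_payload(event)
message = json.loads(decoded["logEvents"][0]["message"])
print(message["eventName"])
```

An event built this way can be passed to `lambda_handler` in a test, provided the CloudFormation client is replaced with a stub.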

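Special handling: Because Lambda in the AWS China regions does not yet support environment variables, the URL of the CloudFormation template that modifies the EBS volume cannot be passed to the Python program directly. The solution therefore uses the CloudFormation Export feature: the URL is published as an exported value, and the Python code reads that value to receive the parameter.

CloudFormation:

  "Outputs": {
    "CFPath": {
      "Description": "The URL of CF",
      "Value": { "Ref": "CFUrl" },
      "Export": { "Name": "CFUrl" }
    }
  }

The corresponding Python statements: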

    global TemplateURL
    client = boto3.client('cloudformation')
    export = client.list_exports()
    for item in export['Exports']:
        if item['Name'] == 'CFUrl':
            TemplateURL = item['Value']
    print("CF URL: %s" % TemplateURL)
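The `list_exports` call returns results in pages, so in an account with many exports the CFUrl entry may not be on the first page. The following sketch shows one way to follow `NextToken` until the export is found. The helper name `find_export` is hypothetical, and `FakeCloudFormation` is a stand-in used only so the example runs without AWS credentials; with real AWS access the function would be given `boto3.client('cloudformation')` instead.

```python
def find_export(client, name):
    """Return the value of the named CloudFormation export, or None."""
    kwargs = {}
    while True:
        page = client.list_exports(**kwargs)
        for item in page.get("Exports", []):
            if item["Name"] == name:
                return item["Value"]
        token = page.get("NextToken")
        if not token:
            return None
        kwargs = {"NextToken": token}


class FakeCloudFormation:
    """Hypothetical stub that serves two pages of exports."""

    pages = {
        None: {"Exports": [{"Name": "Other", "Value": "x"}], "NextToken": "t1"},
        "t1": {"Exports": [{
            "Name": "CFUrl",
            "Value": "https://s3.cn-north-1.amazonaws.com.cn/shane/change_ebs_type.json",
        }]},
    }

    def list_exports(self, NextToken=None):
        return self.pages[NextToken]


template_url = find_export(FakeCloudFormation(), "CFUrl")
print("CF URL: %s" % template_url)
```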

 

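The value of the EBS volume Tag (ChangeEBSType) follows this format:

<volume type>:<IOPS value>:<change schedule>,<volume type>:<IOPS value>:<change schedule>

For example, the following setting: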

io1:30000:(0 11 * * ? *),gp2:100:(0 19 * * ? *)

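means: at 11:00 (UTC) every day, change the volume to the io1 type with 30,000 IOPS, and at 19:00 (UTC) the same day, restore it to the gp2 type. (Note that gp2 volumes ignore the IOPS value, so any value may be written, but it must not be empty.)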

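Because an EBS volume must wait at least 6 hours between modifications, make sure that no more than one type change falls within any 6-hour window.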
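The tag format and the 6-hour rule can be checked before a tag is applied. The sketch below parses a tag value into its entries and verifies that consecutive daily changes are at least 6 hours apart. It assumes each schedule uses a plain integer in the hour field of the cron expression, as in the example above; the function names are hypothetical and do not appear in the original Lambda code.

```python
import re

# One entry looks like "io1:30000:(0 11 * * ? *)".
ENTRY = re.compile(r"^\s*(?P<type>\w+):(?P<iops>\d+):\((?P<cron>[^)]*)\)\s*$")


def parse_change_ebs_type(tag_value):
    """Split the tag value into (volume_type, iops, cron_fields) tuples."""
    schedules = []
    for entry in tag_value.split(","):
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError("malformed schedule entry: %r" % entry)
        schedules.append(
            (match.group("type"), int(match.group("iops")), match.group("cron")))
    return schedules


def daily_hours(schedules):
    """Return the hour field of each entry (second cron field, plain integer assumed)."""
    return [int(cron.split()[1]) for _, _, cron in schedules]


def respects_min_gap(hours, min_gap=6):
    """Check that consecutive daily changes are at least min_gap hours apart."""
    ordered = sorted(hours)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    gaps.append(24 - ordered[-1] + ordered[0])  # gap that wraps around midnight
    return all(gap >= min_gap for gap in gaps)


schedules = parse_change_ebs_type("io1:30000:(0 11 * * ? *),gp2:100:(0 19 * * ? *)")
print(schedules)
print(respects_min_gap(daily_hours(schedules)))
```

Like the Lambda code, this sketch splits entries on commas, so a cron expression that itself contains a comma is not supported.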

 

The CloudWatch Event Rule can be created with the following CloudFormation JSON:

"MyEventsRule": {
  "Type": "AWS::Events::Rule",
  "Properties": {
    "Description": "Events Rule Invoke Lambda",
    "Name": {
      "Fn::Sub": "${AWS::StackName}-ChangeEBSEvent"
    },
    "ScheduleExpression": {
      "Ref": "ScheduleExpression"
    },
    "State": "ENABLED",
    "Targets": [{
      "Arn": {
        "Fn::GetAtt": [
          "ModifyEbs",
          "Arn"
        ]
      },
      "Id": "ModifyEbs"
    }]
  }
}
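The rule above invokes the ModifyEbs Lambda function, whose code is not listed in this article. As a rough illustration only, a function of that kind might turn the TargetEBSVolumeInfo parameter (volume ID, type, and IOPS joined by colons) into a call to the EC2 `modify_volume` API. The helper names and the `FakeEC2` stub below are hypothetical; this is a sketch under those assumptions, not the author's change-ebs-type implementation.

```python
def build_modify_request(target_info):
    """Turn 'volume-id:type:iops' into keyword arguments for modify_volume."""
    volume_id, volume_type, iops = target_info.split(":")
    request = {"VolumeId": volume_id, "VolumeType": volume_type}
    if volume_type == "io1":
        # Provisioned IOPS are passed for io1 only; the value is ignored for gp2.
        request["Iops"] = int(iops)
    return request


def change_ebs_type(ec2_client, target_info):
    """Request the type change; ec2_client is expected to be boto3.client('ec2')."""
    return ec2_client.modify_volume(**build_modify_request(target_info))


class FakeEC2:
    """Hypothetical stub that records the requests it receives."""

    def __init__(self):
        self.calls = []

    def modify_volume(self, **kwargs):
        self.calls.append(kwargs)
        return {"VolumeModification": {"ModificationState": "modifying"}}


ec2 = FakeEC2()
change_ebs_type(ec2, "vol-0e3625c0f2f14e30d:io1:30000")
change_ebs_type(ec2, "vol-0e3625c0f2f14e30d:gp2:100")
print(ec2.calls)
```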

 

Installation and Running

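1. Open https://github.com/shaneliuyx/ChangeEBS in a browser and download:

ebs_change_scheduler_v2.json

ebs_change_scheduler_v2.zip

change_ebs_type.json

2. Upload the JSON file to S3 (the upload URL is assumed to be https://s3.cn-north-1.amazonaws.com.cn/shane/change_ebs_type.json).

3. Open the AWS console, choose CloudFormation, select the JSON template file saved locally, and run the CloudFormation template.

4. Assume the input parameters are as follows:

5. Choose Next until the AWS resources start being created.

6. The final result is as follows:

7. Select the EBS volume with Volume ID vol-0e3625c0f2f14e30d and create a new tag with the name ChangeEBSType and the value io1:30000:(0 12 * * ? *),gp2:100:(0 19 * * ? *)

8. The following record can be found in CloudTrail:

9. A few minutes later, the CloudFormation console shows that 2 new Stacks have been created:

10. In the CloudWatch console, under Events - Rules, 2 new Rules have also been created: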

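At this point, a type schedule has been set up for the EBS volume: each day the volume runs as io1 for 7 hours and as gp2 for 17 hours. This meets the server's performance requirement while saving 17 hours of io1 volume charges every day.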

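Note that the time a volume type conversion takes depends on the volume's capacity. When defining the schedule, be sure to allow for the estimated conversion time so that normal system performance is not affected.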
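One way to estimate how much time to reserve is to trigger a conversion once and measure how long it takes. The sketch below polls the EC2 `describe_volumes_modifications` API until the modification is reported as completed or failed. It is meant to be run from a workstation or test script rather than inside the Lambda function; the function name, polling interval, and `FakeEC2` stub are assumptions made for illustration.

```python
import time


def wait_for_modification(ec2_client, volume_id, poll_seconds=30,
                          max_polls=120, sleep=time.sleep):
    """Poll until the volume modification is 'completed' or 'failed'."""
    for _ in range(max_polls):
        response = ec2_client.describe_volumes_modifications(VolumeIds=[volume_id])
        state = response["VolumesModifications"][0]["ModificationState"]
        if state in ("completed", "failed"):
            return state
        sleep(poll_seconds)  # still 'modifying' or 'optimizing'
    raise TimeoutError("modification of %s did not finish in time" % volume_id)


class FakeEC2:
    """Hypothetical stub that walks through a fixed sequence of states."""

    def __init__(self, states):
        self.states = list(states)
        self.calls = 0

    def describe_volumes_modifications(self, VolumeIds):
        state = self.states[min(self.calls, len(self.states) - 1)]
        self.calls += 1
        return {"VolumesModifications": [
            {"VolumeId": VolumeIds[0], "ModificationState": state}]}


fake = FakeEC2(["modifying", "optimizing", "completed"])
result = wait_for_modification(fake, "vol-0e3625c0f2f14e30d", sleep=lambda s: None)
print(result)
```

With a real client, wrapping the call in a timer gives a measured conversion time for a volume of that size, which can then inform the schedule.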

 

 

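About the Author

Liu Yuxin (刘育新)

Senior Consultant in AWS Professional Services, focused on cloud migration projects for enterprise customers, with long experience in designing and implementing IT infrastructure.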