How Amazon ECS manages CPU and memory resources

On August 19, 2019, we launched a new Amazon Elastic Container Service (Amazon ECS) feature that allows containers to configure available swap space on Linux. We want to take this opportunity to step back and talk more holistically about how ECS resource management works (including the behavior this new feature has introduced). Specifically, we want to clarify how CPU and memory resources can be reserved and consumed by Linux containers deployed by ECS on the two available launch types: EC2 and AWS Fargate (Fargate). Windows containers, while supported on ECS, are outside the scope of this blog post.

The ECS constructs

In ECS, the basic unit of a deployment is a task, a logical construct that models one or more containers. This means that the ECS APIs operate on tasks rather than individual containers. In ECS, you can’t run a container: rather, you run a task, which, in turn, runs your container(s). A task contains (no pun intended) one or more containers.

The diagram below outlines the relationship between containers and tasks:

Diagram showing the relationship between containers and tasks in ECS. Four tasks are shown, labeled Task 1 through Task 4. Each task has an associated box with two containers inside the box.

The official ECS documentation has more information about these constructs.

The purpose of this blog post is to discuss the options you have (and the rules you have to cope with) in terms of resource management. In particular, we will discuss how CPU and memory resources defined at the task level and at the container level relate to the CPU and memory resources available on EC2 and Fargate.

Introduction to the ECS launch types available

ECS has two different models for running your containers, which we call launch types. The first is to run ECS tasks on EC2 instances that you own, manage, and pay for (with no per-task charge). The second is to use Fargate, a serverless environment for containers fully managed by AWS that enables customers to run containers without having to manage any underlying infrastructure. With Fargate, customers only get charged for the tasks they run.

This diagram outlines visually what happens when you launch a task with the EC2 launch type:

Diagram showing interactions when launching an ECS task with the EC2 launch type. Three boxes are shown. The one in the middle contains the text "aws ecs run-task ... --launch-type EC2 ..." and an arrow is drawn to the left-hand box, which is labeled "AWS" and contains a smaller box labeled "Amazon ECS". An arrow is drawn below the first one starting at the left-hand box and ending inside the right-hand box. The right-hand box is labeled "Cust Account" and contains several smaller boxes. One is labeled "Amazon EC2" and an arrow points to this box with the label "You have to manage this capacity (e.g. with ASGs)." Above the "Amazon EC2" box is a colored area with the labels "Service" and "Task", representing the Amazon ECS task that was launched.

This diagram outlines visually what happens when you launch a task with the Fargate launch type:

Diagram showing interactions when launching an ECS task with the Fargate launch type. Three boxes are shown. The one in the middle contains the text "aws ecs run-task ... --launch-type FARGATE ..." and an arrow is drawn to the left-hand box, which is labeled "AWS". The left-hand box contains smaller boxes labeled "Amazon ECS" and "AWS Fargate" as well as a colored area containing the labels "Service" and "Task" placed directly above the AWS Fargate box. An arrow is drawn from the Amazon ECS box to the colored area with "Service" and "Task" labels. An arrow is drawn from the "Task" label to the right-hand box labeled "Cust Account" and an element inside the box labeled "ENI".

One key aspect to take into account in the context of this blog post is that “each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task.” This means that, while with the EC2 launch type you can have more than one task running on top of the same Linux kernel and sharing kernel resources with each other, with the Fargate launch type each task has a dedicated Linux kernel and does not share CPU, memory, or the Elastic Network Interface (ENI) with any other task. While a holistic comparison between the two launch types is out of scope for this blog post, using either of them has a number of ramifications in how CPU and memory resources can be reserved and consumed by containers.

General background on containers and how they access CPU and memory

Before we get into the weeds of how ECS works, let’s spend some time setting the stage regarding what to expect from default container behavior in general.

There are two general rules of thumb with containers:

  • unless otherwise restricted and capped, a container that gets started on a given host (operating system) gets access to all the CPU and memory capacity available on that host.
  • unless otherwise protected and guaranteed, all containers running on a given host (operating system) share CPU, memory, and other resources in the same way that other processes running on that host share those resources.

These behaviors aren’t unique to containers, but are rather normal and expected for Linux processes in general. It’s worth emphasizing that, by default, containers behave like other Linux processes with respect to access to resources like CPU and memory.
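
As a quick illustration of this default behavior, you can start a container with no resource constraints and observe that it sees the full capacity of the host. This is a minimal sketch, assuming a Linux host with Docker installed:

# an unconstrained container sees the host's CPUs and memory
docker run --rm alpine sh -c 'nproc; free -m'

Here nproc prints the number of CPUs of the host and free -m prints the host’s total memory: unless you constrain it, the container can use all of it.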

ECS resource management options

We will learn throughout this blog post that ECS provides mechanisms to both restrict how much capacity…

  • … a task (and in turn, all of its containers) is entitled to consume on a given host as well as reserve how much capacity a task (and in turn, all of its containers) is expected to have available.
  • … a container is entitled to consume inside a task as well as reserve how much capacity a container is expected to have available.

For those of you who are new to ECS, below is an example of a task definition that includes a single container. You can think of the task definition as the ECS construct that defines the container(s) you intend to run. You will notice in the register-task-definition documentation that the task definition options are rich and broad. In this short example we have focused on the resource management aspects of how you can configure the task and containers and removed many other options. You will also notice that this example is only compatible with the EC2 launch type: this is because some of the CPU and memory configurations (namely maxSwap and swappiness) are not supported with the Fargate launch type.

{
    "family": "mywebsite", 
    "networkMode": "awsvpc",
    "cpu": "256", 
    "memory": "512", 
    "requiresCompatibilities": ["EC2"],  
    "containerDefinitions": [
        {
            "name": "mywebsite-nginx", 
            "image": "nginx:latest", 
            "essential": true,
            "cpu": 128,
            "memory": 256,
            "memoryReservation": 128,
            "linuxParameters": {
                "maxSwap": 512,
                "swappiness": 50
            }
        } 
    ]
}


You can register this task running the following command (where mywebsite.json is a file in the local directory that contains the JSON above):

aws ecs register-task-definition --cli-input-json file://mywebsite.json

This JSON structure above does the following:

  • it defines an ECS task called mywebsite that has a certain amount of CPU and memory capacity associated with it (it reserves 256 CPU units and 512 MB of memory on the EC2 instance it is started on). If this were to be started with the Fargate launch type, these values would be used to determine a properly sized Linux instance at run-task time.
  • it defines a single container within the task called mywebsite-nginx. This container has 128 CPU units and 256 MB of memory associated with it, along with a 128 MB reservation. In addition to this, the new feature we announced adds a configurable knob that allows us to set a max swap size of 512 MB with average aggressiveness (swappiness accepts an integer between 0 and 100).

This configuration is for demonstration purposes only and the values we have used do not have a specific meaning other than illustrating what options you have available. In the remainder of this blog post, we will describe how to use these options and parameters and what happens underneath when (if) you use them at the task and/or at the container level.
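
For completeness, this is a hedged sketch of how you could then launch the registered task with the EC2 launch type (the cluster name and subnet ID are placeholders; the awsvpc network mode used in this task definition requires a network configuration at launch time):

# launch one copy of the mywebsite task on an existing cluster
aws ecs run-task --cluster mycluster \
    --task-definition mywebsite \
    --launch-type EC2 \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-0123456789abcdef0]}"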

To describe the various options you have at your disposal when using these knobs, we will outline two scenarios:

  • Maximum level of flexibility: you can use this approach when your goal is to configure the fewest number of options, with the highest level of resource over-commitment and the closest to the default container experience described above. You probably want to consider this approach for test/dev environments where you may want to optimize for cost instead of performance.
  • Maximum level of control: you can use this approach when your goal is to have the highest level of performance predictability and the highest level of fine-tuning in terms of resource capping and reservation, both at the task and at the container level. You probably want to consider this approach for highly sensitive production environments where you may want to optimize for performance instead of cost.

Your specific use case may fall straight into one of these two extremes or it could land in the middle. For example, you may want to provide a certain amount of over-commitment for production workloads to be able to cope with sudden and short peaks in resources demand.

This blog post intends to give you all the primitives in order to tweak the task and container configurations for your specific use case.

The maximum level of flexibility

The picture below is intended to call out the maximum level of flexibility (and the minimal level of resource configuration) available for deploying tasks (and their associated containers).

The most flexible approach of all is to use tasks that do not have a specific CPU and memory resource configuration.

In this case, these tasks:

  • can only be deployed on EC2 container instances (Fargate requires a specific CPU and memory resource configuration for the task)
  • must have containers with at least either a soft or a hard memory limit
  • do not need to have container CPU resource configurations (the containers compete for the full CPU power of the host)

Note: when you don’t specify any CPU units for a container, ECS intrinsically enforces two Linux CPU shares for the cgroup (which is the minimum allowed). Please note that “CPU units” is just an ECS construct and it doesn’t exist in the world of the Linux kernel. The way CPU units are enforced on the Linux host (via Linux CPU shares and cgroup configurations) is an implementation detail we are not diving into. Also, keep in mind that Linux doesn’t care how many CPUs you have or how many absolute shares you define. Linux just uses the sum of all defined CPU shares at a given location in the hierarchy as the denominator when determining the fraction of CPU resources available to a given cgroup during CPU contention.
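
If you are curious, you can observe this enforcement directly on an EC2 container instance. The sketch below is a hedged example assuming a cgroup v1 host; the exact cgroup layout varies with the AMI, the agent version, and whether a task size is set, so treat the path as illustrative:

# inspect the CPU shares applied to a running container's cgroup
# (the task and container IDs below are placeholders)
cat /sys/fs/cgroup/cpu/ecs/<task-id>/<container-id>/cpu.shares

A container with no CPU units configured in its definition would report the minimum value of 2 mentioned above.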

A slightly less flexible approach is to use tasks that do have a specific CPU and memory resource configuration. While this requires some resource configuration at the task level, it allows for an even more configuration-free experience at the container level.

In fact, in this case, these tasks:

  • can be deployed on EC2 container instances or Fargate
  • can have containers with no memory (or CPU, for that matter) limits configured at all

This is a graphical representation of the minimal configuration options:

Diagram showing possible configurations under different circumstances. At the top is a box labeled "Task" with a second box inside labeled "Container". Branching on either side are arrows labeled "Task without a size set" and "Task with a size set". For the arrow labeled "Task without a size set", the possible launch only includes a box labeled "Amazon EC2", describes container memory as required, and describes container CPU, task memory, and task CPU as optional. For the arrow labeled "Task with a size set", the possible launch includes both "Amazon EC2" and "AWS Fargate", describing task memory and task CPU as required and container memory and container CPU as optional.

This is intended to illustrate the maximum level of resource usage flexibility (or the minimum level of configuration if you will) that an ECS user has to consider.
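
To make this concrete, below is a hedged sketch of a task definition with this minimal configuration (the family, container name, and image are placeholders): no task size is set, and the only resource parameter is the container soft memory limit that this scenario requires.

{
    "family": "minimal-flex",
    "requiresCompatibilities": ["EC2"],
    "containerDefinitions": [
        {
            "name": "minimal-flex-app",
            "image": "nginx:latest",
            "essential": true,
            "memoryReservation": 128
        }
    ]
}

This task can only run on EC2 container instances: its container reserves 128 MB of memory and is otherwise free to compete for all remaining CPU and memory on the host.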

The reason the Fargate deployment requires task CPU and memory configurations is that the task is the billable construct for Fargate (you pay for the compute capacity required to run the task rather than paying for an EC2 instance). As we noted above, each task has a dedicated Linux kernel, so the task size is also used to allocate the proper physical resources.

The most flexible approach (no task size) has the disadvantage that the user loses control over resource distribution (given that, on EC2 container instances, all tasks and containers compete for the same resources on the host). The advantage of this approach, however, is that the user can implement an over-subscription strategy (on EC2 container instances) that can generate good savings in particular scenarios.

The other approach (explicit task size) is still somewhat flexible because it requires not a single parameter to be configured at the container level. However, while containers can still over-subscribe resources within the task, the tasks themselves cannot over-subscribe resources on the host. More on this in the next section.

The maximum level of control

ECS can also provide a lot more control over how compute resources (like CPU and memory) are allocated to tasks and/or containers.

In this section we will explore how you can fine-tune these resources and the ramifications of doing so. For consistency with the previous section, we will split this into a couple of different subsections:

  • what happens when you don’t set the task size (a configuration available for the EC2 launch type only)
  • what happens when you do set the task size (a configuration available for both the EC2 and the Fargate launch types)

                                             Task size not set    Task size set
Can be used with the EC2 launch type         Yes                  Yes
Can be used with the Fargate launch type     No                   Yes

Note: while you could configure a task size for one dimension but not the other (for example, setting the memory size but not the CPU size), for the sake of simplicity the discussion below assumes that you either set both or neither. You can easily extrapolate what happens if you only set one dimension by merging the two scenarios below.

1 – Container resource configurations with no task size configured

This is how ECS has historically worked. As a reminder, this configuration is NOT supported with the Fargate launch type (that is, you can’t use Fargate if you don’t configure the size of the task). As a result, this configuration is only available with ECS on EC2.

In this scenario, the task is just an unlimited logical boundary, and everything resource-related comes down to the individual container configurations (across all tasks) and the amount of physical capacity available at the EC2 container instance level.

As we hinted in the previous section, containers running in such a task must have at least either a soft memory limit or a hard memory limit.

When you set the soft memory limit, you are essentially reserving memory capacity on the host your task lands on. It goes without saying that the sum of the soft memory limits of all containers in all the tasks running on a given EC2 instance cannot exceed the physical memory available on that instance. Since you are reserving memory, all of that memory (regardless of actual usage) must be there for the tasks (and the containers) to be able to start. Note that this is just physical memory, not any additional swap space you may have configured through the new maxSwap option we recently announced. While the new feature allows the container to swap if configured properly, it does not include accounting for swap usage in the scheduling phase; swap is not treated as a resource by the ECS scheduler.
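
You can verify this accounting yourself, because the ECS APIs expose both the registered and the still-reservable capacity of every container instance. This is a hedged example (the cluster name and container instance ID are placeholders):

# compare total vs. still-reservable CPU and memory on a container instance
aws ecs describe-container-instances \
    --cluster mycluster \
    --container-instances 01234567-89ab-cdef-0123-456789abcdef \
    --query 'containerInstances[].{registered:registeredResources,remaining:remainingResources}'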

Note: if you are planning to use the new swap feature, make sure you enable it on the EC2 instances you are using. The ECS-optimized AMI does not have swap enabled by default. Please refer to the documentation for a full list of considerations for using this new feature.
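
As a hedged sketch, one common way to enable swap on a container instance is to create a swap file at boot (for example, via EC2 user data); the size below is arbitrary:

# run as root on the EC2 container instance
dd if=/dev/zero of=/swapfile bs=1M count=1024   # create a 1 GiB file to back the swap space
chmod 600 /swapfile                             # restrict access to root
mkswap /swapfile                                # format the file as swap
swapon /swapfile                                # enable it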

When you set the hard memory limit, you are setting an upper boundary for the container. You are essentially saying that the container cannot use more than that amount of memory (“however much memory you will ever need, you won’t be able to go beyond this value anyway”). Because of how the ECS scheduler works, tasks that do not have a memory size configured, but that include containers whose hard memory limits exceed the amount of memory available on the host, will still be scheduled and started. The ramification of this behavior is that, effectively, the upper memory boundary at task run time becomes the memory available on the host rather than the hard limit set at task configuration time.

These are all the options you have available and what happens when you pick one of them:

  • if you only set a soft limit, that represents the reservation and the ceiling is represented by the container instance total memory
  • if you set the soft limit and the hard limit, the soft limit represents the reservation and the ceiling is represented by the hard limit you set.
  • if you only set the hard limit, that represents both the reservation and the ceiling

If containers try to consume memory between these two values (or between the soft limit and the host capacity, if a hard limit is not set), they may compete with each other. In this case, what happens depends on the heuristics used by the Linux kernel’s OOM (Out of Memory) killer. ECS and Docker are both uninvolved here; it’s the Linux kernel reacting to memory pressure. If something is above its soft limit, it’s more likely to be killed than something below its soft limit, but figuring out which process gets killed requires knowing all the other processes on the system and what they are doing with their memory as well. Again, the new memory feature we announced can come to the rescue here. While the OOM behavior isn’t changing, containers can now be configured to swap out to disk in a memory pressure scenario. This can potentially alleviate the need for the OOM killer to kick in (if containers are configured to swap).
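
To make the middle option concrete, this is a hedged fragment of a container definition (names and values are arbitrary) that reserves 256 MB, can burst up to a 512 MB ceiling, and becomes a more likely OOM-killer candidate when it strays above its reservation:

{
    "name": "burstable-app",
    "image": "nginx:latest",
    "essential": true,
    "memoryReservation": 256,
    "memory": 512
}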

For CPU resources, the mechanism is slightly different. In ECS, CPU can be configured in units that work in a way similar to memory soft limits. As a baseline, ECS considers each vCPU available to an EC2 container instance as 1024 units. That is, an EC2 host that has 8 vCPUs has 8×1024=8192 units available. When you assign CPU units to a container, you are essentially reserving that much capacity on the host. It goes without saying that the sum of all CPU units across all containers across all tasks running on an EC2 instance cannot exceed the total number of units available for that host (8192 in the example above). As we have already mentioned, these CPU units, in ECS parlance, get applied as Linux CPU shares to enforce the behavior. In other words, “units” is the nomenclature ECS uses to configure the behavior, while “shares” is the technology the Linux kernel uses to implement it. For most ECS users this is a detail they should not have to care about, but it’s important to clarify it.

That said, there are a couple of things to consider here:

  • if containers are not using their allotted CPU units, other containers can use that capacity. When capacity is not used, any container can burst to use that spare capacity. CPU shares control the amount of CPU capacity available when there is CPU contention; that is, multiple containers attempting to use the CPU at the same time.
  • if there is spare capacity on the host (because the sum of all CPU units across all containers across all tasks is less than the available capacity of the host), the excess capacity is re-partitioned proportionally among all of the containers if/when they need it

For example, imagine you are running a task on a host with 8192 CPU units. The task has two containers: containerA (to which you assigned 512 units) and containerB (to which you assigned 1024 units). When both containers are running at full steam and consuming their reserved capacity, they can also access the uncommitted CPU power (8192-512-1024=6656 CPU units). The way this capacity is split is proportional to the units assigned to the containers (which you can think of as weights): in particular, containerB can get access to twice as many of those CPU units (2/3 of 6656 = 4437 CPU units) as containerA (which can get access to 1/3 of 6656 = 2219 CPU units).
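
This is a hedged fragment of what the containerDefinitions section of such a task could look like (names and images are placeholders); with no task size set, these CPU units act purely as reservations and as relative weights during contention:

"containerDefinitions": [
    {
        "name": "containerA",
        "image": "nginx:latest",
        "essential": true,
        "cpu": 512,
        "memory": 256
    },
    {
        "name": "containerB",
        "image": "nginx:latest",
        "essential": true,
        "cpu": 1024,
        "memory": 256
    }
]

Note that, because no task size is set, each container also needs a memory limit (a hard limit of 256 MB in this fragment).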

2 – Container resource configurations with a task size explicitly configured

This configuration is supported with both the EC2 launch type and the Fargate launch type.

In this particular scenario, the task itself becomes a solid boundary around the container(s) that run inside it.

In this scenario, the containers running with this task configuration are only able to use the capacity defined by the task size, meaning they see the task as their boundary. To be precise, containers can still see the total host capacity (because they can read /proc), but that total capacity is not usable by them.

From a memory management perspective, the important difference is that containers do not need to have any type of memory limit configured. In this case, they all compete for the amount of memory available at the task level. That is, unless you tune memory resources at the container level, which is probably something you may want or need to do in a scenario where you need full control of resources all the way down to the single container. This is definitely possible with ECS.

If you do configure limits at the container level, the sum of the memory soft limits of all containers running inside a specific task cannot exceed the memory size of the task. A notable difference between tasks with no memory size and tasks with a memory size is that, in the latter scenario, none of the containers can have a memory hard limit that exceeds the memory size of the task (the sum of all hard limits can exceed the task size, but the sum of all soft limits cannot). This is different from the former scenario (tasks with no memory size), where the scheduler can schedule containers on EC2 hosts with less memory capacity than their hard limits.

These are all the options you have available and what happens when you pick one of them:

  • if you only set a soft limit, that represents the reservation and the ceiling is represented by the task memory size
  • if you set the soft limit and the hard limit, the soft limit represents the reservation and the ceiling is represented by the hard limit you set.
  • if you only set the hard limit, that represents both the reservation and the ceiling

From a CPU resource management perspective, the algorithms are similar to those we discussed in the previous section, with the notable difference that the total budget of CPU units is no longer the EC2 host in its entirety, but rather the task size you configured. So if you configure the task size to have 1024 CPU units, the sum of the CPU units of all of the containers running in that task cannot exceed 1024, and the non-budgeted capacity among those 1024 units is re-distributed across all containers proportionally to the budget they already have.
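
Putting the memory and CPU rules of this scenario together, below is a hedged example of a sized task definition (all names, images, and values are arbitrary). The container CPU units sum exactly to the 1024-unit task budget, the soft memory limits (1024+512=1536 MB) fit within the 2048 MB task size, and no hard limit exceeds the task size:

{
    "family": "sized-task",
    "networkMode": "awsvpc",
    "cpu": "1024",
    "memory": "2048",
    "requiresCompatibilities": ["EC2", "FARGATE"],
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",
            "essential": true,
            "cpu": 768,
            "memoryReservation": 1024,
            "memory": 2048
        },
        {
            "name": "sidecar",
            "image": "busybox:latest",
            "essential": false,
            "cpu": 256,
            "memoryReservation": 512
        }
    ]
}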

It is also important to understand that while ECS refers to CPU units as the amount of CPU capacity you assign to tasks as well as to containers, the way that they are enforced is different. CPU units assigned to tasks are enforced by the Linux kernel as a hard CPU ceiling for the task (in fact containers running in this task cannot exceed the quota assigned to the task). On the other hand, CPU units assigned to containers are implemented using Linux CPU shares within the task (which is more of a weighted mechanism to determine CPU access priority).

Note: at the time of this writing, the CPU units allocation at the task level cannot exceed 10 vCPUs (or 10240 CPU units). This means that if you want a container to use more than 10 vCPUs you should not set a (CPU) task size.

Making sense of all the options

This table is an attempt to summarize the characteristics and behavior of both tasks and containers in the two scenarios we have outlined above: tasks with a specific size not set and tasks with a specific size set.

                                                          Task size not set          Task size set
Task behavior
Can be used with the EC2 launch type                      Yes                        Yes
Can be used with the Fargate launch type                  No                         Yes
Task has reserved resources                               No                         Yes (the task size)
Task has capped resources                                 No (limited to host size)  Yes (limited to task size)
Container configuration options (inside the tasks)
CPU capacity a container can use (if free)                Host CPU capacity          Task size
Memory a container can use (if free and unless limited)   Host memory                Task size
Container needs to have a CPU value configured            No                         No
Container needs to have a memory value configured         Yes (either soft or hard)  No

Application scheduling and performance

So far we have doubled down on explaining the technical ramifications of tweaking the various knobs available. We should not, however, lose sight of the fact that everything we do is meant to provide the best application performance experience at the right cost for the organization. Performance and cost are two pillars of our Well-Architected Framework.

Applications require CPU and memory resources to run, and everything we have discussed so far describes how those resources are provided to containerized applications running on ECS. The parameters we have discussed also give ECS a means by which to make scheduling decisions.

When using the Fargate launch type, this scheduling mechanic is minimized and simplified because each task is scheduled on a dedicated Linux kernel that has (at least) the capacity configured for the ECS task. Also, given that the memory swapping capability is not available with the Fargate launch type, an application running inside a Fargate task ultimately has CPU and memory capacity precisely as defined at the task level.

When using the EC2 launch type, the scheduling mechanic is exercised in its fullness by ECS, and the best way to describe it is by imagining that this process is similar to how Tetris works: a number of tasks come into the ECS scheduling funnel and, based on their characteristics, they get scheduled to run on EC2 instances according to the “slots” (of CPU and memory) those instances have available to land these “bricks.”

Specifically, the shape and size of every brick is determined by the reservations that are associated with the task and/or the containers within it. If you think about it, reservations effectively tell ECS that “this specific task needs this much memory and this much CPU,” and ECS must find enough reservable capacity on one EC2 instance across an ECS cluster to satisfy the requirement. As we have outlined above, the task CPU size, the task memory size, and the container memory soft limits all contribute to the shape and overall size of the “brick.” Container memory hard limits and container CPU units do not play a role in shaping and sizing the brick: they only express a ceiling, or a priority when there is contention, but do not set strong resource requirements that must be satisfied (like reservations do).

Ultimately, these knobs plus the recently announced swapping feature allow users to shape their bricks (or tasks) in a way that can serve their application requirements better:

  • If you need your application to have ultimate and predictable performance, use the available knobs to define a proper shape and size for your task and have the ECS scheduler figure out which EC2 instance can accommodate a brick with that specific shape and size. Chances are that the task can’t be scheduled if the scheduler does not find enough capacity (this may trigger an autoscaling event, if properly configured).
  • If you need your application to get scheduled, are not worried about predictable performance, or are trying to optimize for cost, then try to make your brick shape and size as small as possible and work with ceilings and shares to prioritize them. What you are effectively doing here is over-subscribing EC2 resources in a way that makes it easy for ECS to allocate these bricks to EC2 instance slots. Chances are that these tasks may end up competing for resources on that EC2 host, though, simply because you reserved less than what the application is actually going to use and request. In this case, there are increased chances that your application may experience poor performance or possibly get killed by the Linux kernel on the EC2 instance due to Out of Memory events. If this happens, the swapping feature may mitigate this critical event by using disk as a memory area. At this point your application may slow down considerably, but it is not killed as long as there is enough virtual memory available.

In the final analysis, all these configuration options allow you to optimize your application deployments for either performance or cost (or a mix of the two). ECS customers using the EC2 launch type maintain responsibility for finding the deployment strategy that best fits their organizational needs.

[Update 12/11/2020] Some applications are container-aware and can configure themselves to take full advantage of the resources available inside the container. The latest versions of Java are a good example of this pattern. For this reason, some of these applications work best when CPU and memory resources are explicitly configured at the container level, in addition to the configuration at the task level. In other words, while containers without specific resource configurations can nominally access all task resources, some applications will interpret this configuration differently and may artificially limit the amount of resources they think are available to the container.
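
As a hedged illustration of this point, recent JVMs (Java 10 and later, with backports to some Java 8 updates) derive their default maximum heap from the cgroup memory limit they detect, so an explicit hard limit at the container level gives the JVM an accurate number to size against:

# inside a container with a 512 MB hard memory limit, a recent JVM sizes
# its heap from the cgroup limit rather than from the host's total memory
java -XX:MaxRAMPercentage=75.0 -XshowSettings:vm -version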

Conclusions

In this blog post, we have taken the opportunity to summarize how resource management (specifically for CPU and memory) works on ECS with all the launch types available to date (EC2 and Fargate). The introduction of new features in this context is a testament to our investment in ECS and to our commitment to listening to our customers.

We have explored some of the ECS basics, outlined some of the basic concepts of how containers use CPU and memory resources on a Linux host, and then zoomed in on the ECS-specific approaches to configuring these resources.

We have introduced a couple of different configuration scenarios, one that is more oriented towards test and development use cases (where flexibility and over-commitment of resources are important) and another that is more targeted at production workloads (where performance and predictability are key aspects).

For each of the two scenarios, given the numerous configuration options and alternatives, we have focused on task configurations first and then zoomed in on container configuration options. All in all, this post should have given you most of the low-level information you need to implement a proper task and container configuration strategy for your specific use case and requirements.

The key technical takeaways are as follows.

The deployment of ECS tasks on top of EC2 instances provides the greatest level of flexibility in that:

  • you can deploy both tasks with a specific size and tasks without a specific size (or a mix of those)
  • you can over-commit resources, because containers across different tasks can share the same physical resources
  • containers in any task that has a size set (and hence a ceiling limit) cannot burst capacity usage beyond that task ceiling
  • containers without explicit limits in any task without a size set (and hence without a ceiling limit) can burst capacity usage, stealing resources from other tasks. This is true even if those other tasks have a size set, assuming they are not using that capacity
  • you can deploy very small tasks (that can burst) either with no size or with a very small size (see the ECS documentation for additional information on how to size a task)

The deployment of ECS tasks on top of Fargate provides slightly less flexibility because:

  • you are deploying a task of a given size (which maps 1:1 to an EC2 instance of that capacity)
  • you cannot over-commit resources across different tasks because they have a specific size set and run on dedicated Linux kernels (however, you can still over-commit resources among the containers inside a task)
  • you cannot create tasks that are smaller than 1/4th of a vCPU and 512 MB of memory

It goes without saying that the value proposition of Fargate goes well beyond the mere resource management options. Fargate is a serverless environment for containers that allows customers to focus on business needs instead of investing time in the undifferentiated heavy lifting of managing compute capacity. In addition to this, using Fargate dedicates a Linux kernel to your task, which raises the bar of your security posture.

Massimo Re Ferre

Massimo is a Senior Principal Technologist at AWS. He has been working on containers since 2014 and is now part of the DECS (Developers, Events, Containers, Serverless) organization at AWS. Massimo has a blog at https://it20.info and his Twitter handle is @mreferre.

Samuel Karp

Samuel Karp is a Senior Software Development Engineer working on container infrastructure including the Bottlerocket OS, containerd, and Firecracker.