Why is Reserved Instances Normalization Changing My Cloud Metrics?


Knowing normalization’s effect before you normalize.

Intuit embraces a perpetual state of disruption to parallel today’s global environment. It disrupts its own conventions at every turn, and its infrastructure model is not immune to this controlled chaos.

The applications consumers know best, such as TurboTax and QuickBooks Online, are hosted on that internal infrastructure. These applications are now migrating to the AWS public cloud. The cloud promises scalability, speed and reliability, but what about cost? Surely, all those game-changing features come at a price.

With its cloud journey underway, Intuit began considering how best to optimize costs. One way to do so was to use reserved instances rather than on-demand instances for compute infrastructure. The idea behind reserved instances is simple: if you have long-term demand for compute infrastructure, you can pay a reservation fee upfront, which reserves your right to use that infrastructure for a fixed period of time at a discounted cost. On-demand instances cost more per hour, but require no long-term commitment.

Therefore, Intuit began looking closely at its usage to determine where it was appropriate to buy the rights to long-term cloud usage. It turns out there was a lot of opportunity to do so: millions of dollars paid upfront, which in turn provided millions of dollars in savings over the long term. But that is not the end of the story.

The cloud is in a perpetual state of change, and part of that change was recognizing the demand for adjusting compute size over time – including reserved instances. This, of course, necessitated the creation of RI Modification which allowed cloud users to use different instance sizes over time; e.g. if you bought a “large” RI, but you wanted to modify your infrastructure, you could modify that “large” RI into two “medium” instances.

 

Previously, if you bought a Reserved Instance, the tech specs were static throughout the life of the RI. Amazon has improved upon this model for companies who want both the discount for long-term usage as well as flexibility when compute demand invariably changes.

Modifying your RIs is now possible. Changing availability zones, scope, network platforms and instance sizes is possible albeit with a few restrictions. I am going to focus on the ability to change instance sizes because tracking these changes over time requires a new perspective on your RI Coverage metric. In brief, RI Coverage is the total RI footprint divided by your total cloud footprint. If you have 20 total instances and 10 are covered by RI, then the RI Coverage metric would be 50%.
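The RI Coverage definition above can be written as a minimal sketch. The function name `ri_coverage` and its parameters are illustrative assumptions, not part of any AWS tooling.

```python
def ri_coverage(covered: float, total: float) -> float:
    """Return RI Coverage as a fraction: RI-covered footprint / total footprint."""
    if total <= 0:
        raise ValueError("total footprint must be positive")
    return covered / total


# The example from the text: 20 total instances, 10 covered by RIs.
coverage = ri_coverage(covered=10, total=20)
print(f"RI Coverage: {coverage:.0%}")  # RI Coverage: 50%
```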

The advent of size-flexible regional reserved instances necessitates a change to reserved instance reporting on metrics like reserved instance coverage. Tracking usage on fluidly changing reserved instance types would be a Sisyphean task if Amazon did not provide a methodology to systematically account for this behavior. Fortunately, it does. The key to this methodology is “unit normalization”, which maps each instance size to a factor that reflects the relative size of the instance. For example, unit normalization allows you to modify a large instance into two medium instances while maintaining overall consistency in your compute footprint.

Below is a small table to give you an idea of how the normalization factors are mapped.

Instance size    Normalization factor
nano             0.25
micro            0.5
small            1
medium           2
large            4
xlarge           8
2xlarge          16
4xlarge          32
8xlarge          64
16xlarge         128
32xlarge         256

To show tracking from one instance size to another:

1 xlarge (factor 8) = 2 larges (factor 4 each) = 4 mediums (factor 2 each) = 8 smalls (factor 1 each), which is the same as

1 x 8 = 2 x 4 = 4 x 2 = 8 x 1 = 8 units

 

This methodology allows you to track your RI footprint regardless of how many times you adjust instance sizes in the cloud.
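As a rough sketch, the factor table and the equivalence above can be expressed in a few lines of Python. The dictionary and the helper `normalized_units` are names introduced here for illustration; the factor values are the ones listed in the table.

```python
# Normalization factor per instance size, as listed in the table above.
NORMALIZATION_FACTOR = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
    "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64,
    "16xlarge": 128, "32xlarge": 256,
}


def normalized_units(size: str, count: int = 1) -> float:
    """Convert a number of instances of one size into normalized units."""
    return NORMALIZATION_FACTOR[size] * count


# Four differently shaped footprints that all amount to 8 normalized units.
footprints = [("xlarge", 1), ("large", 2), ("medium", 4), ("small", 8)]
for size, count in footprints:
    print(f"{count} x {size} = {normalized_units(size, count)} units")
```

Because every footprint maps to the same number of units, an RI modified from one shape to another keeps the same weight in unit-based reporting.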

Depending on the assortment of instance types in your infrastructure (as of this writing there are nearly 60 current instance types) and the respective usage of reserved instances, your reserved instance coverage metric could vary from previous reporting. It is paramount to arm your stakeholders with a clear understanding of why reserved instance coverage metrics would change after normalization has been applied.

Larger reserved instances carry a greater weight after normalization. An 8xlarge instance has a normalization factor of 64, so a single 8xlarge carries 32x the weight of a medium instance (normalization factor of 2) in the metric calculation once normalization has been applied. This weighting will shift your aggregate RI Coverage metric whenever RI Coverage differs from one instance size to another in your cloud infrastructure.

For example:

Two mediums (covered) + Two 8xlarges (1 covered and 1 on-demand) = 4 total instances (or instance hours)

AND

3 (covered) / 4 (total) = 75% RI Coverage

 

Apply normalization:

Two mediums with a normalization factor of 2 = 2 * 2 = 4 units

Two 8xlarges with a normalization factor of 64 = 2 * 64 = 128 units

 

Same example, but with normalization applied:

4 + 128 = 132 total units

68 (covered: 4 medium units + 64 units from the covered 8xlarge) / 132 (total) = 52% RI Coverage (considerably less than pre-normalization)

 

RI Coverage is lower in the unit-based example
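The two calculations above can be reproduced with a short sketch. The `fleet` list and the function names are assumptions made for illustration; each tuple is one instance (or instance hour) and a flag saying whether an RI covers it.

```python
# Normalization factors for the two sizes used in the example.
FACTOR = {"medium": 2, "8xlarge": 64}

# (instance size, covered by an RI?) for the four instances in the example.
fleet = [
    ("medium", True),
    ("medium", True),
    ("8xlarge", True),
    ("8xlarge", False),
]


def hour_based_coverage(fleet):
    """Every instance counts as 1, regardless of size."""
    covered = sum(1 for _, is_covered in fleet if is_covered)
    return covered / len(fleet)


def unit_based_coverage(fleet):
    """Every instance counts as its normalization factor."""
    covered = sum(FACTOR[size] for size, is_covered in fleet if is_covered)
    total = sum(FACTOR[size] for size, _ in fleet)
    return covered / total


print(f"Hour-based: {hour_based_coverage(fleet):.0%}")  # Hour-based: 75%
print(f"Unit-based: {unit_based_coverage(fleet):.0%}")  # Unit-based: 52%
```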

Normalization’s weighting effect:

Instance hour-based example

Weight of mediums = 2 / 4 = 50%

Weight of 8xlarges   = 2 / 4 = 50%

Unit-based example

Weight of mediums = 4 / 132 = 3% — weighting decreased substantially

Weight of 8xlarges   = 128 / 132 = 97% — weighting increased substantially
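The weighting shift can be checked the same way. In this sketch, `weights` is a hypothetical helper that returns each size's share of the total footprint, either counting instances or counting normalized units.

```python
FACTOR = {"medium": 2, "8xlarge": 64}

# Number of instances of each size in the example.
counts = {"medium": 2, "8xlarge": 2}


def weights(counts, normalize):
    """Share of the total footprint held by each size."""
    amounts = {
        size: n * (FACTOR[size] if normalize else 1)
        for size, n in counts.items()
    }
    total = sum(amounts.values())
    return {size: amount / total for size, amount in amounts.items()}


hour_weights = weights(counts, normalize=False)
unit_weights = weights(counts, normalize=True)

for size in counts:
    print(f"{size}: hour-based {hour_weights[size]:.0%}, "
          f"unit-based {unit_weights[size]:.0%}")
```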

 

[Figure: RI Coverage Metric by Instance Type paired with Hour and Unit-based Weights (panels: Weight of mediums, Weight of 8xlarges)]

 

The example shows that normalization can significantly change your analytics. A decline as sharp as the example’s, from 75% coverage to 52% coverage, may be improbable for most infrastructures; however, the implications cannot be ignored if you switch from standard reservations to the modifiable regional reservations.

 

Notably, if you look at the data from an instance type perspective, the RI Coverage will remain the same regardless of the normalization factor. Consider the previous example by looking only at RI Coverage at the instance type level.

 

Hour-based metric:

Two mediums (covered) = 2 / 2 = 100% RI Coverage

Two 8xlarges with one covered and one on-demand = 1 / 2 = 50% RI Coverage

 

Apply Normalization:

Two mediums with a normalization factor of 2 = 2 * 2 = 4 units

Two 8xlarges with a normalization factor of 64 = 2 * 64 = 128 units

 

Unit-based metric:

Two mediums = 4 units = 4 / 4 = 100% RI Coverage

Two 8xlarges = 128 units = 64 / 128 = 50% RI Coverage

 

In both scenarios, the respective RI Coverage matches whether we use an hour-based or unit-based approach; however, the aggregate RI Coverage metric will be lower in this example after normalization is applied due to the increased weight of the 8xlarge in the calculation.
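A small sketch makes the invariance explicit: within a single instance size the normalization factor multiplies both the numerator and the denominator, so it cancels out. The helper `coverage_by_size` and the `fleet` data are illustrative assumptions that mirror the example.

```python
from collections import defaultdict

FACTOR = {"medium": 2, "8xlarge": 64}

# (instance size, covered by an RI?)
fleet = [
    ("medium", True),
    ("medium", True),
    ("8xlarge", True),
    ("8xlarge", False),
]


def coverage_by_size(fleet, normalize):
    """RI Coverage per instance size, hour-based or unit-based."""
    covered = defaultdict(float)
    total = defaultdict(float)
    for size, is_covered in fleet:
        weight = FACTOR[size] if normalize else 1
        total[size] += weight
        if is_covered:
            covered[size] += weight
    return {size: covered[size] / total[size] for size in total}


hour_based = coverage_by_size(fleet, normalize=False)
unit_based = coverage_by_size(fleet, normalize=True)
print(hour_based)
print(unit_based)
```

Only the aggregate metric moves, because the aggregate mixes sizes that now carry different weights.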

 

Conversely, if RI Coverage for the 8xlarges were higher than for the mediums, the aggregate RI Coverage would be higher after normalization was applied.
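To illustrate the converse case, here is a hypothetical fleet (not taken from the example above) in which both 8xlarges are covered and only one of the two mediums is. The helper `aggregate_coverage` is likewise an assumption made for this sketch.

```python
FACTOR = {"medium": 2, "8xlarge": 64}

# Hypothetical fleet: the large instances are fully covered,
# the small ones only half covered.
reversed_fleet = [
    ("medium", True),
    ("medium", False),
    ("8xlarge", True),
    ("8xlarge", True),
]


def aggregate_coverage(fleet, normalize):
    """Aggregate RI Coverage, hour-based or unit-based."""
    covered = sum(
        (FACTOR[size] if normalize else 1)
        for size, is_covered in fleet
        if is_covered
    )
    total = sum((FACTOR[size] if normalize else 1) for size, _ in fleet)
    return covered / total


before = aggregate_coverage(reversed_fleet, normalize=False)
after = aggregate_coverage(reversed_fleet, normalize=True)
print(f"Hour-based: {before:.0%}")  # Hour-based: 75%
print(f"Unit-based: {after:.0%}")   # Unit-based: 98%
```

The hour-based figure is still 75%, but the unit-based figure rises to 130 / 132 because the fully covered 8xlarges dominate the weighting.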

 

The preceding explanation covers the primary components of RI normalization’s effect on RI Coverage. If you simply want to know whether your coverage metric will increase or decrease following normalization, find the size of your average reserved instance and compare coverage on either side of it.

If coverage for larger-than-average RIs is greater than coverage for smaller-than-average RIs, the RI Coverage metric will increase following normalization.

If coverage for larger-than-average RIs is lower than coverage for smaller-than-average RIs, the RI Coverage metric will decrease following normalization.

 

I have used visualization, formulas and plain text to show how normalization changes the weight each instance type carries in the RI Coverage metric, relative to its size and corresponding normalization factor. The required data can be found in your AWS Cost and Usage Report (CUR). Depending on the size of your AWS footprint, the CUR could be as small as dozens of records or as large as millions (or billions) every month. Our team has put considerable effort into creating a data pipeline via S3>Redshift>QlikView to enable analytics solutions. Regardless of the architecture, the insights gained require time and a commitment to excellence.
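For readers without a full pipeline, the sketch below shows one way the same two metrics might be derived from CUR-style rows using only the Python standard library. It is a simplified illustration, not our production pipeline: the sample rows are made up to mirror the earlier example, and it assumes that RI-covered usage appears with a `lineItem/LineItemType` of `DiscountedUsage`, on-demand usage with `Usage`, and that `lineItem/NormalizedUsageAmount` holds the usage multiplied by the normalization factor. Verify those column semantics against your own report, and note that a real report would also need filtering down to EC2 instance usage.

```python
import csv
import io

# Made-up sample rows shaped like a CUR extract (not real billing data).
SAMPLE_CUR = """\
lineItem/LineItemType,product/instanceType,lineItem/UsageAmount,lineItem/NormalizedUsageAmount
DiscountedUsage,t2.medium,1,2
DiscountedUsage,t2.medium,1,2
DiscountedUsage,c3.8xlarge,1,64
Usage,c3.8xlarge,1,64
"""


def coverage_from_cur(rows):
    """Return (hour-based, unit-based) RI Coverage from CUR-style rows."""
    covered_hours = total_hours = 0.0
    covered_units = total_units = 0.0
    for row in rows:
        line_type = row["lineItem/LineItemType"]
        if line_type not in ("Usage", "DiscountedUsage"):
            continue  # skip fees, taxes, credits and other non-usage lines
        hours = float(row["lineItem/UsageAmount"])
        units = float(row["lineItem/NormalizedUsageAmount"])
        total_hours += hours
        total_units += units
        if line_type == "DiscountedUsage":  # assumed to mean RI-covered
            covered_hours += hours
            covered_units += units
    return covered_hours / total_hours, covered_units / total_units


reader = csv.DictReader(io.StringIO(SAMPLE_CUR))
hour_cov, unit_cov = coverage_from_cur(reader)
print(f"hour-based: {hour_cov:.0%}, unit-based: {unit_cov:.0%}")
```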

 

The underlying coverage does not change; rather, our perspective on the metric changes. The transformation from hour-based to unit-based numbers is essential to accurately track how instance size changes affect your current RI infrastructure. This is important because whether your infrastructure is ten instances or 10,000, the ability to explain the difference between the resultant metrics will be an important part of your cloud journey.
