The Hypothermic Cloud Infrastructure: Maintaining the Blood Flow to Tier 1 & 2 Apps
A particular sore spot since the rise of cloud infrastructure, and even since the advent of utility computing, is this: how do you forecast, pay for, and justify the cost of resources for problems you don’t have yet? After all, the entire premise behind funding the acquisition of compute resources is that you are solving an already identified need (problem), to which you then attach a cost and an ROI in order to convince management to cut a check. That model doesn’t quite work in a cloud environment because the funding request is usually forward-looking (to solve tomorrow’s need), so ROI is difficult if not impossible to predict. Here at GreenPages, the more we talk to customers about cloud and cloud technologies, and the more those technologies evolve toward ever greater sophistication and capability, the more we find some very interesting and innovative ways to help solve exactly that problem.
One example that I’ve been thinking about lately is what I’m calling the “Hypothermic Cloud Infrastructure.” Hypothermia, as you may know, is a condition in which a person’s core body temperature drops below the temperature required for normal metabolism and body functions, defined as 35.0 °C (95.0 °F). When a human body is exposed to frigid temperatures, say from falling into freezing water, its temperature starts to drop and keeps dropping the longer the exposure lasts. In an attempt to preserve life below a certain internal temperature, the autonomic functions of the brain start to restrict the flow of blood to the extremities in order to maintain the core temperature. As the core gets colder and colder, less and less blood is allowed to flow to the extremities. In short, the body is prepared and quite willing to lose arms and legs to frostbite in order to save the person’s life…by keeping the core temperature above a certain threshold.
In the Hypothermic Cloud Infrastructure, we equate the datacenter infrastructure to a human body in the sense that there are “core” applications and there are “extremity” applications that together make up the whole human, er, um, infrastructure. We define core as those applications (i.e., Tier 1 or 2) that are extremely business critical and get the highest level of attention; if they were to become unavailable, truly awful things would happen, the least of which is that a great deal of money might be lost…the worst being that several jobs are toast. At the other end of the spectrum, extremity applications might be called Tier 5 or 6 or even Tier 10. While they are important to the business (otherwise why have them at all), they are used infrequently, by only a few people, or have for some reason been deprecated within the tiered infrastructure. These applications can disappear for months at a time and effectively no one notices…until those applications are needed, of course.
To create the Hypothermic Cloud Infrastructure, there has to be a way to organize, manage and control the “blood flow” (the underlying resources, namely compute, memory, storage and networking, supplied to all of the applications within the infrastructure) and to keep track of each application’s status, i.e., whether it is a “core” or an “extremity” application. The facility that does this organizing, managing and controlling is a highly performant policy engine (the autonomic brain function) that is integrated directly with the automation and orchestration engine within the Hypothermic Cloud Infrastructure. Among the many things this policy engine does, it deeply understands the priorities of every relevant application and makes policy decisions that produce the best possible outcome. That is to say, the most important applications never suffer a degradation or an outage, while the least important ones receive only minimal care. So, what’s a policy decision? To answer that, let me give an example of the Hypothermic Cloud Infrastructure in action; I think that will adequately describe, at least at a high level, what policy decisions are and why they are so important.
In the Hypothermic Cloud Infrastructure there are several thousand applications running, and they all have different levels of priority. Sometimes that priority changes very little; other times it changes several times per week or even per day. The priority might change in reaction to business events or seasonal adjustments, or because you made that explicit decision…for whatever reason. The point is that priorities do change…today an application might be core and tomorrow it’s an extremity.
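To make the core/extremity distinction a little more concrete, here is a minimal Python sketch of how a policy engine might derive an application’s status from its current tier and re-derive it whenever the priority changes. The `Application` class, the `reprioritize` function and the cut-off `CORE_MAX_TIER = 2` are hypothetical illustrations chosen to match the “Tier 1 or 2 is core” definition, not any vendor’s API.

```python
from dataclasses import dataclass

# Hypothetical cut-off: tiers 1 and 2 are "core", everything else is an "extremity".
CORE_MAX_TIER = 2


@dataclass
class Application:
    name: str
    tier: int  # 1 = most business critical; larger numbers = less critical

    @property
    def status(self) -> str:
        """Status is derived from the tier, so it never goes stale."""
        return "core" if self.tier <= CORE_MAX_TIER else "extremity"


def reprioritize(app: Application, new_tier: int) -> str:
    """Apply a priority change (business event, seasonal adjustment or an
    explicit decision) and return the application's resulting status."""
    if new_tier < 1:
        raise ValueError("tier must be >= 1")
    app.tier = new_tier
    return app.status


billing = Application("billing", 1)
print(billing.status)              # core
print(reprioritize(billing, 6))    # extremity
```

Deriving the status from the tier, rather than storing it separately, means a priority change made several times a day cannot leave an application mislabeled.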
One fine summer day, one of the Tier 1 applications begins to consume more and more resources, above and beyond what has been deemed normal. While the IT staff starts investigating “why,” the Hypothermic Cloud Infrastructure knows that this application, based on its SLA, cannot fall below a certain performance level, no matter what. So the policy engine kicks off a process that seeks out the lowest tier applications, selects a few that match the relevant criteria (e.g., they hold the right sort of resources), archives those applications, and shifts the resources they were using over to the Tier 1 application so that its performance is maintained. No one scrambles like a madman trying to find additional resources…it just happens.
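The reclamation step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular product works: the `App` record, the `reclaim_for` function, the single resource kind per request, and the rule that only Tier 5 and below (`min_donor_tier=5`) may donate are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    tier: int  # 1 = most critical
    # What the app needs to run, e.g. {"vcpu": 8}. Kept intact while archived
    # so the app can be restored later with the same allocation.
    resources: dict[str, int] = field(default_factory=dict)
    archived: bool = False


def reclaim_for(target: App, kind: str, needed: int, apps: list[App],
                min_donor_tier: int = 5) -> list[str]:
    """Archive the lowest-tier apps holding `kind` until at least `needed`
    units have been shifted to `target`. Returns the archived app names."""
    donors = sorted(
        (a for a in apps
         if a is not target and not a.archived
         and a.tier >= min_donor_tier and a.resources.get(kind, 0) > 0),
        key=lambda a: a.tier,
        reverse=True,  # least important (highest tier number) first
    )
    # Check feasibility first so nothing is archived for a request we can't meet.
    if sum(a.resources[kind] for a in donors) < needed:
        raise RuntimeError(f"not enough reclaimable {kind} for {target.name}")

    archived, freed = [], 0
    for donor in donors:
        if freed >= needed:
            break
        donor.archived = True          # stop the app; its capacity is freed
        freed += donor.resources[kind]
        archived.append(donor.name)

    # Shift everything freed to the Tier 1 app.
    target.resources[kind] = target.resources.get(kind, 0) + freed
    return archived


erp = App("erp", 1, {"vcpu": 16})
fleet = [erp, App("wiki", 6, {"vcpu": 4}), App("old-crm", 9, {"vcpu": 6}),
         App("reports", 5, {"vcpu": 2}), App("hr", 3, {"vcpu": 8})]
print(reclaim_for(erp, "vcpu", 8, fleet))  # ['old-crm', 'wiki']
print(erp.resources)                       # {'vcpu': 26}
```

Note that the Tier 3 `hr` application is never touched: the donor threshold keeps the engine from trading one important application’s health for another’s.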
The New Normal
While the engineering staff continues investigating why the application suddenly went batshi…er, crazy, the policy engine examines the characteristics of the running application and might determine that this is the “new normal” and permanently assign those resources to the Tier 1 application. This configuration is then automatically recognized, recorded and committed to the CMDB, and those resources will never again be available to the lower tier applications. Because some other fine day the lower tier applications will need to be brought out of archive status, additional processes (automated and manual) are kicked off to procure the new resources (i.e., hardware) required to support them. The difference now is that this is a well-regulated and orderly process…with plenty of time to order standard equipment…and there are no fire drills based on an artificial emergency that would drive exorbitant prices! If, however, the policy engine determines that this is not the new normal, it will wait until the application settles down and the need for additional resources is no longer pressing. It will then release those resources back into the resource pools, pull the lower tier applications out of the archive, re-assign resources and spin the applications up as if nothing had happened. No additional resources are required, and the Tier 1 application maintained its SLA throughout the process.
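One simple way a policy engine might tell a “new normal” from a temporary spike is to look at a window of recent demand samples against the application’s old baseline. The sketch below is only an assumption about how such a rule could look; the `decide` function, the 12-sample window and the 10% tolerance are hypothetical parameters, not something the post specifies.

```python
from enum import Enum


class Decision(Enum):
    NEW_NORMAL = "commit resources permanently, record in CMDB, start procurement"
    RELEASE = "return resources to the pool, restore archived apps"
    WAIT = "keep the borrowed resources and keep observing"


def decide(samples: list[float], baseline: float,
           window: int = 12, tolerance: float = 0.10) -> Decision:
    """Classify a surge from recent demand samples (e.g. vCPUs in use)
    relative to the application's previous normal baseline."""
    if len(samples) < window:
        return Decision.WAIT  # not enough evidence either way yet
    recent = samples[-window:]
    upper = baseline * (1 + tolerance)  # demand this close to baseline counts as normal
    if all(s > upper for s in recent):
        return Decision.NEW_NORMAL  # elevated for the entire window
    if all(s <= upper for s in recent):
        return Decision.RELEASE     # the application has settled down
    return Decision.WAIT            # still fluctuating


print(decide([26.0] * 12, baseline=16.0))                # Decision.NEW_NORMAL
print(decide([26.0] * 6 + [15.0] * 12, baseline=16.0))   # Decision.RELEASE
```

Requiring the whole window to agree before acting in either direction biases the engine toward waiting, which matches the idea of holding the borrowed resources until the situation is clear.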
So, as you can see, the Hypothermic Cloud Infrastructure does solve the problem of reacting to abnormal system conditions and effectively predicting resource requirements. But no one is saying that a system with such a policy engine is sitting on your favorite ISV’s shelf, waiting to be downloaded. What we are saying is that the technology that enables all of this capability does exist; at this point, though, it may be spread across a few different places, available from a few different vendors, and called a whole bunch of different names…but it does exist.
I’m kinda hoping we’re the first to find it…but…no one’s stopping you.
A guest post by Trevor Williamson, from Journey to the Cloud.
Photo source: https://www.sxc.hu/photo/1020727.