by Ming Sheu
The accelerated shift from running applications on-premises to running them in the cloud has taught many enterprises expensive lessons about managing pay-as-you-go pricing models. Many large organizations continue to provision cloud resources ineffectively, costing them well into the hundreds of thousands—sometimes millions—of dollars annually.
As cloud adoption accelerates, the problem is getting worse. According to Gartner, public cloud workloads are forecast to rise 18% this year, while Flexera’s 2020 State of the Cloud Report estimates that businesses waste 35% of their cloud spend. Datadog, which monitors cloud application performance for thousands of enterprises, reports that nearly half of the applications it watches use 30% or less of their allocated resources.
Manual, “gut check” approaches to allocating cloud resources are unsustainable. To avoid the performance impacts of under-provisioning, risk-averse CloudOps teams commonly over-provision resources. And over-provisioning results in over-spending.
There are three main reasons why highly skilled CloudOps teams struggle to get a tighter grip on provisioning. First, these teams lack visibility into the hosted services their apps run on. Second, they lack the capabilities to predict what resources are needed. Third, they lack the tools to choose the most cost-optimized cluster configurations for their workloads.
And the provisioning challenge is only going to become more complex. CloudOps teams are finding it increasingly difficult to keep up with the exploding volume of decisions and adjustments required to keep application utilization consistently at the right level. Manually managing this volume has become impractical. Compounding this complexity is the dynamic nature of today’s microservices-based architectures that rely on Kubernetes to help scale resources up or down to meet demand. This complexity is a critical inhibitor to organizations looking to accelerate their journey to the cloud. Organizations face the genuine risk of not benefiting from public, hybrid, and multi-cloud platforms if they fail to find a way to manage complexity and optimize their cloud spending.
CloudOps teams need a new way to manage cloud resources, one that is automated and intelligent and that looks at the full application stack—from the workload down to the container, the virtualized infrastructure layer, and individual hardware components—and across cloud instances. A modern approach to hybrid and multi-cloud provisioning must be aware of the many interrelationships between the virtual and physical features in a system and continuously observe and react to the dynamic changes occurring throughout the system in real time.
Traditional cloud provisioning looks at CPU and memory utilization. That is a fairly rudimentary approach that fails to account for the many variables and much of the complexity in infrastructure management. Traditional forecasting algorithms, such as the moving average (M.A.) and Autoregressive Integrated Moving Average (ARIMA) models, cannot accurately model dynamic performance metrics.
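To see why such static models fall short, consider a minimal moving-average forecaster—an illustrative Python sketch, not ProphetStor's algorithm, using hypothetical utilization numbers. Because it predicts only from recent history, it inevitably lags behind a sudden workload shift:

```python
def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Hypothetical CPU-utilization samples (%) with a sudden traffic spike.
cpu = [20, 22, 21, 23, 80, 85, 90]

# Forecast for the step where the spike actually arrives.
forecast = moving_average_forecast(cpu[:4])
print(forecast)  # 22.0 — far below the 80% that arrives next
```

The forecaster is blind to the spike until after it happens, which is exactly the reactive behavior that leads to either under-provisioning or defensive over-provisioning.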
In addition to having a limited view of provisioning requirements, these traditional approaches cannot scale and frequently produce false alarms triggered by static utilization thresholds and misleading metrics. Dealing with false alarms is a significant drain on the efficiency of any CloudOps team. So is poor timing: provisioning problems generally arise before traditional tools can alert teams to react. And when you consider that the number of applications deployed in the cloud can quickly grow from dozens to hundreds—or even thousands—in a matter of months, you can begin to appreciate the scope of the problem.
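The false-alarm problem with static thresholds can be sketched in a few lines of Python (an illustrative example with made-up readings, not any vendor's alerting logic): every momentary blip above the threshold fires an alert, whether or not it needs action.

```python
def threshold_alerts(samples, threshold=80):
    """Static-threshold alerting: fire on every sample above the threshold,
    including transient blips that resolve on their own."""
    return [i for i, value in enumerate(samples) if value > threshold]

# One transient spike among otherwise healthy utilization readings (%).
cpu = [40, 42, 95, 41, 43, 39]
print(threshold_alerts(cpu))  # [2] — an alert for a blip that self-resolved
```

Multiply that behavior across hundreds of applications and thousands of metrics, and triaging alerts becomes a full-time job in itself.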
Many companies will quickly outgrow the built-in autoscaling function in Kubernetes, which further compounds the provisioning challenge. Horizontal pod autoscaling (HPA) takes a reactive approach based on current workload and resource utilization levels. HPA doesn’t anticipate shifts in workload characteristics or maintain a continuous understanding of system state in a way that allows for the kind of predictive adaptation that today’s cloud environments require.
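HPA's reactive core can be seen in its documented scaling rule, shown here as a simplified Python sketch (the real controller adds tolerances, stabilization windows, and per-pod metric averaging): the replica count is adjusted only in proportion to the metric value observed *right now*.

```python
import math

def hpa_desired_replicas(current_replicas, current_utilization, target_utilization):
    """Kubernetes HPA's core rule (simplified):
    desired = ceil(current_replicas * current_metric / target_metric).
    It reacts to the current metric with no prediction of where the
    workload is heading."""
    ratio = current_utilization / target_utilization
    return math.ceil(current_replicas * ratio)

# With 4 pods already at 90% CPU against a 60% target, HPA scales up
# only after the pressure has arrived.
print(hpa_desired_replicas(4, 90, 60))  # 6
```

By the time the formula calls for more replicas, users have already felt the load; a predictive approach would have added capacity before the spike.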
At ProphetStor, we believe that application rightsizing must be done at a granular level. When you apply suitable machine learning approaches to I.T. operations, your CloudOps team can gain a much more fine-grained understanding of the environment’s complexity. You can see things like CPU and memory usage, network traffic, and power consumption metrics—all of the interrelated dependencies that I.T. administrators need to understand to better plan and allocate cloud and data center resources.
ProphetStor gives administrators control over their private, public, hybrid, and multi-cloud environments by providing real-time observability. ProphetStor Federator.ai ingests telemetry from multiple sources, including trusted application performance monitoring products such as Sysdig and Datadog and standard open source tools like Prometheus. Federator.ai reasons across this rich set of telemetry data to deliver powerful real-time provisioning intelligence that uses multi-layer correlation to predict resource consumption dynamically. This results in highly accurate recommendations for the resources needed by pods—recommendations grounded in reality, not best guesses. Federator.ai can reduce over-provisioning of resources by 20–70% for a typical workload while simultaneously preventing under-provisioning of resources for mission-critical workloads.
Workload Prediction – Capturing the Application Dynamics
ProphetStor Federator.ai software provides application metrics that enable operations teams to make dynamic assessments of workload resource spending. A workload prediction chart compares infrastructure cost with actual expenditure over time, along with captured application dynamics such as comparisons of projected vs. actual user demand.
ProphetStor believes the best indicator of future performance comes from understanding past behavior combined with a detailed awareness of present conditions. Federator.ai analyzes application behavior and then performs intelligent autoscaling and anticipation modeling for the whole application. The goal of Federator.ai is to generate proactive recommendations that anticipate resource needs in advance, which is a far superior approach to reacting after performance hits a wall. Customers benefit from continuous optimization of application performance and eliminating the bill shock that occurs when they are forced to pay for wasted, over-provisioned resources.
When the CloudOps team at Orange S.A., the fourth-largest telecom in Europe and the tenth-largest in the world, set out to build a cloud-native platform based on Kubernetes, it faced the challenge of supporting applications across multiple business units and geographies. As a major telecom provider serving 260 million customers worldwide, Orange supports applications with vastly different workloads and service-level requirements that share a common pool of cloud resources.
Orange’s CloudOps team lacked the tools needed to effectively manage the resource requirements of the company’s growing number of cloud applications. In attempting to establish high service levels, the Orange team did what most CloudOps teams do—they erred on the side of application performance and resorted to over-provisioning. This led to a surplus of unused cloud resources, which added unnecessary pay-as-you-go costs that promised to grow over time.
Orange turned to ProphetStor to address this problem. With Federator.ai, the Orange team established automation and performance analytics on each application workload and then managed resource usage across all workloads using a single user-friendly dashboard. Federator.ai determined the right level of resources per workload and employed autoscaling to rightsize resources for every application running on Orange’s Kubernetes clusters. As a result, Orange was able to achieve optimal performance and improve capacity planning and resource optimization for all applications at scale.
Thanks to ProphetStor Federator.ai, Orange realized performance gains of up to 80% and saved more than 35% in utilization costs across its more than 100 cloud applications.
Once a CloudOps team masters the pay-as-you-go cloud model with the help of Federator.ai’s predictive A.I., it can build on that savings momentum by employing Federator.ai’s cost analysis capabilities to anticipate and improve planning for future deployments.
For instance, before migrating workloads to a public cloud, a CloudOps team can use Federator.ai’s machine learning to characterize workload resource requirements and compare costs across multiple service providers—and even across a single service provider’s hosting locations. Comparing the prices of cloud instances and moving applications to the lowest-priced suitable instance is a process referred to as cloud arbitrage. With cloud arbitrage, CloudOps teams can find the best cloud instances from a cost-performance standpoint, further optimizing performance and reducing cost.
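At its simplest, the arbitrage comparison reduces to a price lookup across equivalent instance shapes. The Python sketch below uses hypothetical providers, regions, and hourly prices purely for illustration—real arbitrage, as described above, would also weigh the workload's characterized performance on each instance type:

```python
# Hypothetical per-hour prices for a comparable instance shape across
# providers and regions (illustrative numbers, not real price sheets).
instance_prices = {
    ("cloud-a", "us-east"): 0.096,
    ("cloud-a", "eu-west"): 0.108,
    ("cloud-b", "us-east"): 0.089,
    ("cloud-b", "ap-south"): 0.101,
}

def cheapest_instance(prices):
    """Return the (provider, region) pair with the lowest hourly price."""
    return min(prices, key=prices.get)

best = cheapest_instance(instance_prices)
print(best, instance_prices[best])  # ('cloud-b', 'us-east') 0.089
```

The hard part is not the comparison itself but characterizing the workload well enough to know which instance shapes are truly equivalent—which is where the machine-learning profiling comes in.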
The ProphetStor Federator.ai software dashboard displays AI-based recommended cluster configurations that help CloudOps teams determine both costs and configurations for workloads on a cloud service provider.
With Federator.ai, CloudOps teams will see enhanced productivity. Teams will be freed from manually specifying CPU and memory requirements for every container, continuously monitoring OpenShift cluster utilization, and manually recording usage data. Through A.I. and automation, Federator.ai enables CloudOps teams to quickly achieve optimal pod configurations that accurately match application resource requirements and dynamically adapt those configurations to the continual changes in the runtime environment. In so doing, Federator.ai helps to control the costs associated with over-provisioning while simultaneously reducing the performance impacts of under-provisioning, such as out-of-memory (OOM) conditions.
Functional Architecture of Federator.ai.
ProphetStor is excited to offer Federator.ai as a certified Red Hat OpenShift operator in the Red Hat Marketplace. The convenience of simplified governance through Red Hat Marketplace provides complementary benefits to CloudOps teams, who can perform intelligent autoscaling and better anticipate the cost of private, public, hybrid, and multi-cloud deployments. Customers who use ProphetStor’s patented prediction technologies can expect to save from 35% to 70% in cloud utilization costs while improving workload performance and reducing manual configuration time.
Using gut instinct to manage cloud environments is becoming a thing of the past in today’s increasingly complex cloud environments. By equipping your CloudOps team with machine learning-based insights from Federator.ai, you can eliminate cloud waste and optimize the cost and performance of your cloud workloads.