When setting up a cloud infrastructure, it is essential to plan for the desired capacity: how many servers, how much storage, and how much network bandwidth. Because cloud services must be designed for elastic demand, capacity planning becomes quite tricky.
Fundamentally, capacity planning is driven by three major parameters:
1. Economics of Cloud Computing: Clouds must be cheaper than hosted environments
2. Performance Requirements: Hosted applications must meet certain performance targets
3. Ability to meet elastic demands: Ability to take on extra workloads without compromising on service quality.
The cloud capacity decision is also influenced by:
1. Security & Reliability Requirements: The DR (Disaster Recovery) strategy has a huge impact on the required capacity. In addition, data security affects the capacity requirements.
2. Software Requirements: The underlying capacity of the infrastructure depends on the type of software that is deployed on the cloud.
3. Time to deploy additional resources: It takes time to bring additional servers into the cloud. During that time, the existing cloud capacity must be able to handle the additional load. This reserve is also called buffer capacity.
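As a rough illustration of buffer capacity, the sketch below estimates how many spare servers are needed to absorb load growth while new servers are being brought in. It assumes load grows at a steady rate during the lead time; the function name, parameters, and figures are hypothetical and not taken from any particular cloud platform.

```python
import math

def buffer_servers(growth_per_day, lead_time_days, server_capacity):
    """Spare servers needed to absorb load growth during the deployment lead time.

    Assumes (hypothetically) that load grows linearly at growth_per_day,
    measured in the same units as server_capacity, e.g. transactions per minute.
    """
    if server_capacity <= 0:
        raise ValueError("server_capacity must be positive")
    extra_load = growth_per_day * lead_time_days
    # Round up: a fraction of a server cannot be provisioned.
    return math.ceil(extra_load / server_capacity)

# Illustrative numbers only: load grows by 200 transactions/minute each day,
# new servers take 14 days to deploy, and one server handles 1000 transactions/minute.
print(buffer_servers(growth_per_day=200, lead_time_days=14, server_capacity=1000))  # 3
```

A longer deployment lead time or faster growth raises the buffer that has to be kept on hand.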
Capacity Planning: Initial steps
The goal of capacity planning is to ensure that you always have sufficient but not excessive resources to meet customers' needs in a timely fashion.
This requires three main details:
1. Get a measure of what is going on in the cloud. This entails getting information on customers' current workloads.
2. Forecast future workloads: how many new clients will be added, and what their projected workloads will be.
3. Develop a model that will meet the customer requirements, considering the response times or SLAs, for the lowest possible cost.
The actual capacity planning is an iterative process. In each iteration, you need to validate your assumptions with the actual data and then constantly revalidate your model.
For example, if the current load per customer is (say) 5000 transactions per minute, and the customer's estimate for next year is 10000 transactions per minute, then you need to keep validating this projection against actual data before the next round of capacity planning.
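The validation step can be sketched as a simple comparison of the projected load against what is actually measured. The function names, the 10% tolerance, and the observed figures are assumptions chosen for illustration; only the 10000 transactions per minute projection comes from the example above.

```python
def projection_error(forecast, actual):
    """Relative error of a forecast: positive when actual load exceeds the forecast."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    return (actual - forecast) / forecast

def needs_revision(forecast, actual, tolerance=0.10):
    """True when the forecast misses the observed load by more than the tolerance.

    The 10% default tolerance is an arbitrary illustrative threshold.
    """
    return abs(projection_error(forecast, actual)) > tolerance

# Projected 10000 transactions/minute; suppose 12000 is actually observed.
print(needs_revision(forecast=10000, actual=12000))  # True: 20% over the forecast
print(needs_revision(forecast=10000, actual=10500))  # False: within 10%
```

When the check fails, the forecast and the capacity model built on it should be revisited in the next iteration.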
Because this model is based on human judgment and guesswork, one needs to be cautious about the assumptions it makes. It is not always possible to get a full measure of all the current workloads of all customers. Therefore, a representative sample of customers must be used for all calculations.
The model has to be verified over a range of real-life conditions before it becomes useful for prediction.
Queue Time & Utilization
All capacity planning is based on queuing theory, which was originally developed by Agner Krarup Erlang in 1909. This theory explains the powerful relationship between resource utilization and response times. The theory was originally used to plan the capacity of telephone switches and continues to be used for designing routers in packet-switched networks. The same theory is applicable to cloud capacity planning as well.
Queues form because resources are limited and demand fluctuates. Suppose there is only one computer and it is servicing one workload. If that workload is finished before the next one arrives, the computer becomes idle and is thus underutilized. On the other hand, if a new workload arrives before the current one is finished, the new workload has to wait in a queue for its turn. As workloads arrive faster than the computer can finish them, the computer gets busier and busier: the utilization of the resource increases, and the wait time rises sharply as utilization approaches 100%. The classic queue length (response time) versus utilization curve looks like the curve shown in Figure-1.
Now consider this:
Response Time = Waiting Time (Queuing Time) + Processing Time
and
Waiting Time = Time to Service One Request × Queue Length
Now, consider a case where a SQL query takes 100 milliseconds to complete. If the queue is empty, then the query will be serviced in 100 milliseconds. But if there are 40 queries ahead in the queue, then it will take 100 × 40 + 100 = 4100 milliseconds. The wait time grows in proportion to the queue length. If there were two servers sharing the queue, the time would drop to 2100 milliseconds; with 4 servers, it would drop to 1100 milliseconds. So, as you can see, increasing the number of resources causes a dramatic reduction in wait times.
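The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the simplified model used in this example: it assumes identical servers that split the waiting queries evenly, and the function name is made up for illustration.

```python
import math

def response_time_ms(service_ms, queue_length, servers=1):
    """Response time = waiting time + processing time, as in the formulas above.

    Assumes the queue is shared evenly by identical servers, so a new query
    waits for ceil(queue_length / servers) earlier queries on its server.
    """
    if servers < 1:
        raise ValueError("at least one server is required")
    queries_ahead = math.ceil(queue_length / servers)
    waiting_ms = service_ms * queries_ahead
    return waiting_ms + service_ms

for n in (1, 2, 4):
    print(n, "server(s):", response_time_ms(100, 40, servers=n), "ms")
# 1 server(s): 4100 ms
# 2 server(s): 2100 ms
# 4 server(s): 1100 ms
```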
In reality, computers are not serving requests all the time. There will be times when computers are idle, waiting for workloads. Say a computer is busy 98% of the time; adding another server will then make each of the two servers busy only 49% of the time. That is, 51% of the time the servers are idle, consuming power, and the resources are wasted. Adding resources will definitely decrease the wait times, but it also results in wastage. From the utilization curve, we can see that the knee point occurs at 65% utilization, i.e., increasing utilization above 65% makes wait times climb steeply.
In a cloud system there are several resources to be considered (CPU, disks, storage, network latency, bandwidth, etc.), and there is an associated wait time for each resource. Proper planning of a cloud infrastructure requires allocating the right amount of each resource to deliver a good overall customer experience. Also, in a multi-tenant system, customer demands are not in sync; therefore, service providers can move certain resources from one customer to another depending on customer workloads.
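To get a feel for the shape of the utilization curve, the sketch below uses the textbook single-server (M/M/1) queuing formula, in which the average response time equals the service time divided by (1 - utilization). This particular formula is an assumption brought in for illustration; the article does not say which queuing model its curve is based on, and real systems will deviate from it.

```python
def mm1_response_time(service_time, utilization):
    """Average response time of an M/M/1 queue: service_time / (1 - utilization).

    Assumes random (Poisson) arrivals, exponentially distributed service
    times, and a single server. Utilization must be in the range [0, 1).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and below 1")
    return service_time / (1.0 - utilization)

# A 100 ms request at increasing levels of server utilization.
for u in (0.49, 0.65, 0.80, 0.90, 0.98):
    print(f"{u:.0%} busy -> {mm1_response_time(100, u):.0f} ms")
```

Under this model a 100 ms request takes about 200 ms at 50% utilization, about 1000 ms at 90%, and about 5000 ms at 98%, which is why running servers close to full utilization hurts response times so badly.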
Closing Thoughts
Cloud computing is a capacity planner's dream. The beauty of cloud-based resources is that you can dynamically and elastically expand and contract the resources reserved for your use, and you only pay for what you have reserved.