Tuesday, January 31, 2012

Infrastructure Management of Private Cloud


Managing a private cloud infrastructure is much like managing an on-premises data center, but with some key differences - mostly in server workload management and network performance.

Initially, IT administrators assumed that managing a private cloud was the same as managing a data center, but there is a key difference - a private cloud carries SLA or Quality of Service (QoS) requirements. While the older data center environment was run mostly on a best-effort basis with rapid time-to-restore efforts, cloud services often carry a 99.999% uptime requirement and other quality-of-service metrics.

Server Utilization Management

Server utilization management has three main components:
1. Capacity planning
2. Server monitoring
3. Load balancing

Capacity Planning

The main problem in managing a private cloud is server utilization management. Today, with server virtualization, servers are already running at 90%+ capacity, leaving little buffer for workload fluctuations. This in turn calls for better capacity planning during the planning phase. (see: Capacity planning for cloud)

Cloud computing allows users to log in from anywhere in the world, from any device. This means that peak workloads can vary, and the cloud infrastructure capacity must be planned to handle these fluctuating workloads. The problem is particularly acute for international companies, where users log in from different time zones.
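To see why time zones matter for capacity planning, the sketch below sums simple per-region load curves to find the combined peak. All numbers (regions, UTC offsets, business hours, load factors) are hypothetical assumptions for illustration, not real measurements.

```python
# Sketch: estimating combined peak load across time zones.
# Per-region load curves below are illustrative assumptions.

def regional_load(hour_local):
    """Return a load factor (0.0-1.0): busy 9:00-17:00 local, quiet otherwise."""
    return 1.0 if 9 <= hour_local < 17 else 0.2

# Hypothetical regions with their UTC offsets.
regions = {"New York": -5, "London": 0, "Bangalore": +5}

# Combined load for each UTC hour is the sum of all regional loads.
combined = []
for utc_hour in range(24):
    total = sum(regional_load((utc_hour + offset) % 24)
                for offset in regions.values())
    combined.append(total)

peak = max(combined)
print(f"Peak combined load factor: {peak:.1f} "
      f"(capacity must cover this, not just one region's peak of 1.0)")
```

With overlapping business hours, the combined peak exceeds any single region's peak, so sizing for one region's busy period would under-provision the cloud.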

Bottlenecks in the network infrastructure can cause major performance degradation in cloud services. It is important to know the network loads during capacity planning and to allocate additional bandwidth - for example, by using link aggregation techniques. Continuous monitoring of network traffic will help in proper capacity planning and allocation.
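As a minimal sketch of that planning step, the snippet below flags links whose observed peak utilization crosses a threshold as candidates for link aggregation or an upgrade. The link names, capacities, traffic figures and the 70% threshold are all assumptions chosen for illustration.

```python
# Sketch: flagging network links for aggregation during capacity planning.
# Threshold and sample data are illustrative assumptions.

UTILIZATION_THRESHOLD = 0.70  # plan extra capacity above 70% sustained use

# (link name, capacity in Gbps, observed peak traffic in Gbps)
links = [
    ("core-uplink-1", 10, 8.5),
    ("storage-net-1", 10, 4.0),
    ("dmz-uplink-1",   1, 0.9),
]

needs_aggregation = [
    name for name, capacity, peak in links
    if peak / capacity > UTILIZATION_THRESHOLD
]
print("Links to aggregate or upgrade:", needs_aggregation)
```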

Server Monitoring

With virtualization, servers are constantly running at very high utilization. This increases the risk of hardware failures - and hence server outages. (see Challenges in Operations Management of Virtual Infrastructure)

As server clusters are already running at very high utilization, a hardware failure that takes down a server can wreak havoc on workload management. To prevent such outages and to improve repair times when hardware does fail, IT administrators will have to continuously monitor server clusters and VMs. Servers must be monitored for CPU utilization, memory utilization, power supply failures, fan failures, temperature sensors, voltage sensors, disk failures, etc. In addition, VMs will also have to be monitored for CPU and memory utilization. Monitoring tools can send alerts to IT administrators when any of these parameters crosses a set threshold - thus allowing time for corrective steps.
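The threshold-and-alert logic described above can be sketched as follows. The metric names, thresholds and host readings are illustrative assumptions; a real deployment would pull readings from IPMI, SNMP or a monitoring agent rather than hard-coded dictionaries.

```python
# Sketch of threshold-based alerting on server health metrics.
# Thresholds and readings are illustrative assumptions.

THRESHOLDS = {
    "cpu_percent":    90,   # alert above 90% CPU utilization
    "memory_percent": 85,   # alert above 85% memory utilization
    "temperature_c":  75,   # alert above 75 degrees Celsius
}

def check_server(name, readings):
    """Return an alert message for each reading that crosses its threshold."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = readings.get(metric)
        if value is not None and value > limit:
            alerts.append(f"ALERT {name}: {metric}={value} exceeds {limit}")
    return alerts

# Example readings from two hosts in a cluster (hypothetical values).
alerts = []
alerts += check_server("esx-host-01",
                       {"cpu_percent": 94, "memory_percent": 70, "temperature_c": 68})
alerts += check_server("esx-host-02",
                       {"cpu_percent": 60, "memory_percent": 88, "temperature_c": 77})
for a in alerts:
    print(a)
```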

Having visibility into how server resources are being used helps in resource allocation and capacity planning to meet business workloads.

Load Balancing

Load balancing in the cloud is quite different from load balancing in a traditional data center. Whereas in a traditional data center load balancing was essentially a server function, load balancing in a private cloud involves networks and storage as well. Thus load balancing spans multiple devices within the cloud environment.

Cloud infrastructures are often mirrored for high availability, which allows workloads to be transferred from one data center to another. This high-level load balancing requires constant monitoring of the entire cloud infrastructure, which calls for deploying end-to-end monitoring tools across the network, servers and storage devices - providing constant visibility so that potential issues can be identified and resolved quickly.

IT admins must evaluate which workloads will be deployed on the cloud and understand their effects on the cloud-based servers. For example, if VDI is being deployed, knowing the image size and the workload per VDI determines how many VDIs can be deployed on one physical server. By sizing the servers based on this information, administrators can set a cap on the user count and disable additional logons once the threshold is met. Any new users will then log in to a different server that has been made available for load balancing purposes.
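The VDI sizing calculation above can be sketched as simple arithmetic: the tighter of the RAM and CPU constraints sets the per-server user cap. All resource figures (server RAM, vCPUs, per-desktop footprint, oversubscription ratio) are hypothetical assumptions for illustration.

```python
# Sketch: sizing how many VDI instances fit on one physical server,
# then capping logons at that number. Figures are hypothetical.

SERVER_RAM_GB   = 256
SERVER_VCPUS    = 64
RESERVED_RAM_GB = 32     # hypervisor + management overhead (assumption)

VDI_RAM_GB = 4           # per-desktop memory footprint (assumption)
VDI_VCPUS  = 2
VCPU_RATIO = 4           # vCPU oversubscription ratio (assumption)

by_ram = (SERVER_RAM_GB - RESERVED_RAM_GB) // VDI_RAM_GB
by_cpu = (SERVER_VCPUS * VCPU_RATIO) // VDI_VCPUS
user_cap = min(by_ram, by_cpu)   # the tighter constraint sets the cap

print(f"RAM allows {by_ram} VDIs, CPU allows {by_cpu}; cap = {user_cap}")

def admit(current_users):
    """Admit a new logon only while under the cap; otherwise redirect."""
    return current_users < user_cap
```

Once `admit` returns False, the connection broker would direct the new session to another server in the pool.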

Load balancing also depends on properly sized and configured hardware at the server level. If a server becomes overloaded, resource locks can occur, degrading performance and affecting the end-user experience.

Load balancing will also have to be applied at the WAN gateways. If a WAN gateway gets overloaded, additional bandwidth will have to be activated. If a WAN gateway breaks down, the load balancer must detect the connection loss and immediately shift the load to the next available appliance, allowing continuous access into the environment - even when a device has failed.
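The failover behavior described above can be sketched as a round-robin balancer that skips appliances failing their health check. The gateway names and the health-check stub are assumptions for illustration; a real load balancer would probe the gateways over the network.

```python
# Sketch: round-robin over WAN gateways, skipping unhealthy appliances.
# Gateway names and the health-check stub are illustrative assumptions.
from itertools import cycle

class GatewayBalancer:
    def __init__(self, gateways, is_healthy):
        self._ring = cycle(gateways)
        self._count = len(gateways)
        self._is_healthy = is_healthy   # health-check callback

    def next_gateway(self):
        """Return the next healthy gateway, or None if all are down."""
        for _ in range(self._count):
            gw = next(self._ring)
            if self._is_healthy(gw):
                return gw
        return None

# Simulate wan-gw-2 failing its health check.
down = {"wan-gw-2"}
lb = GatewayBalancer(["wan-gw-1", "wan-gw-2", "wan-gw-3"],
                     is_healthy=lambda gw: gw not in down)

picks = [lb.next_gateway() for _ in range(4)]
print(picks)   # traffic keeps flowing around the failed appliance
```

Because the failed gateway is skipped rather than removed, it rejoins the rotation automatically as soon as its health check passes again.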

Closing Thoughts

Infrastructure management for cloud deployments is tricky. The older rules of managing a data center do not apply directly in the cloud. For successful cloud deployments, particular attention must be paid to capacity planning, server monitoring and load balancing.
