Tuesday, June 27, 2017

Key Metrics to measure Cloud Services


As a business user, if you are planning to host your IT workloads on a public cloud and you want to know how to measure the performance of the cloud service, here are seven important metrics you should consider.

1. System Availability

A cloud service must be available 24x7x365. However, there will be downtimes due to various reasons. System availability is defined as the percentage of time that a service or system is available. For example, 99.9% availability allows roughly 8.8 hours of downtime per year. A downtime of even a few hours can potentially cause millions of dollars in losses.

As a rule of thumb, two nines (99%) means about 3.65 days of downtime per year, which is typical for non-redundant hardware if you include the time to reload the operating system and restore backups (if you have them) after a failure. Three nines is about 8.8 hours of downtime per year, four nines is about 52 minutes, and the holy grail of five nines is about 5 minutes.
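To make the arithmetic concrete, here is a minimal Python sketch (purely illustrative, not tied to any provider's SLA calculator) that converts an availability percentage into the downtime it allows per year:

HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def downtime_per_year(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year(pct)
    print(f"{pct}% availability -> {hours:.2f} hours (~{hours * 60:.0f} minutes) of downtime per year")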

2. Reliability: Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR)

Reliability is a function of two components: Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) - i.e., the time taken to fix a problem. In the world of cloud services, MTTR is usually defined as the average time required to bring a failed service back into production.
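As a rough illustration, availability can also be estimated from these two numbers using the standard formula Availability = MTBF / (MTBF + MTTR). The figures in the sketch below are made up:

def availability_from_mtbf_mttr(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability estimated as MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical service: fails on average every 1,000 hours and takes 2 hours to restore.
print(f"Estimated availability: {availability_from_mtbf_mttr(1000, 2):.4%}")  # ~99.80%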

Hardware failure of IT equipment can lead to a degradation in performance for end users and can result in losses to the business. For example, a failure of a hard drive in a storage system can slow down the read speed - which in turn causes delays in customer response times.

Today, most cloud systems are built with high levels of hardware redundancy - but this increases the cost of the cloud service.

3. Response Time

Response time is defined as the time from when a workload places a request on the cloud system until the cloud system completes the request. Response time is heavily dependent on network latency.

Today, if the user and the data center are located in the same region, the average overall response time is 50.35 milliseconds. When the user base and the data center are located in different regions, the response time increases significantly, to an average of 401.72 milliseconds.

Response time gives a clear picture of the overall performance of the cloud. It is therefore very important to know the response times to understand the impact on application performance and availability - which in turn impacts customer experience.
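A simple way to get a feel for these numbers is to time a few requests yourself. The sketch below is a minimal Python example; the URL is a hypothetical placeholder for your own service endpoint:

import time
from statistics import mean
from urllib.request import urlopen

URL = "https://example.com/healthz"  # hypothetical health-check endpoint; substitute your own service

def average_response_time_ms(url: str, samples: int = 5) -> float:
    """Average end-to-end response time, in milliseconds, over a few requests."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with urlopen(url, timeout=10) as response:
            response.read()  # wait for the full body, not just the headers
        timings.append((time.perf_counter() - start) * 1000)
    return mean(timings)

print(f"average response time: {average_response_time_ms(URL):.1f} ms")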

4. Throughput or Bandwidth

The performance of cloud services is also measured by throughput, i.e., the number of tasks completed by the cloud service over a specific period. For transaction processing systems, it is normally measured in transactions per second. For systems processing bulk data, such as audio or video servers, it is measured as a data rate (e.g., megabytes per second).

Web server throughput is often expressed as the number of supported users – though clearly this depends on the level of user activity, which is difficult to measure consistently. Alternatively, cloud service providers publish their throughput in terms of bandwidth - e.g., 300 MB/sec, 1 GB/sec, etc. These bandwidth numbers most often exceed the rate of data transfer required by the software application.

In the case of mobile apps or IoT, there can be a very large number of apps or devices streaming data to or from the cloud system. Therefore it is important to ensure that there is sufficient bandwidth to support the current user base.
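As a back-of-the-envelope check (with made-up numbers), one can compare the aggregate data rate of a device fleet against the provisioned bandwidth:

def required_bandwidth_mbps(num_devices: int, kbps_per_device: float) -> float:
    """Aggregate bandwidth, in megabits per second, needed by a fleet of streaming devices."""
    return num_devices * kbps_per_device / 1000

# Hypothetical fleet: 50,000 IoT devices, each streaming 64 kbps.
needed = required_bandwidth_mbps(50_000, 64)
provisioned = 10_000  # e.g., a 10 Gbps link expressed in Mbps

print(f"required: {needed:.0f} Mbps, provisioned: {provisioned} Mbps, headroom: {provisioned - needed:.0f} Mbps")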

5. Security

For cloud services, security is often defined as the set of control-based technologies and policies designed to adhere to regulatory compliance rules and protect information, data, applications and infrastructure associated with cloud computing use. The processes will also likely include a business continuity and data backup plan in case of a cloud security breach.

Oftentimes, cloud security is categorized into multiple areas: security standards, access control, data protection (data unavailability and data loss prevention), and network security (protection against denial of service - DoS or DDoS - attacks).

6. Capacity

Capacity is the size of the workload compared to the available infrastructure for that workload in the cloud. For example, capacity requirements can be calculated by tracking utilization over time for workloads with varying demand, and working from the mean to find the capacity needed to handle 95% of all workloads. If the workload increases beyond a point, then one needs to add more capacity - which increases costs.
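The sketch below illustrates this calculation on hypothetical utilization samples, using the 95th percentile of observed demand as the capacity target:

import random
from statistics import mean, quantiles

# Hypothetical utilization samples (percent of provisioned capacity), one per hour for a month.
random.seed(42)
utilization = [random.gauss(mu=55, sigma=15) for _ in range(24 * 30)]

avg = mean(utilization)
p95 = quantiles(utilization, n=100)[94]  # 95th percentile of observed demand

print(f"mean utilization: {avg:.1f}%")
print(f"capacity needed to cover 95% of observed demand: {p95:.1f}%")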

7. Scalability

Scalability refers to the ability to service a theoretical number of users - the degree to which the service or system can support a defined growth scenario.

Cloud systems are often advertised as scalable up to tens of thousands, hundreds of thousands, millions, or even more simultaneous users. That means that at full capacity (usually marked at 80%), the system can handle that many users without failing any individual user and without crashing as a whole because of resource exhaustion. The better an application's scalability, the more users the cloud system can handle simultaneously.

Closing Thoughts


Cloud service providers often publish their performance metrics - but one needs to dive in deeper and understand how these metrics can impact the applications being run on that cloud. 

Wednesday, June 14, 2017

How to Design a Successful Data Lake

Today, business leaders are continuously envisioning new and innovative ways to use data for operational reporting and advanced data analytics. The Data Lake, a next-generation data storage and management solution, was developed to meet the ever-increasing demands of business and data analytics.

In this article, I will explore some of the challenges with the traditional enterprise data warehouse and other existing data management and analytics solutions. I will then describe the necessary features of the Data Lake architecture, the capabilities required to deliver a Data and Analytics as a Service (DAaaS) model, the characteristics of a successful Data Lake implementation, and critical considerations for designing a Data Lake.

Current challenges with Enterprise Data Warehouse 

Business leaders are continuously demanding new and innovative ways to use data analysis to gain competitive advantages.

With the development of new data storage and data analytics tools, traditional enterprise data warehouse solutions have become inadequate and are preventing users from maximizing their analytic capabilities.

Traditional data warehouse tools have the following shortcomings:

Timeliness 
Introducing new data types and content to an existing data warehouse is usually a time-consuming and cumbersome process.

When users want quick access to data, processing delays can be frustrating and cause users to stop using data warehouse tools and instead develop alternate ad-hoc systems - which cost more, waste valuable resources and bypass proper security controls.

Quality
If users do not know the origin or source of the data stored in the data warehouse, they view such data with suspicion and may not trust it. Current data warehousing solutions often store processed data - in which source information is lost.

Historical data often has parts that are missing or inaccurate, and the source of the data is usually not captured. All this leads to situations where analysis produces wrong or conflicting results.

Flexibility 
Today's on-demand world needs data to be accessed on-demand and results available in near real time. If users are not able to access this data in time, they lose the ability to analyze the data and derive critical insights when needed.

Traditional data warehouses "pull" data from different sources - based on a pre-defined business needs. This implies that users will have to wait till the data is brought into the data warehouse. This seriously impacts the on-demand capability of business data analysis.

Searchability
In the world of Google, users demand rapid and easy search across all their enterprise data. Many traditional data warehousing solutions do not provide easy search tools. Customers therefore cannot find the required data, which limits their ability to make the best use of data warehouses for rapid, on-demand data analysis.

Today's Need


Modern data analytics - be it Big Data, BI or BW - requires a platform that can:


  1. Support multiple types of data (structured/unstructured) stored in raw form, along with source details.
     
  2. Allow rapid ingestion of data - to support real time or near real time analysis
     
  3. Handle & manage very large data sets - both in terms of data streams and data sizes.
     
  4. Allow multiple users to search, access and use this data simultaneously from a well known secure place.
     


Looking at all the demands of modern business, the solution that fits all of the above criteria is the Data lake.

What is a Data Lake? 


A Data Lake is a data storage solution featuring scalable data stores that hold vast amounts of data in various formats. Data from multiple sources - databases, web server logs, point-of-sale devices, IoT sensors, ERP/business systems, social media, third-party information sources, etc. - is collected and curated into the data lake via an ingestion process. Data can flow into the Data Lake through either batch processing or real-time processing of streaming data.

The data lake holds both raw and processed data, along with the metadata and lineage of the data, which are available in a common searchable data catalog. Data is no longer constrained by initial schema decisions and can be used more freely across the enterprise.
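As a minimal illustration, a single catalog entry might capture something like the following; the field names and paths are hypothetical, not taken from any particular catalog product:

from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogEntry:
    """One entry in a searchable data-lake catalog: what the data is, where it came from, how it got here."""
    dataset: str
    source: str                                        # originating system (database, sensor feed, log, ...)
    ingestion_mode: str                                # "batch" or "streaming"
    schema_version: str
    raw_path: str                                      # location of the untouched raw data
    lineage: List[str] = field(default_factory=list)   # transformations applied so far
    tags: List[str] = field(default_factory=list)      # business glossary terms

entry = CatalogEntry(
    dataset="pos_transactions",
    source="store-pos-system",
    ingestion_mode="streaming",
    schema_version="v3",
    raw_path="s3://datalake/raw/pos_transactions/",
    lineage=["ingested from POS stream", "currency normalized to USD"],
    tags=["sales", "PII"],
)
print(entry)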

The Data Lake is an architected data solution to which all the common compliance and security policies are also applied.

Businesses can now use this data on demand to provide a Data and Analytics as a Service (DAaaS) model to various consumers (business users, data scientists, business analysts).

Note: Data Lakes are often built around strong, scalable, globally distributed storage systems. Please refer to my other articles on storage for the Data Lake:

Data Lake: Storage for Hadoop & Big Data Analytics

Understanding Data in Big Data

Uses of Data Lake

The Data Lake is the place where raw data is ingested, curated and transformed via ETL tools. Existing data warehouse tools can use this data for analysis, along with newer big data and AI tools.

Once a data lake is created, users can use a wide range of analytics tools of their choice to develop reports, derive insights and act on them. The data lake holds both raw and transformed data, along with all the associated metadata.

DAaaS model enables users to self-serve their data and analytic needs. Users browse the data lake's catalog to find and select the available data and fill a metaphorical "shopping cart" with data to work with.

Broadly speaking, there are six main uses of data lake:


  1. Discover: Automatically and incrementally "fingerprint" data at scale by analyzing source data.
     
  2. Organize: Use machine learning to automatically tag and match data fingerprints to glossary terms. Match the unmatched terms through crowdsourcing
     
  3. Curate: Human review accepts or rejects tags and automates data access control via tag-based security
  4. Search: Search for data through the Waterline GUI or through integration with 3rd-party applications
     
  5. Rate: Use objective profiling information along with subjective crowdsourced input to rate data quality
     
  6. Collaborate: Crowdsource annotations and ratings to collaborate and share "tribal knowledge" about your data

Characteristics of a Successful Data Lake Implementation


Data Lake enables users to analyze the full variety and volume of data stored in the lake. This necessitates features and functionalities to secure and curate the data, and then to run analytics, visualization, and reporting on it. The characteristics of a successful Data Lake include:


  1. Use of multiple tools and products. Extracting maximum value out of the Data Lake requires customized management and integration that are currently unavailable from any single open-source platform or commercial product vendor. The cross-engine integration necessary for a successful Data Lake requires multiple technology stacks that natively support structured, semi-structured, and unstructured data types.
     
  2. Domain specification. The Data Lake must be tailored to the specific industry. A Data Lake customized for biomedical research would be significantly different from one tailored to financial services. The Data Lake requires a business-aware data-locating capability that enables business users to find, explore, understand, and trust the data. This search capability needs to provide an intuitive means for navigation, including key word, faceted, and graphical search. Under the covers, such a capability requires sophisticated business processes, within which business terminology can be mapped to the physical data. The tools used should enable independence from IT so that business users can obtain the data they need when they need it and can analyze it as necessary, without IT intervention.
     
  3. Automated metadata management. The Data Lake concept relies on capturing a robust set of attributes for every piece of content within the lake. Attributes like data lineage, data quality, and usage history are vital to usability. Maintaining this metadata requires a highly-automated metadata extraction, capture, and tracking facility. Without a high-degree of automated and mandatory metadata management, a Data Lake will rapidly become a Data Swamp.
     
  4. Configurable ingestion workflows. In a thriving Data Lake, new sources of external information will be continually discovered by business users. These new sources need to be rapidly on-boarded to avoid frustration and to realize immediate opportunities. A configuration-driven ingestion workflow mechanism can provide a high level of reuse, enabling easy, secure, and trackable content ingestion from new sources (a minimal sketch of such a configuration appears after this list).
     
  5. Integrate with the existing environment. The Data Lake needs to meld into and support the existing enterprise data management paradigms, tools, and methods. It needs a supervisor that integrates and manages, when required, existing data management tools, such as data profiling, data mastering and cleansing, and data masking technologies.


Keeping all of these elements in mind is critical for the design of a successful Data Lake.
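To illustrate point 4 above, here is a hypothetical sketch of a configuration-driven ingestion workflow; the source definitions and field names are invented for illustration:

# Hypothetical, configuration-driven ingestion: each new source is on-boarded by adding
# a config entry rather than writing new pipeline code. Field names are invented.
INGESTION_SOURCES = [
    {"name": "web_server_logs", "mode": "batch", "format": "text", "landing_zone": "raw/web_logs/"},
    {"name": "iot_sensor_feed", "mode": "streaming", "format": "json", "landing_zone": "raw/iot/"},
]

def ingest(source: dict) -> None:
    """Dispatch each configured source to the right (stubbed) ingestion path."""
    if source["mode"] == "batch":
        print(f"scheduling nightly batch load of {source['name']} into {source['landing_zone']}")
    else:
        print(f"attaching stream consumer for {source['name']} into {source['landing_zone']}")

for src in INGESTION_SOURCES:
    ingest(src)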


Designing the Data Lake


Designing a successful Data Lake is an intensive endeavor, requiring a comprehensive understanding of the technical requirements and the business acumen to fully customize and integrate the architecture for the organization's specific needs. Data Scientists and Engineers provide the expertise necessary to evolve the Data Lake to a successful Data and Analytics as a Service solution, including:

DAaaS Strategy and Service Definition. Data users help define the catalog of services to be provided by the DAaaS platform, including data onboarding, data cleansing, data transformation, data catalogs, analytic tool libraries, and others.

DAaaS Architecture. Help data users create the right DAaaS architecture, including architecting the environment, selecting components, defining engineering processes, and designing user interfaces.

DAaaS PoC. Rapidly design and execute Proofs-of-Concept (PoC) to demonstrate the viability of the DAaaS approach. Key capabilities of the DAaaS platform are built and demonstrated using leading-edge technologies and other selected tools.

DAaaS Operating Model Design and Rollout. Customize DAaaS operating models to meet the individual business users' processes, organizational structure, rules, and governance. This includes establishing DAaaS chargeback models, consumption tracking, and reporting mechanisms.

DAaaS Platform Capability Build-Out. Provide an iterative build-out of all data analytics platform capabilities, including design, development and integration, testing, data loading, metadata and catalog population, and rollout.

Closing Thoughts  


The Data Lake can be an effective data management solution for advanced analytics experts and business users alike. A Data Lake allows users to analyze a large variety and volume of data when and how they want. The DAaaS model provides users with on-demand, self-serve data for all their analysis needs.

However, to be successful, a Data Lake needs to leverage a multitude of products while being tailored to the industry and providing users with extensive, scalable customization. In short, it takes a blend of technical expertise and business acumen to help organizations design and implement their perfect Data Lake.

Tuesday, June 13, 2017

Key Product Management Principle - People are the core Asset


2017 is turning out to be a tumultuous year for the IT industry worldwide. Large established IT companies such as Cisco, HPE, Dell-EMC and IBM are seriously cutting down costs. Unfortunately, companies tend to look at people as "expenses", and layoffs have become common.

Product managers often answer three main questions:

1. Where is the Product Today?
2. Where do we want to take the product & by what time?
3. How can the team get the product there?

Therefore, product managers have a different view when it comes to employees. From a product development perspective, people are "assets" - especially engineering teams and customer-facing teams. The success of new product development depends on people.

Product managers treat people as true assets because they drive the success of new products - which creates future revenue for the company. Without people, the new product will never reach its intended goal.

In IT, engineers and their intellect, skills, knowledge, character and integrity are the true value in any organization. Because of the nature of IT product development, it is vital that product managers treat their engineering colleagues as true assets. Product managers must spend time with the team. This means talking with them, listening to their concerns and fears about the current phase of the project, and occasionally taking them out for lunch. (Lunch is a truly amazing way to motivate people.)

Product managers have to make the team members feel valued. That's when engineers care more about the product on which they are working. Face time with the team also helps product managers understand individuals and personally assist them. Time spent with the team pays financial dividends as high-quality products make it to market on time and with enough vitality to excite the sales force.

Closing Thoughts

When product managers focus on the people with whom they work, the products succeed as a result.

Monday, June 12, 2017

Taking Analytics to the edge


In my previous article, I had written about HPE's EdgeLine servers for IoT analytics.

In 2017, we are seeing a steady wave of growth in data analytics happening at the edge, and HPE is at the forefront of this wave - leveraging its strengths in hardware, software, services, and partnerships to build powerful analytic capabilities.

With HPE EdgeLine, customers are able to move analytics from the data center to the edge, providing rapid insights from remote sensors to solve critical challenges in multiple industries like energy, manufacturing, telecom, and financial services.

Why do IoT projects fail?


Recently, Cisco reported that ~75% of IoT projects fail. This is because IoT data has been managed in centralized, cloud-based systems. In traditional settings, data is moved from a connected 'thing' to a central system over a combination of cellular, Wi-Fi and enterprise IT networks, to be managed, secured, and analyzed.

But IoT devices generate huge volumes of data, and they generate it at multiple sites - even in remote areas with intermittent connectivity. This means that analysis cannot be done in a meaningful way: data collection takes time, and by the time the analysis is completed and results are computed, they are already irrelevant.

Centralized cloud systems for IoT data analysis just do not scale, nor can they perform at the speeds needed.

HPE Solution - EdgeLine servers for Analytics on the Edge


With HPE EdgeLine servers, we now have a  solution that optimizes data for immediate analysis and decision making at the edge of the network and beyond.

For the first time ever, customers get a holistic view of the connected condition of things (machines, networks, apps, devices, people, etc.) through the combined power of HPE EdgeLine servers and Aruba wireless networks.

Analysis at the edge is just picking up momentum, and this is only the beginning of good things to come.

Today, the cloud is omnipresent, but for large-scale IoT deployments, a new model of computing needs to emerge in which constant cloud connectivity is not essential. Most data will be processed at or near its point of origin to provide real-time response and will be handled on-site in the moment. Running analytics at the edge will save costs and refine machine learning on massive data sets that can be acted on at the edge.

In June 2017 at HPE Discover, customers were delighted to get an in-depth view of this solution.

HPE's continued investments in data management and analytics will deliver a steady stream of innovation. Customers can safely invest in HPE technologies and win.

HPE, along with Intel, is future-proofing investments in data and analytics for hyper-distributed environments. HPE has taken a new approach to analytics to provide the flexibility of processing and analyzing data everywhere - right at the edge where data is generated for immediate action, and in the cloud at a central data center for future analysis.

Customers are using IoT data to gain insight through analytics, both at the center and the edge of the network to accelerate digital transformation. With HPE Edgeline, one can take an entirely new approach to analytics that provides the flexibility of processing and analyzing data everywhere—at the edge and in the cloud, so it can be leveraged in time and context as the business needs to use it.

This technology was developed in direct response to requests from customers that were struggling with complexity in their distributed IoT environments. Customers, analysts and partners have embraced intelligent IoT edge and are using it in conjunction with powerful cloud-based analytics.

Analytics on the edge is a game-changing approach to analytics that solves major problems for businesses looking to transform their operations in the age of IoT. The HPE Vertica Analytics Platform now runs at the IoT edge on the Edgeline EL4000. This combination gives enterprises generating massive amounts of data at remote sites a practical solution for analyzing and generating insights.

Customers like CERN, FlowServe and others are using edge analytics to expand their monitoring of equipment conditions such as engine temperature, engine speed and run hours to reduce maintenance costs. Telecom services companies are pushing analytics to the edge to deliver 4G LTE connectivity throughout the country, regardless of the location of the business.


Closing Thoughts 


The benefits of centralized deep compute make sense - for traditional data. But the volume and velocity of IoT data have challenged this status quo. IoT data is Big Data. And the more you move Big Data, the more risk, cost, and effort you have to assume in order to provide end-to-end care for that data.

Edge computing is rebalancing this equation, making it possible for organizations to get the best of all worlds: deep compute, rapid insights, lower risk, greater economy, and more trust and security.


Wednesday, June 07, 2017

Looking into the future - Right through Automation & Artificial Intelligence



It's no secret that innovation and creativity are the ultimate source of competitive advantage. But the success of any innovative idea depends on a number of other factors: energetic leadership, market growth, a significant investment pool, and a real "can do" spirit in the team.

As I write this article, there are thousands of articles being published on the web stating how robots and Artificial Intelligence will replace humans in the workforce.

  1. McDonald's is testing a new restaurant run completely by Robots 
  2. Robot lands a Plane. 
  3. Driverless Trucks will eliminate millions of jobs  
  4. Smart Machines will cause mass unemployment  
  5. Google's AI beats world Go champion in first of five matches  


Today's news is filled with the hype and fear of mass unemployment due to automation. The hype cycle is almost at its zenith, and against this background, I was asked to talk about what kinds of jobs and employment opportunities will exist in the future.

How will automation help mankind?


Automation is actually an innate feature of mankind. If we look into our past, we as a species have always come up with creative innovations to automate mundane tasks, and every time we did this, civilization progressed by leaps and bounds.

The first-ever automation was the creation of canal systems - automating the transportation of water. This led to the rise of early civilizations in the Indus River valley, Egypt, Babylon and China. Since then, civilization has made steady progress in automating simple, repetitive tasks.

Simple machines replaced human labor, trains and cars replaced horse-drawn carts, computers replaced clerks, and the list goes on and on.

From all our learnings, we know that if a task can be automated, it will be automated. There is no way a civilization can stop automation. People take no pleasure in doing repetitive labor, and society will promote automation. Period!

And yes, today, people are being replaced by algorithms, machines and artificial intelligence.

What shall humans do?


As automation, artificial intelligence, machine learning and robotics grow in capability, humans doing simple, repetitive jobs will be pushed out of their jobs. So what will humans do?

The answer to this question can be found in history. When canals were invented, farmers found themselves with more time on their hands to increase the land area under cultivation. This led to more food production, which in turn freed up people to create some of the early classics of literature. The Ramayana, Mahabharata, Upanishads, etc. were written during this time. People built massive temples, pyramids, palaces, forts, statues and more.

The industrial revolution also led to an explosion in the arts - mainly painting, novels, poetry, sculpture, the building of palaces, classical music, etc.

In the 20th century and early 21st century, automation led to more creativity in terms of space travel, adventure, film making, music and new machines.

All this points towards only one direction. When humans are freed up from mundane tasks, they will use their free time to harness their creative potential.

Humans are innately creative; machines, computers and robots are not. This creativity cannot be reduced to an algorithm and automated. This implies that the new generation of humans - trained as engineers, doctors, scientists, artists and more - will dream up new things to do, build and explore. Perhaps we may even discover how to travel faster than light.


How to prepare for the new future?


The current education system is designed to create the workforce of yesterday - people who do mundane, repetitive tasks - and this education system will have to change first. We as a civilization will have to train the younger generation to be creative and to develop expansive, divergent thinking. We need to nurture the creative side of humans, and this will free up the younger generation to be creative and innovative.

Modern workplaces also have to change in a big way. Traditional, hierarchical, top-down management systems which pigeonhole people into narrow jobs need to be revamped. Silos need to be broken up and creative ideas must be fast-tracked as quickly as possible. Businesses must be willing to take risks on new ideas. For example, Ford Motors recently fired its CEO - even after record-breaking sales - because Ford as a business now needs new business models, far removed from the old one of building cars.


Cost of Failure - A major war & conflict


History also shows us the dark side of automation. Whenever automation ushered in a new era, a lot of people had too much free time, and when this was not utilized in a positive way, humans resorted to war and violence.

In fact, both world wars were in a way caused by the industrial revolution. Industrialized European countries had surplus human labor and a young population with not much to do, and those countries rushed headlong into catastrophic wars.

Today, we are seeing a massive surge in terrorism from people in the Middle East and Pakistan. This is because their governments have failed to utilize their workforces in productive ways. Similarly, the USA and China have increased their military spending in recent times, as they are not able to channel their enormous economic resources towards creative work.


Closing Thoughts 


We as a civilization are at the cusp of a new revolution - ushered by Automation & AI. Will this result in a golden era of creativity & innovation or will it result in a catastrophic war?

I cannot predict the future, but I know for sure that if we invest in building a creative and innovative society, we can usher in a golden era; otherwise, we are doomed to a devastating war.

Friday, June 02, 2017

Managing Big data with Intelligent Edge



The Internet of Things (IoT) is nothing short of a revolution. Suddenly, vast numbers of intelligent sensors and devices are generating vast amounts of data that contain potentially game-changing information.

In traditional data analytics, all the data is shipped to a central data warehouse for processing in order to get strategic insights - or, as in many Big Data projects, large amounts of data of varying types are tossed into a data lake to be used later.

Today, most companies are collecting data at the edge of their network: PoS terminals, CCTV, RFID scanners, etc., and IoT data is being churned out in bulk by sensors in factories, warehouses, and other facilities. The volume of data generated at the edge is huge, and transmitting this data to a central data center and processing it there turns out to be very expensive.

The big challenge for IT leaders is to gather insights from this data rapidly, while keeping costs under control and maintaining all security & compliance mandates.

The best way to deal with this huge volume of data is to process it right at the edge - near the point where the data is generated.
 

Advantages of analyzing data at the edge  


To understand this, let's consider a factory. Sensors on a drilling machine that makes engine parts generate hundreds of data points each second. Over time, set patterns emerge in this data. Data showing unusual vibrations, for example, could be an early sign of a manufacturing defect about to happen.

Sending the data across a network to a central data warehouse for analysis is costly and time-consuming. By the time the analysis is completed and plant engineers are alerted, several defective engines may already have been manufactured.

In contrast, if this analysis is done right at the site, plant managers can take corrective action before the defect occurs. Thus, processing the data locally at the edge lowers costs while increasing productivity.
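As a minimal sketch of the kind of check an edge server could run locally (the vibration samples and threshold below are made up), consider a simple baseline-deviation test:

from statistics import mean, stdev

def detect_vibration_anomalies(readings, window=50, sigma=3.0):
    """Flag readings that drift more than `sigma` standard deviations from the recent baseline.

    Runs entirely on the edge server, so only alerts (not raw readings) need to leave the site.
    """
    baseline = readings[:window]
    mu, sd = mean(baseline), stdev(baseline)
    return [(i, v) for i, v in enumerate(readings[window:], start=window) if abs(v - mu) > sigma * sd]

# Hypothetical vibration samples from a drilling machine (arbitrary units), with a spike at the end.
samples = [1.0 + 0.02 * (i % 5) for i in range(60)] + [1.9]
for index, value in detect_vibration_anomalies(samples):
    print(f"possible defect precursor: sample {index} = {value}")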

Keeping data local also improves security and compliance, as any IoT sensor could potentially be hacked and compromised. If data from a compromised sensor makes its way to the central data warehouse, the entire data warehouse could be at risk. Preventing data from traveling across the network keeps malware away from the main data warehouse. If all sensor data is analyzed locally, then only the key results need to be stored in a central warehouse - this reduces the cost of data management and avoids storing useless data.

In case of banks, the data at the edge could be Personally Identifiable Information (PII), which is bound by several privacy laws and data compliance laws, particularly in Europe.

In short, analyzing data at the edge - near the point where it is generated - is beneficial in many ways:

  • Analysis can be acted on instantly as needed.
  • Security & compliance is enhanced.
  • Costs of data analysis are lowered.


Apart from the obvious advantages mentioned above, there are several others:

1. Manageability:

It is easy to manage IoT sensors when they are connected to an edge analysis system. The local server that runs data analysis can also be used to keep track of all the sensors, monitor sensor health, and alert administrators if any sensors fail. This helps in handling the wide plethora of IoT devices used at the edge.
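A minimal sketch of such a health check, with hypothetical sensor names and timings, might look like this:

import time

HEARTBEAT_TIMEOUT_SECONDS = 300  # consider a sensor failed if silent for 5 minutes

# Hypothetical registry of last-seen heartbeat timestamps, kept on the local edge server.
last_heartbeat = {
    "temp-sensor-01": time.time() - 30,    # reported 30 seconds ago
    "vibration-07": time.time() - 900,     # silent for 15 minutes
}

def failed_sensors(heartbeats: dict, timeout: float = HEARTBEAT_TIMEOUT_SECONDS) -> list:
    """Return the sensors that have not reported within the timeout window."""
    now = time.time()
    return [sensor for sensor, seen in heartbeats.items() if now - seen > timeout]

for sensor in failed_sensors(last_heartbeat):
    print(f"ALERT: {sensor} has stopped reporting - notify the administrator")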

2. Data governance: 

It is important to know what data is collected, where it is stored and where it is sent. Sensors also generate lots of useless data that can be compressed or discarded. Having an intelligent analytic system at the edge allows easy data management via data governance policies.

3. Change management: 

IoT sensors and devices also need strong change management (firmware, software, configurations, etc.). Having an intelligent analytic system at the edge enables all change management functions to be offloaded to the edge servers. This frees up central IT systems to do more valuable work.

Closing Thoughts


IoT presents a huge upside in terms of rapid data collection. Having an intelligent analytic system at the edge gives a huge advantage to companies - with the ability to process this data in real time and take meaningful actions.

Particularly in case of smart manufacturing, smart cities, security sensitive installations, offices, branch offices etc. - there is a huge value in investing in an intelligent analytic system at the edge.

Conventional business models are being disrupted. Change is spreading across nearly all industries, and organizations must move quickly or risk being left behind by their faster-moving peers. IT leaders should go into the new world of IoT with their eyes open to both the inherent challenges they face and the new horizons that are opening up.

It's no wonder that a large number of companies are already looking at data at the edge.

Hewlett Packard Enterprise makes specialized servers called Edgeline Systems - designed to analyze data at the edge.  

Thursday, June 01, 2017

6 Key Tools and Techniques for Taming Big Data



Using Big Data across the enterprise doesn't require massive investments in new IT systems. Many Big Data tools can leverage existing and commodity infrastructures, and cloud-based platforms are also an option. Let's take a look at some of the most important tools and techniques in the Big Data ecosystem.

1) Data governance. 

Data governance includes the rules for managing and sharing data. Although it's not a technology per se, data governance rules are enforced by technologies such as data management platforms.
"There's a lack of standards and a lack of consistency," explains Doug Robinson, executive director of the National Association of State CIOs (NASCIO). "There's certain data quality issues: Some of the data is dirty and messy and it's non-standardized. And that increasingly has made data sharing very difficult because you have language and syntax differences, the taxonomy on how information is represented.

... All that is problematic because there's no overarching data governance model or discipline in most states. Data governance isn't very mature in state government nor local governments today, and certainly not the federal government."

Data governance is critical to gaining buy-in from participating agencies for enterprise-wide data management. Before data sharing can begin, representatives of all participating agencies must work together to:


  • Discuss what data needs to be shared
  • Determine how to standardize it for consistency
  • Develop a governance structure that aligns with organizational business & compliance needs


2) Enterprise data warehouse. 

With an enterprise data warehouse serving as a central repository, data is funneled in from existing departmental applications, systems and databases.

Individual organizations continue to retain ownership, management and maintenance of their data using their existing tools, but the enterprise data warehouse allows IT to develop a single Big Data infrastructure for all agencies and departments. The enterprise data warehouse is the starting point for integrating the data to provide a unified view of each citizen.

3) Master data management (MDM) platforms. 

With data aggregated into an enterprise data warehouse, it can be analyzed collectively. But first it has to be synthesized and integrated, regardless of format or source application, into a master data file. MDM is a set of advanced processes, algorithms and other tools (a simplified sketch follows the list below) that:

  • Inspect each departmental data source and confirm its rules and data structures
  • Identify and resolve identity problems, duplicate record issues, data quality problems and other anomalies
  • Ascertain relationships among data
  • Cleanse and standardize data 
  • Consolidate the data into a single master file that can be accessed by all participating organizations
  • Automatically apply and manage security protocols and data encryption to ensure accordance with privacy mandates
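
The sketch below is a drastically simplified illustration of the standardization and consolidation steps; the records and the matching rule are invented for this example:

# A drastically simplified illustration of two MDM steps: standardizing records from
# different departmental sources and consolidating duplicates into a single master file.
# Field names and the matching rule (same e-mail = same person) are invented for this sketch.
records = [
    {"source": "billing", "name": "JANE  DOE", "email": "Jane.Doe@Example.com"},
    {"source": "crm", "name": "Jane Doe", "email": "jane.doe@example.com"},
    {"source": "support", "name": "John Smith", "email": "john.smith@example.com"},
]

def standardize(record: dict) -> dict:
    """Cleanse and standardize one record (trim whitespace, normalize case)."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "sources": {record["source"]},
    }

master = {}
for rec in map(standardize, records):
    key = rec["email"]
    if key in master:
        master[key]["sources"] |= rec["sources"]  # resolve the duplicate into one master entry
    else:
        master[key] = rec

for entry in master.values():
    print(entry)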


4) Advanced analytics and business intelligence.

High-performance analytics and business intelligence are the brains of the Big Data technology ecosystem, providing government centers of excellence with a comprehensive analytical tool set that leverages extensive statistical and data analysis capabilities. Through the use of complex algorithms, these platforms quickly process and deliver Big Data's insights. Functionality includes the ability to:

  • Mine data to derive accurate analysis and insights for timely decision-making
  • Create highly accurate predictive and descriptive analytical models
  • Model, forecast and simulate business processes
  • Apply advanced statistics to huge volumes of data 
  • Build models that simulate complex, real-life systems


5) Data visualization. 

Data visualization tools are easy to use — often with point-and-click wizard-based interfaces — and they produce dazzling results. With simple user interfaces and tool sets, users of advanced business intelligence and visualization tools can easily:


  • Develop queries, discover trends and insights
  • Create compelling and dynamic dashboards, charts and other data visualizations 
  • Visually explore all data, discover new patterns and publish reports to the Web and mobile devices 
  • Integrate their work into a familiar Microsoft Office environment
     
6) Specialty analytics applications. 

Multiple analytics techniques can be combined to deliver insight into specialized areas such as:
Fraud, waste and abuse. By detecting sophisticated fraud, enterprises can stop fraud before payments are made, uncover organized fraud rings and gain a consolidated view of fraud risk.

Regulatory compliance. Analytics tools can help agencies quickly identify and monitor compliance risk factors, test various scenarios and models, predict investigation results, and reduce compliance risk and costs.

HR analytics. Hiring is critical to building capabilities quickly. Therefore it becomes important to hire employees who can meet the organization's requirements and fit into its corporate culture. Therein lies the challenge: how to hire someone from outside who has the relevant domain knowledge and who will fit in with the existing corporate culture. This challenge can be addressed by using data analytics during the selection process.

Each BU will have several such tools and techniques that are important to it, but they cannot justify creating data silos. Breaking data silos requires combining technology with analytics expertise, new organizational workflows and cultural changes to enable enterprise-wide data management.

Big Data as an Organizational Center of Excellence



Introduction - Managing Data Across a Large Enterprise

Today, enterprises are looking to integrate data to support analytics-based decision making, and they are still facing massive challenges. The biggest challenge they have is that data is located in silos, and the large volumes of data being generated are still managed in silos.

Thanks to new technologies such as IoT, there is an abundance of data being generated. Enterprise information systems, networks, applications and devices churn out huge volumes of information - enterprises are awash in Big Data.

But enterprises are unable to make the best use of this data because their internal organization - the network of people and business processes - operates in isolation, and many analytics efforts take into account information from only a single silo, delivering results in a vacuum and preventing better business decisions.

The best way to lower the costs of managing big data and to leverage this data for actionable insights is to have a pan-enterprise Big Data strategy. (Also see Getting Your Big Data Strategy Right.)

The best way to solve this problem is to create Big Data as an organizational center of excellence. This special group can cut across silos, take ownership of all data and create new opportunities for operational excellence with Big Data.

Big Data as an Organizational Center of Excellence


Managing all data across the entire organization will improve efficiencies and services while lowering costs of mining this data.

Organizational BU's can now use this centralized data for actionable insights that can help them make better business decisions.

Business leaders recognize that a pan-enterprise Big Data effort provides much more meaningful insights because it is based on an integrated view of the business. Yet they are faced with the challenges of siloed information systems: current IT implementations and business processes prevent data sharing and access to data.

In this article, I will present a solution to address the challenges of data silos and data hoarding. As an enterprise-wide solution, I shall present a new CoE for Big Data - a way of breaking down data silos - and discuss its key benefits.

The Big Data CoE takes ownership of all data and presents an integrated data set for analysis, which improves performance across the global enterprise. The Big Data CoE can thus deliver long-term return on investment (ROI), enabling business leaders to build a solid data foundation for making better analytics-based decisions.


Understanding Data Silos and the Pitfalls of Data Hoarding 


Before I dive into the details of the Big Data CoE, let us take a quick look at data silos and the dangers of data hoarding.

In large, global enterprises with many individual business units (often referred to as BU's), data silos are notorious and pervasive. Data silos are born in this environment, where individual BU budgets and procurement processes create separate systems.

Group IT resources are often scarce and are often designated for specific BU functions. Each individual BU has no financial incentive to share data and IT resources. As a result, the data collected by each BU becomes siloed, hoarded in each BU's IT systems and not shared.

This results in organizational deficiencies: the entire organization suffers from redundant systems and inefficient decision-making. Because enterprise information systems remain segregated, data is walled up in departmental databases and applications. With valuable assets trapped in silos, BU's are unable to leverage data to improve processes, workflows or service delivery.

While data silos are created by operational and technical challenges, data hoarding is a result of insular agency cultures that encourage autonomy and self-reliance, as well as stringent compliance mandates for securing data, especially in this era of risks from data leaks/breaches and liability lawsuits.

In this environment, "business data" becomes "OUR DATA." Data hoarding trumps openness and sharing.

The impact of data silos and data hoarding is quite devastating. Without data sharing across BU's, each BU maintains its own view of the business and there is no holistic, consistent view of the global business. There is no integration of relevant data, and this leads to missed opportunities, wrongful expenses and wastage, and delays in discovering fraud, waste and misuse of money.

In addition, critical decisions are made with partial data, which leads to unproductive staff, duplicated effort and waste. Budgets are drained by the cost of managing and maintaining complex and redundant information systems, applications and system interfaces.

Finally, data silos and data hoarding weaken security and compliance efforts. It is harder to ensure the security and privacy of information as it moves among computer systems and databases, which can lead to noncompliance with critical regulations (PCI-DSS, SOX, etc.).

A Holistic Model: Big Data CoE 


Envision a pan-enterprise model for managing Big Data as an organizational center of excellence: a competency center whose core focus is to manage Big Data across the organization and provide the right set of tools and infrastructure for business analytics.

The Big Data CoE is created with a common focus: to manage data and develop new technologies and architectures to analyze big data for making better business decisions.

When the Big Data CoE model is applied to the enterprise, we can instantly see the following benefits:


  • Data is treated as an organizational asset.
    Treating data as an organizational asset, CoE develops and fosters a collaborative environment for users across BU's to meet and exchange ideas, discuss new projects and share best practices.

     
  • Data is managed separately from IT in terms of strategy, organization, resources, purchasing and deployment. This frees up enterprise IT from handling the challenges of Big Data. Data can reside in-house systems or on public clouds.

     
  • Distinct processes are developed for collecting, consolidating, managing, linking, securing, sharing, analyzing, archiving, publishing and governing data.

     
  • Analytical expertise is shared among individual departments, which relieves them of the burden of independently recruiting their own talent and developing unique solutions.

     
  • Data is aggregated, shared and analyzed using a single, enterprise-wide data platform. A unified system of enterprise data management and analytics tools ensures seamless integration of data repositories with analytic tools, provides a common user experience with access to all types of enterprise data and builds end user engagement in data-driven decision-making.


Unlike siloed data, an enterprise wide approach to data provides all BU's with a single version of the truth. With this integrated, holistic view, decision-making involves all relevant, consistent data, regardless of data ownership.

Creation of Big Data CoE is Transformative


The biggest benefit of Big Data CoE is that it transforms business operations.

Big Data CoE leads to business transformation:


  1. More efficient enterprise. When used at the enterprise level, Big Data can reduce fraud, waste and misuse of funds, enhance communication and coordination between BU's, improve management of Big Data, and identify key business trends across the entire enterprise.
     
  2. Faster & better decision making. With a complete view of each BU's data, the CoE can offer the most appropriate data management and analytical services by identifying patterns that might otherwise be missed. It can also eliminate data processing errors and duplicative data entry. Consistent procedures and processes eliminate wasted time and speed up decision making.
     
  3. Stronger compliance efforts. When data management is integrated, it is easier to implement data compliance mandates across the organization. The entire enterprise's data can be made secure and compliant even when it is shared across BU's.
     
  4. Cost reduction. More efficient data management, analytics & workflows, compliance, security and  service delivery - all lead to cost reductions. By consolidating data analytics efforts under a single CoE, additional savings can be realized because departments don't have to procure and manage their own systems or hire department-specific data scientists and analysts.


To realize the benefits of an enterprise approach to Big Data, enterprises must adopt a comprehensive approach that leverages appropriate tools and techniques.

Closing Thoughts  

Big Data can provide tremendous business advantages by improving business productivity, business decision making and service delivery. But Big Data can live up to its true potential only if analytics programs are implemented thoughtfully and skillfully.

The strategic use of Big Data and data analytics technologies and tools requires considerable innovation, creative thinking and leadership.

The "silo mentality" has to be broken up and data needs to be shared across the enterprise as a common asset. Having a CoE manage all of Big Data allows enterprises to holistically manage, share and leverage data for faster decision making and service delivery.

The Big Data CoE helps enterprises and their BU's rethink and retool the way they collect, manage, archive and use data. The CoE enables BU's to work together and share information - leading to better decision-making, faster service delivery and an enterprise-wide approach to managing and using Big Data.