Wednesday, February 18, 2015

Data is Data - Unified Data Lake is the game changer

For many companies today, Big Data analytics is a top priority. Leaders at these companies hope to derive new insights from all available data and use them to make better decisions: decisions that enhance customer experience, improve productivity, lower operational costs, and create new business opportunities.

In short, companies want the moon!

Unfortunately, companies too often start their Big Data efforts in silos. Marketing has its own initiative, production has its own projects, finance works in another silo, and so on. This discrete, silo-based approach will fail in the world of Big Data.

Many companies rush the implementation of Big Data projects and equate Big Data with tools like Hadoop, Pig, and Python, without looking at the bigger picture. Many of these new projects fail to include existing data discovery and analytics tools.

The siloed approach leads to quick initial success, but it opens up a series of much bigger challenges that eventually force a re-architecting of all the Big Data projects while costs go out of control. The main challenge is data growth. The success of Big Data analytics relies on massive volumes of data, and rapid data growth will quickly overwhelm the existing IT infrastructure in each silo, forcing every department to replicate its own Big Data infrastructure.

Data used by businesses for decision making is exploding. Typically, data volumes double every 12 to 18 months. The use of unstructured data from emails, logs, call logs, documents, files, images, videos, and social media streams (Twitter, Facebook, etc.) is becoming mainstream in data analysis. Moreover, regulatory rules insist that data be retained for long periods of time. The result is a huge volume of data that must be stored, protected, and managed, all of which gets more expensive as the volume grows.
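To make the doubling rate concrete, here is a minimal sketch of the compound growth it implies. The 100 TB starting volume and the five-year horizon are illustrative assumptions, not figures from any real deployment, and `projected_volume` is a hypothetical helper written for this example.

```python
def projected_volume(initial_tb: float, months: float, doubling_months: float) -> float:
    """Return the data volume after `months`, assuming it doubles
    once every `doubling_months` (simple exponential growth)."""
    return initial_tb * 2 ** (months / doubling_months)


# Assumed starting point: 100 TB today, projected 5 years (60 months) out.
for doubling in (12, 18):
    size = projected_volume(100, 60, doubling)
    print(f"doubling every {doubling} months: {size:,.0f} TB after 5 years")
```

Under these assumptions, 100 TB grows to 3,200 TB if volumes double every 12 months, and to roughly 1,008 TB if they double every 18 months. Either way, infrastructure sized for a single department today is outgrown many times over within a few years.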

Many Big Data projects involve vastly different data types: unstructured data, files, metadata, blob data, log files, and so on. For example, one project may search a database for matching patterns, another may run word searches across different types of files, while a third may take real-time data from social media and match it against historical trend data. Each of these projects has different requirements for data capture, extraction, and storage. Giving each one its own dedicated data storage and management system is not viable.

Business analytics projects tend to focus on spotting trends by looking at historical data and matching it with current events. How well did this product sell in that store? How many customers who got this email clicked on an embedded link and completed a purchase or transaction? Which supplier's products contributed to the highest failure rates in a product line? Questions of this type are best answered by analyzing data that has been collected in an orderly way, stored in a structured database or data warehouse, and then enhanced with real-time data.

In a dynamic business environment, Big Data analytics can help answer new questions rapidly. For example: Can information from social media tell me how customer sentiment is changing in response to the new advertisements? Can real-time customer interactions in the store help with inventory management? Can real-time transactional data help with fraud detection?

The answers to such questions can be found quickly if the analytics tools have access to the required data, which has to come from both traditional and new sources.

A traditional silo-based approach to Big Data analytics projects tends to get more expensive over time, and management does not get the full value of the new insights it wanted.

The best way to approach Big Data analytics is to accommodate rapid data growth while continuing to use existing analytics, tools, and infrastructure, and then to add newer tools and analytics to the mix. Existing business analytics should continue to work alongside new Big Data analytics. Corporate and business managers do not have the time or expertise to learn all of the needed techniques and tools; they will rely on existing workflows and use new analytics only as needed. In short, implementing Big Data analytics must not be disruptive.

Unified Data Lake is the game changer

Creating a unified Data Lake that can ingest and hold both the traditional data in existing data warehouses and newer data types should be the first step when embarking on a Big Data journey.

A unified Data Lake gives companies a choice of data extraction and analytics tools and does not lock workers into using old existing solutions. Older workflows can be easily integrated with newer Big Data analytics workflows.

A unified Data Lake allows new Big Data analytics solutions to use new technologies like Hadoop, Hive, Rails, etc. to capture, store, and refine data, while providing enterprise-class data security, protection, and access control in a centralized, integrated way, so that data is accessible and easily managed.

A Unified Data Lake offers several benefits including:

  • Agility: Eliminates much of the strain on IT that was common with traditional silo approaches.
  • Simplicity: Allows consumption of data in any format, saving time and reducing errors.
  • Flexibility: Allows the use of different analytics techniques, a mix of both old and new, which helps organizations see the data differently, ask new questions, and derive new insights.
  • Accessibility: Provides users with easy and secure access to all their data.
