Friday, April 13, 2018

Active Data Management for Big Data Analytics

Better Way to Manage Big Data with Containers

Big Data analytics using Hadoop & MapReduce is best done in containers rather than on bare-metal or dedicated servers. Running big data analytics in containers on Kubernetes is faster and costs less.

Storing data in Software Defined Storage (SDS) systems (using HDFS over an object store) provides shared storage and allows in-place analytics. This is more efficient than copying all the files over to local disks, which saves time and shortens time to insight.
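To get a rough sense of what copying costs, the transfer time can be estimated from dataset size and link speed. The sketch below is a back-of-the-envelope calculation only. The function name and the 100 TB / 10 Gbit/s figures are hypothetical, and the estimate ignores protocol overhead, disk speed, and parallel links.

```python
def copy_time_hours(dataset_tb: float, link_gbps: float) -> float:
    """Estimate hours needed to copy a dataset over a single network link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbit/s = 10**9 bit/s)
    and a fully utilized link, which real transfers rarely achieve.
    """
    if dataset_tb < 0 or link_gbps <= 0:
        raise ValueError("dataset_tb must be >= 0 and link_gbps must be > 0")
    bits = dataset_tb * 1e12 * 8          # dataset size in bits
    seconds = bits / (link_gbps * 1e9)    # ideal transfer time in seconds
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: 100 TB over one 10 Gbit/s link.
    print(f"{copy_time_hours(100, 10):.1f} hours")
```

Under these assumptions the copy alone takes roughly 22 hours before any analysis can start, which is the delay that in-place analytics avoids.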

Running on containers also allows IT to offer a BigData-as-a-Service model, because the entire infrastructure can be managed with existing cloud management tools. This simplifies new deployments and reduces the time they take.

Business Challenges & Technology Requirements for IoT

Sunday, April 08, 2018

Data Storage Tiers and HPE Solutions

A data-centered economy means that organizations need to store vast volumes of data, and to store it economically so as to get the best performance/cost ratio. This implies creating multi-tier data storage systems and automating data management across them, for example with HPE DMF solutions.

Automating data management across multiple tiers and building cost-effective SDS systems is the way forward.

Today, HPE offers SDS solutions for each data storage tier.

Tier-1: VMware vSAN, HPE SimpliVity & HPE VSA
Tier-2: Lustre SDS with ZFS for scale-out NAS storage
Tier-3: Scality RING Object Store
Tier-4: HDD-based RDX cartridges for long-term, off-site, secure backup

Tier-1: Frequently accessed data, best built on SSDs. NVMe SSDs offer the highest IOPS and throughput; SAS SSDs offer the highest capacity while still delivering very high IOPS.

Tier-2: Frequently used data, often files used by individuals. These files are best stored in a scale-out NAS system. HPE offers Lustre with ZFS built on Apollo 4500 servers, which scales out to practically unlimited capacity at a lower cost than dedicated NAS arrays.

Tier-3: Archived object store data, used infrequently but retained for business purposes. HPE offers the Scality RING Object Store solution built on Apollo 4500 servers, which scales out to practically unlimited capacity at very low cost and does not need a separate backup, because data is replicated across multiple datacenters.

Tier-4: Backup data stored in a secure, offline, off-site location. Historically, tape was used for this backup, but with low-cost and highly reliable hard drives, companies can now use HDDs for backup storage. HPE offers FLX HD cartridge solutions for long-term data backup and off-site archival.
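The tier descriptions above can be read as a simple placement policy. The following is a minimal sketch of such a policy, not a description of how HPE DMF or any HPE product works. The `DataSet` class, the `choose_tier` function, and the 7-day and 90-day thresholds are all hypothetical and would need tuning to real access patterns and cost targets.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    days_since_last_access: int
    is_backup_copy: bool = False


# Hypothetical thresholds, chosen only for illustration.
HOT_DAYS = 7     # accessed within a week
WARM_DAYS = 90   # accessed within about three months


def choose_tier(ds: DataSet) -> int:
    """Return the storage tier (1-4) for a dataset."""
    if ds.is_backup_copy:
        return 4  # Tier-4: offline, off-site backup
    if ds.days_since_last_access <= HOT_DAYS:
        return 1  # Tier-1: SSD-backed, frequently accessed
    if ds.days_since_last_access <= WARM_DAYS:
        return 2  # Tier-2: scale-out NAS
    return 3      # Tier-3: object store archive


if __name__ == "__main__":
    for ds in (
        DataSet("orders-db", 0),
        DataSet("team-share", 30),
        DataSet("2015-logs", 400),
        DataSet("weekly-backup", 1, is_backup_copy=True),
    ):
        print(ds.name, "-> Tier", choose_tier(ds))
```

An automated data management system would evaluate a rule like this periodically and migrate data between tiers as access patterns change.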

Thursday, April 05, 2018

Data Preparation Process

Data preparation is the first step in modern data analytics and BI, data science, and data integration.

Data preparation takes more than 60% of the data analytics time. With businesses demanding faster time to insight to remain competitive, analytics is becoming more pervasive across the enterprise, and insights are being derived from a growing number of diverse data sources, both internal and external to the enterprise, with varying degrees of trustworthiness. This increases complexity.

Data preparation processes reduce time to insight for analytics and are the first step in data analytics.
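As an illustration of what a preparation step might look like, here is a minimal sketch that cleans records before analysis. The specific steps (trimming whitespace, dropping incomplete records, removing duplicates) are assumptions chosen as common examples; the post does not list them, and the `prepare_records` function is hypothetical.

```python
def prepare_records(records, required_fields):
    """Clean a list of dict records for analysis.

    Assumes all values are hashable (strings, numbers, None).
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize: strip surrounding whitespace from string values.
        norm = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Drop records where a required field is missing or empty.
        if any(norm.get(f) in (None, "") for f in required_fields):
            continue
        # Skip exact duplicates of records already kept.
        key = tuple(sorted(norm.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


if __name__ == "__main__":
    rows = [
        {"id": "1", "city": " Paris "},
        {"id": "1", "city": "Paris"},   # duplicate once whitespace is trimmed
        {"id": "2", "city": ""},        # missing required value
    ]
    print(prepare_records(rows, ["id", "city"]))
```

Data from less trustworthy external sources would typically need stricter validation than this before it is merged with internal data.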