Tuesday, November 28, 2017

Big Data Warehouse Reference Architecture


Data needs to get into Hadoop somehow and then be accessed securely through different tools. There is a wide choice of tools for each component, depending on the Hadoop distribution selected; each distribution ships its own versions of the tools, but they nonetheless provide the same core functionality.
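
To make the ingestion step concrete, here is a minimal sketch that writes a file directly into HDFS using the standard Hadoop FileSystem API. The NameNode address and paths are hypothetical placeholders; in a real cluster they would come from core-site.xml.

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngestExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally picked up from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        // Write a small record into a landing directory on HDFS.
        try (FileSystem fs = FileSystem.get(conf);
             OutputStream out = fs.create(new Path("/landing/events/sample.txt"))) {
            out.write("event-id,timestamp,payload\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}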

Just like core Hadoop, the tools that ingest and access data in Hadoop can scale independently. For example, a Spark cluster, a Flume cluster, or a Kafka cluster can each be sized separately from the underlying HDFS cluster.
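
As one sketch of an independently scaled ingestion tier, the snippet below publishes an event to a Kafka topic from Java. The broker addresses and topic name are made up for illustration; the point is that the Kafka cluster behind them can grow or shrink without touching the Hadoop nodes.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaIngestExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker list; the Kafka cluster is sized independently of Hadoop.
        props.put("bootstrap.servers", "kafka1.example.com:9092,kafka2.example.com:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Send one keyed event to an illustrative "raw-events" topic.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("raw-events", "device-42", "{\"temp\": 21.5}"));
        }
    }
}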

Each tool has its own infrastructure requirements.

For example, Spark requires more memory and processor power but is less dependent on hard disk drives.
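
A rough illustration of that profile: the Spark job below requests large executor memory and a modest core count through SparkConf. The values are illustrative only, and the master URL is assumed to be supplied by spark-submit.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkSizingExample {
    public static void main(String[] args) {
        // Illustrative values: Spark is typically provisioned for RAM and CPU
        // rather than local disk, since it keeps working sets in executor memory.
        SparkConf conf = new SparkConf()
                .setAppName("memory-heavy-job")
                .set("spark.executor.instances", "10")  // scale the Spark tier on its own
                .set("spark.executor.memory", "16g")    // large heap per executor
                .set("spark.executor.cores", "4");      // modest core count per executor

        // Count records in a hypothetical HDFS landing directory.
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            long count = sc.textFile("hdfs:///landing/events/").count();
            System.out.println("Records ingested: " + count);
        }
    }
}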

HBase, by contrast, doesn't need as many processor cores, but it does need more servers and faster non-volatile storage such as SSD- and NVMe-based flash.
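
To show where that storage profile bites, here is a small hypothetical write using the standard HBase Java client. Random writes land in the write-ahead log and memstore, which is where low-latency SSD/NVMe storage pays off; the ZooKeeper quorum, table, and column names below are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical ZooKeeper quorum; normally read from hbase-site.xml.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com");

        // Write one cell to an illustrative "events" table.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("events"))) {
            Put put = new Put(Bytes.toBytes("device-42#2017-11-28"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("temp"), Bytes.toBytes("21.5"));
            table.put(put); // hits the WAL and memstore before flushing to HDFS
        }
    }
}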
