Friday, February 9, 2018

Flavors of Hadoop Distribution


Hortonworks 

  • It is very similar to the Apache Hadoop distribution. 
  • We can use Azure blob storage as the default DFS. With that, we only need to start the cluster when we need compute power.
  • The rest of the time, we can bring data into the storage through the REST API or the SDKs available in different languages. We can therefore create a cluster of exactly the required size whenever we want to run a computation. This gives a lot of flexibility, but we lose data locality (which matters mainly in the first map phase).
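
To make the blob storage point concrete, here is a minimal sketch of how a path in Azure blob storage is usually addressed from Hadoop through the WASB driver. The helper name `wasb_uri` and the container and account names are made up for illustration; only the general URI shape `wasb[s]://<container>@<account>.blob.core.windows.net/<path>` is taken from the driver's convention.

```python
def wasb_uri(container: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build a WASB-style URI for a blob path.

    All argument values used below are hypothetical examples,
    not names from a real deployment.
    """
    # "wasbs" is the TLS-secured variant of the "wasb" scheme.
    scheme = "wasbs" if secure else "wasb"
    # Strip leading slashes so we never produce a double slash after the host.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# A cluster created later can point at the same data by using the same URI.
print(wasb_uri("logs", "examplestore", "/2018/02/09/events.csv"))
# → wasbs://logs@examplestore.blob.core.windows.net/2018/02/09/events.csv
```

Because the URI names the storage account rather than a cluster, the data stays addressable no matter which cluster is running, or whether any cluster is running at all.
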
Advantages 

  • Hortonworks supports the Microsoft Windows operating system, while the other vendors support only Linux.
  • Hive can be made faster through the new Stinger project.
  • Hortonworks enhances the usability of the Hadoop platform.

Cloudera

Cloudera supports blob storage only as a cold archive. Hence, it is more difficult to create multiple clusters on the same storage. The user has to save the data to blob storage explicitly so that it can still be accessed after the cluster is shut down.

Advantages
  • New services can be added to a running Hadoop cluster.
  • It supports managing multiple clusters.
  • CDH allows the creation of node groups in a Hadoop cluster, so users don't have to use the same configuration throughout the cluster.
  • It depends on HDFS, so it follows the DataNode and NameNode architecture for splitting up where the data processing is done.

MapR

Making the data available offline is complicated because there is no WASB driver. Also, unlike Cloudera and Hortonworks, it doesn't support a single VM or HDInsight.

Advantages


  • It is the only distribution in which Pig, Hive, and Sqoop have no Java dependencies, because it relies on MapR-FS.
  • MapR is a Hadoop distribution with enhancements that make it more user-friendly, faster, and more dependable.
  • It supports multi-node direct-access NFS. Users of the distribution can mount the MapR file system over NFS, which allows applications to access Hadoop data in a traditional way.
  • MapR provides full data protection and simplicity, with no single point of failure.
  • It is one of the fastest Hadoop distributions.
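
As a rough illustration of the NFS point above: once the MapR file system is mounted, an application can read cluster data with ordinary file I/O instead of a Hadoop client. The sketch below assumes a hypothetical mount directory and file name, and uses a temporary directory as a stand-in for the mount so that it runs anywhere.

```python
import os
import tempfile


def count_lines(mount_root: str, relative_path: str) -> int:
    """Count lines in a file under an NFS mount using plain file I/O.

    mount_root is assumed to be wherever the file system is mounted;
    no Hadoop-specific library is involved.
    """
    full_path = os.path.join(mount_root, relative_path)
    with open(full_path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# A temporary directory stands in for the real mount point here.
with tempfile.TemporaryDirectory() as fake_mount:
    sample = os.path.join(fake_mount, "events.csv")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("id,value\n1,10\n2,20\n")
    print(count_lines(fake_mount, "events.csv"))
    # → 3
```

With a real mount, only the `mount_root` argument would change; the reading code stays the same as for any local file.
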


