Friday, January 26, 2018

HDFS Rack Awareness


What is rack awareness?


  • In a large Hadoop cluster, the NameNode reduces network traffic during HDFS reads and writes by choosing DataNodes on the same rack as the client, or on a nearby rack, to serve each request.
  • The NameNode does this by maintaining the rack ID of each DataNode. Choosing closer DataNodes based on this rack information is called rack awareness in Hadoop.
  • Rack awareness means having knowledge of the cluster topology, specifically how the DataNodes are distributed across the racks of a Hadoop cluster.
  • A default Hadoop installation assumes that all DataNodes belong to the same rack.
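
To make the idea of "closer" DataNodes concrete, here is a minimal Python sketch that ranks DataNodes by rack distance. The host names, rack IDs, and function names are hypothetical. The distance values (0 for the same node, 2 for the same rack, 4 for a different rack) follow the tree-distance convention usually described for Hadoop's network topology. This is an illustration under those assumptions, not Hadoop's actual implementation.

```python
# Hypothetical mapping of DataNode host -> rack ID (illustrative values only).
RACK_OF = {
    "dn1": "/rack1",
    "dn2": "/rack1",
    "dn3": "/rack2",
    "dn4": "/rack2",
}
DEFAULT_RACK = "/default-rack"  # rack assumed when no topology is configured


def rack_of(host):
    """Return the rack ID for a host, falling back to the default rack."""
    return RACK_OF.get(host, DEFAULT_RACK)


def distance(a, b):
    """Tree distance: 0 for the same node, 2 for the same rack, 4 otherwise."""
    if a == b:
        return 0
    if rack_of(a) == rack_of(b):
        return 2
    return 4


def sort_by_closeness(reader, replicas):
    """Order replica holders so the DataNode closest to the reader comes first."""
    return sorted(replicas, key=lambda host: distance(reader, host))


# A reader on dn1 prefers its own node, then its rack, then a remote rack.
print(sort_by_closeness("dn1", ["dn3", "dn2", "dn1"]))  # ['dn1', 'dn2', 'dn3']
```

When no rack mapping is supplied, every host falls into the same default rack, so all distinct nodes appear equally close. This matches the default single-rack assumption.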

Why do we need rack awareness?


  • To improve data availability and reliability.
  • To improve the performance of the cluster.
  • To reduce cross-rack traffic and make better use of network bandwidth.
  • To avoid losing data if an entire rack fails, even though a rack failure is far less likely than a node failure.
  • To keep bulk data transfers within a rack when possible.
  • It rests on the assumption that communication within a rack has higher bandwidth and lower latency than communication across racks.
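
The rack-failure point can be illustrated with a small Python sketch. The placement rule used here (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack) mirrors the commonly documented HDFS default for a replication factor of 3. The cluster layout and all function names are hypothetical, and the sketch assumes at least two racks with at least two nodes each.

```python
import random

# Hypothetical cluster layout: rack ID -> DataNodes in that rack.
CLUSTER = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
}


def rack_of(host):
    """Look up the rack that contains the given host."""
    for rack, hosts in CLUSTER.items():
        if host in hosts:
            return rack
    raise KeyError(host)


def place_replicas(writer, rng=random):
    """Pick three DataNodes: the writer's node, then two nodes in one remote rack."""
    local_rack = rack_of(writer)
    remote_racks = [rack for rack in CLUSTER if rack != local_rack]
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(CLUSTER[remote_rack], 2)
    return [writer, second, third]


def survives_rack_failure(replicas, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(rack_of(host) != failed_rack for host in replicas)


replicas = place_replicas("dn1")
print(replicas)
# Losing any single rack still leaves at least one replica available.
print(all(survives_rack_failure(replicas, rack) for rack in CLUSTER))
```

Placing two of the three replicas in the same remote rack is a trade-off: the block spans two racks, so a single rack failure cannot destroy it, while only one copy has to cross the slower inter-rack link during the write.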
