Web Snippets: HDFS Rack Awareness

Friday, January 26, 2018

HDFS Rack Awareness

What is rack awareness?

In a large Hadoop cluster, the namenode reduces network traffic when reading or writing HDFS files by choosing datanodes on the same rack as the client, or on a nearby rack, to serve each read/write request.
The namenode obtains rack information by maintaining the rack ID of each datanode. Choosing closer datanodes based on this rack information is called Rack Awareness in Hadoop.
Rack awareness means knowing the cluster topology, or more specifically how the datanodes are distributed across the racks of a Hadoop cluster.
A default Hadoop installation assumes that all datanodes belong to the same rack.
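
To make the idea of "closer" concrete, here is a minimal Python sketch of rack-based distance. It is not Hadoop's code. The host-to-rack table and the function names are hypothetical, and the 0/2/4 distances are a simplified version of the usual two-level convention (same node, same rack, different rack). Hosts with no known rack fall back to a single default rack, which mirrors the default behaviour described above.

```python
# Hypothetical host-to-rack table; a real cluster would supply its own mapping.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

# Rack used when a host has no known mapping, so unmapped hosts share one rack.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack ID for a host, or the default rack if it is unknown."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)


def distance(host_a, host_b):
    """Simplified topology distance: 0 same node, 2 same rack, 4 different racks."""
    if host_a == host_b:
        return 0
    if resolve_rack(host_a) == resolve_rack(host_b):
        return 2
    return 4


def closest_replica(reader, replica_hosts):
    """Pick the replica holder with the smallest distance to the reader."""
    return min(replica_hosts, key=lambda host: distance(reader, host))
```

For example, a reader on 10.0.1.11 choosing between replicas on 10.0.2.21 and 10.0.1.12 would be sent to 10.0.1.12, because it sits on the same rack.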

Why do we need rack awareness?

To improve data availability and reliability.
To improve the performance of the cluster.
To make better use of network bandwidth.
To avoid losing data if an entire rack fails, even though a rack failure is far less likely than a node failure.
To keep bulk data transfers within a rack when possible.
It rests on the assumption that in-rack traffic has higher bandwidth and lower latency than traffic between racks.
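
The goals above pull in two directions: surviving a rack failure means spreading copies across racks, while saving bandwidth means keeping copies close together. The sketch below shows one way to balance them for three replicas, modelled loosely on the commonly described HDFS default: first copy on the writer's node, second on a node in another rack, third on a different node in that same remote rack. The function name, cluster layout and node names are hypothetical, and the requirement that a remote rack has at least two nodes is a simplification made for this example.

```python
import random


def place_replicas(writer, nodes_by_rack, rng=random):
    """Choose three datanodes for a block: one local, two on a single remote rack."""
    # Invert the rack -> nodes mapping so each node's rack can be looked up.
    rack_of = {node: rack for rack, nodes in nodes_by_rack.items() for node in nodes}
    local_rack = rack_of[writer]

    # Simplification: only consider remote racks that can hold two replicas.
    remote_racks = [
        rack
        for rack, nodes in nodes_by_rack.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("no remote rack with at least two datanodes")

    remote_rack = rng.choice(remote_racks)
    # Two distinct nodes on the same remote rack: one cross-rack transfer,
    # then a cheaper in-rack transfer for the third copy.
    second, third = rng.sample(nodes_by_rack[remote_rack], 2)
    return [writer, second, third]


# Hypothetical two-rack cluster used only for illustration.
cluster = {"/rack1": ["dn1", "dn2"], "/rack2": ["dn3", "dn4"]}
placement = place_replicas("dn1", cluster)
```

With this layout the block ends up on two racks, so losing either rack still leaves at least one copy, and only one of the two replica transfers has to cross racks.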
