HDFS Rack Awareness
What is rack awareness?
- In a large Hadoop cluster, the namenode reduces network traffic during HDFS reads and writes by choosing datanodes on the same rack as the client, or on a nearby rack, to serve the request.
- The namenode obtains rack information by maintaining the rack ID of each datanode. Choosing closer datanodes based on this rack information is called Rack Awareness in Hadoop.
- Rack awareness means knowing the cluster topology, or more specifically, how the datanodes are distributed across the racks of a Hadoop cluster.
- A default Hadoop installation assumes that all datanodes belong to the same rack.
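
The idea of choosing a closer datanode from rack IDs can be sketched in a few lines of Python. This is a simplified illustration, not the namenode's actual code: the host names, the rack map, and the helper functions are all hypothetical, and the distance values (0 for the same node, 2 for the same rack, 4 for a different rack) assume a flat, single-datacenter topology. Hosts with no known rack fall back to `/default-rack`, which mirrors the default single-rack assumption.

```python
# Hypothetical rack map: datanode host -> rack ID (illustrative names only).
RACK_OF = {
    "dn1": "/rack1",
    "dn2": "/rack1",
    "dn3": "/rack2",
}

# Hosts with no mapping are treated as living on one shared default rack.
DEFAULT_RACK = "/default-rack"


def rack_of(host):
    """Return the rack ID for a host, or the default rack if unknown."""
    return RACK_OF.get(host, DEFAULT_RACK)


def distance(a, b):
    """Simplified network distance between two hosts.

    0 = same node, 2 = same rack, 4 = different racks.
    """
    if a == b:
        return 0
    if rack_of(a) == rack_of(b):
        return 2
    return 4


def closest_datanode(client, candidates):
    """Pick the candidate datanode with the smallest distance to the client."""
    return min(candidates, key=lambda dn: distance(client, dn))
```

For a client running on `dn1`, `closest_datanode("dn1", ["dn3", "dn2"])` picks `dn2`, because it shares `/rack1` with the client while `dn3` sits on another rack.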
Why do we need rack awareness?
- To improve data availability and reliability.
- To improve the performance of the cluster.
- To make better use of network bandwidth.
- To avoid losing data if an entire rack fails, even though a rack failure is far less likely than a node failure.
- To keep bulk data transfers within a rack when possible.
- The underlying assumption is that communication within a rack has higher bandwidth and lower latency than communication between racks.
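
The rack-failure point can be made concrete with a small check. This is a minimal sketch under stated assumptions: the replica placements and the function names are hypothetical, and the check only asks whether a block's replicas span at least two racks. It does not reproduce HDFS's actual placement policy.

```python
def racks_used(replica_racks):
    """Return the set of rack IDs that hold at least one replica."""
    return set(replica_racks.values())


def survives_rack_failure(replica_racks):
    """True if losing any single rack still leaves at least one replica."""
    return len(racks_used(replica_racks)) >= 2


# Hypothetical placements of one block's replicas: datanode host -> rack ID.
same_rack = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1"}
two_racks = {"dn1": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
```

With these placements, `survives_rack_failure(same_rack)` is `False` and `survives_rack_failure(two_racks)` is `True`. Without rack IDs the cluster cannot tell the two cases apart.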