- Apache Hadoop is an open-source software framework used for distributed storage and processing of big data sets using the MapReduce programming model.
- It runs on computer clusters built from commodity hardware.
- All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
Map
- Reads and processes the input data in parallel across all the nodes in the cluster
- Output is a set of key-value pairs
- Each map task processes the data that is present on its own machine (data locality); see the mapper sketch after this list
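Below is a minimal sketch of a mapper, assuming the classic word-count example (the class name WordCountMapper is illustrative, not something from the original post). Each call to map() receives one line from the local input split and emits (word, 1) pairs.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each map() call sees one line of the machine-local input split
        // and emits (word, 1) key-value pairs.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```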
Reduce
- Operates on the intermediate key-value pairs produced during the map phase
- Typical computations are aggregations such as a sum or an average, depending on the requirement
- This is the step that combines the intermediate results into the final output; a matching reducer sketch follows this list
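A minimal sketch of the matching reducer for the word-count example above (again, the class name is illustrative). All values for a given key arrive together, and here the aggregation is a simple sum, though it could just as well be an average or a maximum.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All intermediate values for the same key are grouped together;
        // the computation here is a sum, per the word-count requirement.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```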
Combiner
- Very similar to the reduce phase; it works on each mapper's output before that output is sent to the reducer
- For example, with n mapper nodes there would typically be n combiners, one running locally on each mapper's output
- The output of the combiner is what gets sent to the reducer
- Takes load off the reducer and cuts down the data shuffled across the network, making the job more efficient (see the combiner sketch after this list)
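A minimal sketch of a combiner for the word-count example; it performs a local partial sum on each mapper's output before the shuffle. It is written here as its own class for clarity (the name is illustrative); in practice the reducer class itself is often reused as the combiner, since summation is associative and commutative.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable partial = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Pre-aggregate counts on the mapper node so far less data
        // has to cross the network to reach the reducer.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        partial.set(sum);
        context.write(key, partial);
    }
}
```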
As developers we only have to write two functions, Map and Reduce (plus an optional combiner); Hadoop does the rest behind the scenes.
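To tie it together, here is a minimal driver sketch that wires the classes above into a job and submits it (class names and paths are illustrative). Beyond this, the framework handles input splitting, task scheduling, shuffling, and recovery from hardware failures.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // The developer supplies only the map and reduce logic
        // (and an optional combiner); Hadoop takes care of the rest.
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountCombiner.class);
        job.setReducerClass(WordCountReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Once packaged into a jar, a job like this is typically submitted with the hadoop jar command, passing the input and output paths as arguments.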