Web Snippets: Map Reduce Data Flow

Friday, January 26, 2018

Map Reduce Data Flow

Pre-loaded local input data and mapping

MapReduce inputs typically come from input files loaded onto the processing cluster in HDFS. These files are evenly distributed across all the nodes.
Running a MapReduce program involves running mapping tasks across the nodes in the cluster.
All of these mapping tasks are equivalent: no mapper has a particular identity associated with it, so any mapper can process any input file.
Each mapper loads the set of files local to its machine and processes them.
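
As a rough illustration of the points above, here is a minimal single-process sketch in Python. It is not Hadoop code: the `LOCAL_SPLITS` dictionary is a hypothetical stand-in for the file blocks that HDFS stores on each node, and `map_words` and `run_mappers` are names invented for this example. The point it tries to show is that every node runs the same map function, and each run only touches the data held on that node.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for HDFS: node name -> lines stored on that node.
LOCAL_SPLITS: Dict[str, List[str]] = {
    "node-1": ["the quick brown fox"],
    "node-2": ["the lazy dog", "the end"],
}


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Emit one (word, 1) pair per word. The function is the same on every node."""
    for word in line.split():
        yield (word, 1)


def run_mappers(splits: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Apply the same map function on each node, reading only that node's lines."""
    return {
        node: [pair for line in lines for pair in map_words(line)]
        for node, lines in splits.items()
    }


intermediate = run_mappers(LOCAL_SPLITS)
for node, pairs in intermediate.items():
    print(node, pairs)
```

Because `map_words` carries no node-specific state, swapping the inputs of `node-1` and `node-2` would change where the pairs are produced but not which pairs are produced.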

Intermediate data from the mappers and shuffling

When the mapping phase has completed, the intermediate (key, value) pairs must be exchanged between machines so that all values with the same key are sent to a single reducer.
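
The exchange described above can be sketched as a partition-and-group step. This is a simplified simulation, not the Hadoop shuffle itself: the `partition` and `shuffle` functions are hypothetical, and `zlib.crc32` is used only because it gives a stable hash across Python runs. Hadoop's default partitioner follows the same idea of taking a hash of the key modulo the number of reduce tasks.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key; the same key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs: List[List[Pair]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Route every intermediate pair to its reducer and group the values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            reducers[partition(key, num_reducers)][key].append(value)
    return [dict(r) for r in reducers]


# Output of two mappers (assumed sample data).
mapper_outputs = [
    [("the", 1), ("fox", 1)],
    [("the", 1), ("dog", 1), ("the", 1)],
]

buckets = shuffle(mapper_outputs, num_reducers=2)
for index, bucket in enumerate(buckets):
    print("reducer", index, bucket)
```

Whichever reducer ends up owning "the", it receives all three values for that key, even though they were produced by two different mappers.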

Reducing process

The reduce tasks are spread across the same nodes in the cluster as the mappers. The exchange of intermediate pairs that feeds them is the only communication step in MapReduce.
Individual map tasks do not exchange information with one another, nor are they aware of one another's existence.
The user never explicitly marshals information from one machine to another; all data transfer is handled by the Hadoop MapReduce platform itself, guided implicitly by the different keys associated with values. This is a fundamental element of Hadoop MapReduce's reliability.
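
A small sketch may help show what this isolation looks like in practice. The Python below assumes the shuffle has already produced one partition per reducer (the `partitions` list is made-up sample data) and assumes reducers are just as isolated from each other as mappers are. Each call to the hypothetical `reduce_partition` function sees only its own partition, so the calls can run concurrently with no coordination code written by the user.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def reduce_partition(partition: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, using only the data in this partition."""
    return {key: sum(values) for key, values in sorted(partition.items())}


# One partition per reducer, as delivered by the shuffle (assumed sample data).
partitions = [
    {"the": [1, 1, 1]},
    {"dog": [1], "fox": [1]},
]

# The reducers share no state, so they can run in parallel safely.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    results = list(pool.map(reduce_partition, partitions))

print(results)
```

Note that the code never passes data from one reducer to another; the keys alone decided which partition each value landed in.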

Note:

If nodes in the cluster fail, tasks must be able to be restarted. If they have been performing side-effects, e.g., communicating with the outside world, then the shared state must be restored in a restarted task. By eliminating communication and side-effects, restarts can be handled more gracefully.
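
To make the note concrete, here is a hedged sketch of restarting a side-effect-free task. The `run_with_retries` helper and its `fail_on` parameter are hypothetical and exist only to simulate node failures; this is not how Hadoop schedules task attempts internally. What it illustrates is that a task which depends only on its input can simply be run again and will return the same result.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def count_words(lines: List[str]) -> Dict[str, int]:
    """A pure task: the result depends only on the input lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def run_with_retries(
    task: Callable[[List[str]], Dict[str, int]],
    task_input: List[str],
    max_attempts: int = 3,
    fail_on: Iterable[int] = (),
) -> Tuple[Dict[str, int], int]:
    """Run a task, restarting it from scratch after each simulated failure."""
    failing_attempts = set(fail_on)
    for attempt in range(1, max_attempts + 1):
        try:
            if attempt in failing_attempts:
                raise RuntimeError(f"simulated node failure on attempt {attempt}")
            return task(task_input), attempt
        except RuntimeError:
            continue  # nothing to clean up: the task had no side effects
    raise RuntimeError("task failed on every attempt")


lines = ["the fox", "the dog"]
clean, _ = run_with_retries(count_words, lines)
retried, attempts = run_with_retries(count_words, lines, fail_on={1, 2})
print(clean == retried, attempts)
```

If `count_words` had written partial results to an external system, the `except` branch would need compensation logic before retrying, which is exactly the complication the note warns about.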

Word Count

The example below shows the data flow for word count.

Note: A combiner is a semi-reducer in MapReduce. It is an optional class that can be specified in the MapReduce driver class to process the output of map tasks before it is submitted to the reducer tasks.
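
The word count flow, including the optional combiner, can be approximated in a few lines of Python. This is a single-process sketch under stated assumptions, not a Hadoop job: `mapper`, `combiner`, `reducer`, and `word_count` are illustrative names, and each inner list in `splits` stands for the input local to one map task. The sketch also counts how many pairs cross the shuffle, to show what the combiner saves.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def mapper(line: str) -> List[Pair]:
    """Emit (word, 1) for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combiner(pairs: List[Pair]) -> List[Pair]:
    """Pre-sum counts on the map side so fewer pairs are shuffled."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def reducer(key: str, values: List[int]) -> Pair:
    """Produce the final count for one word."""
    return key, sum(values)


def word_count(splits: List[List[str]], use_combiner: bool = True) -> Tuple[Dict[str, int], int]:
    grouped = defaultdict(list)
    shuffled_pairs = 0
    for lines in splits:  # one iteration per simulated map task
        pairs = [pair for line in lines for pair in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        shuffled_pairs += len(pairs)
        for key, value in pairs:  # simulated shuffle: group by key
            grouped[key].append(value)
    counts = dict(reducer(key, values) for key, values in grouped.items())
    return counts, shuffled_pairs


splits = [["the cat the hat"], ["the cat sat"]]
with_combiner, sent_with = word_count(splits)
without_combiner, sent_without = word_count(splits, use_combiner=False)
print(with_combiner, sent_with, sent_without)
```

Both runs give the same counts, but the run with the combiner shuffles 6 pairs instead of 7. This shortcut is safe here because addition is associative and commutative; a combiner is not a drop-in optimization for every reduce function.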
