Hadoop version 2 introduced a fundamental change to the architecture: the framework was divided into two parts, MapReduce and YARN.
MapReduce: Responsible for the operations you want to perform on the data (the processing logic)
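To make the division of labor concrete, here is a minimal word-count sketch of the MapReduce model in plain Python (an illustration only, not Hadoop code):

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1

def reduce_phase(pairs):
    # Reduce: sum the counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

lines = ["big data", "big cluster"]
pairs = [kv for line in lines for kv in map_phase(line)]
counts = reduce_phase(pairs)  # {"big": 2, "data": 1, "cluster": 1}
```

MapReduce supplies this processing logic; YARN decides where on the cluster it actually runs.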
YARN: Yet Another Resource Negotiator
- Responsible for coordinating all the tasks running on all the nodes in the cluster
- Framework responsible for providing the computational resources (CPU, memory, etc.) needed for application execution
- Assigns new tasks to nodes based on their available capacity. If a node fails and all the processes on that node stop, it reassigns those tasks to other nodes
- In short, it negotiates cluster resources among the applications that need them
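The scheduling behavior described above can be sketched as a toy model (the class name and capacity units here are made up for illustration; the real YARN scheduler is far more sophisticated):

```python
class Cluster:
    """Toy resource manager: tracks free capacity per node and places tasks."""

    def __init__(self, node_capacity):
        self.free = dict(node_capacity)   # node -> free capacity (abstract units)
        self.placed = {}                  # task -> (node, cost)

    def assign(self, task, cost):
        # Pick the first node with enough free capacity, as the resource
        # manager does when it assigns a new task.
        for node, free in self.free.items():
            if free >= cost:
                self.free[node] -= cost
                self.placed[task] = (node, cost)
                return node
        return None  # no capacity anywhere: the task has to wait

    def node_failed(self, dead):
        # All processes on the failed node stop; reassign its tasks elsewhere.
        lost = [(t, c) for t, (n, c) in self.placed.items() if n == dead]
        self.free.pop(dead)
        for task, cost in lost:
            self.placed.pop(task)
            self.assign(task, cost)
        return [t for t, _ in lost]

cluster = Cluster({"node1": 4, "node2": 8})
cluster.assign("map-1", 2)   # lands on node1
cluster.assign("map-2", 4)   # node1 has only 2 units free, so lands on node2
cluster.node_failed("node1") # map-1 is reassigned to node2
```

The key idea is that only this central component knows every node's remaining capacity, so it alone can make placement decisions.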
YARN Components
It is made up of two components:
Resource manager:
- Runs on a single master node
- Schedules tasks across nodes
Node manager:
- Runs on all the other nodes
- Manages tasks on the individual node
- All processes on a node run within containers
- A container is a logical unit of the resources the process needs (memory, CPU, etc.)
- A container is defined by those resources
- A container is responsible for running any task assigned to it; it executes that application
- One node manager can host more than one container
Note: When a new process needs to be spun up on a node, the resource request for that process is made in the form of a container
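On a real cluster, the resources a node manager can hand out as containers are bounded by settings in yarn-site.xml. The property names below are real YARN settings, but the values are only illustrative:

```xml
<configuration>
  <!-- Total memory and vcores this node manager can allocate to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <!-- Smallest and largest container the scheduler will allocate -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
</configuration>
```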
Application Master Process
- After a container has been allocated on a node manager, the resource manager (the master process) starts an application master process within that container
- Responsible for performing the computation and processing the data
- In the case of MapReduce, the application master process runs the mapper or reducer logic
- Responsible for determining whether additional resources are required to complete the job (e.g., if there are pending mapper or reducer tasks that still need to run)
Note:
1) If more tasks need to be run, the application master asks the resource manager running on the master node for additional resources (containers) for the new mappers and reducers. Each container request specifies the CPU requirement, memory requirement, etc.
2) The resource manager continually scans the cluster for nodes with available capacity; an individual node manager does not have this cluster-wide view
The node manager and resource manager work together to accomplish parallel processing.
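The application-master loop from note (1) can be sketched as follows. This is not the real Hadoop AMRMClient API; the function names and resource numbers are made up for illustration:

```python
def run_application(tasks, request_container):
    """Application master: ask the resource manager for one container per pending task."""
    completed = []
    for task in tasks:
        # Each container request carries the resources the process needs.
        spec = {"memory_mb": 1024, "vcores": 1}
        container = request_container(spec)
        if container is None:
            break  # cluster is full; a real application master waits and retries
        completed.append((task, container["node"]))
    return completed

# A stand-in resource manager that grants containers on alternating nodes.
nodes = iter(["node1", "node2", "node1"])
def grant(spec):
    return {"node": next(nodes), **spec}

result = run_application(["map-0", "map-1", "reduce-0"], grant)
```

The point of the shape: the application master never picks nodes itself; it only states resource needs and lets the resource manager decide placement.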