Friday, February 9, 2018

Spark Architecture

Steps explained

1) We write the program in Scala, Java, or Python and submit the application to the Spark cluster
2) The Spark cluster is made up of multiple machines
3) One of these machines is assigned as the coordinator

4) The coordinator receives the application and breaks it into discrete tasks
5) These tasks, along with the data they work on, are assigned to worker machines to be executed

6) Every RDD has the potential to hold millions of records
7) Each worker machine may be assigned the task of performing the same transformation on a different set of data
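The steps above can be sketched as a toy simulation (plain Python, not real Spark; `coordinator` and `transform` are hypothetical names): a coordinator breaks the data into partitions, and every worker runs the same transformation on a different slice.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(partition):
    # Every worker applies the same transformation to a different slice of data
    return [x * 2 for x in partition]

def coordinator(data, num_workers=4):
    # Break the job into discrete tasks: one partition of the data per task
    size = (len(data) + num_workers - 1) // num_workers
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    # Schedule each task on a worker and gather the results in order
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(transform, partitions)
    return [x for part in results for x in part]

print(coordinator(list(range(8))))  # [0, 2, 4, 6, 8, 10, 12, 14]
```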

The architecture consists of 2 main components:

Driver: the coordinator process that executes the user program
  • Analyses the program and breaks it into discrete tasks that can be performed in a distributed manner
  • Checks the nodes in the cluster and assigns or schedules tasks on them (executors)
  • Runs in its own Java process
  • Only one driver (coordinator) per Spark application
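What "breaks it into discrete tasks" might look like can be sketched in plain Python (a toy model, not Spark's actual scheduler; `plan_tasks` is a hypothetical name): the driver emits the same chain of transformations once per data partition, so each task can run independently on a different executor.

```python
def plan_tasks(transformations, num_partitions):
    # Toy sketch of the driver's planning step: one discrete,
    # independently runnable task per partition of the data
    return [
        {"partition": pid, "ops": list(transformations)}
        for pid in range(num_partitions)
    ]

tasks = plan_tasks(["map", "filter"], num_partitions=3)
print(len(tasks))       # 3 tasks, one per partition
print(tasks[0]["ops"])  # ['map', 'filter']
```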

Executor: the worker process responsible for running individual tasks
  • Receives the tasks it has to execute from the driver program and runs them on the different nodes in the cluster
  • Provides in-memory storage for RDDs so that tasks run close to the data
  • Each executor runs in its own Java process
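The "in-memory storage for RDDs" point can be sketched as a toy executor (hypothetical class, not Spark's real block manager) that caches computed partitions so repeated tasks on the same data are served from memory instead of being recomputed.

```python
class Executor:
    """Toy sketch of an executor with in-memory RDD partition storage."""

    def __init__(self):
        self.block_store = {}   # (rdd_id, partition_id) -> cached partition

    def run_task(self, rdd_id, partition_id, data, fn):
        key = (rdd_id, partition_id)
        if key not in self.block_store:
            # Compute the partition and keep it in memory, so later tasks
            # on the same data run close to it instead of recomputing
            self.block_store[key] = fn(data)
        return self.block_store[key]

ex = Executor()
ex.run_task("rdd1", 0, [1, 2, 3], lambda p: [x + 1 for x in p])
# Second call with the same key is served straight from the block store
print(ex.run_task("rdd1", 0, [1, 2, 3], lambda p: [x + 1 for x in p]))  # [2, 3, 4]
```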

Cluster Manager

  • Launches the driver and executor processes
  • Pluggable; the built-in one is called the standalone cluster manager
    Ex: YARN can be plugged in

The driver and executors together make up the Spark application.
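The cluster manager's role can be sketched as a toy class (hypothetical names, not a real API): its job here is only to launch the processes, and the driver plus its executors together form one application.

```python
class StandaloneClusterManager:
    """Toy sketch of a pluggable cluster manager's launch step."""

    def launch_application(self, num_executors):
        # Launch one driver and the requested number of executors;
        # together they make up a single Spark application
        driver = {"role": "driver"}
        executors = [{"role": "executor", "id": i} for i in range(num_executors)]
        return {"driver": driver, "executors": executors}

app = StandaloneClusterManager().launch_application(num_executors=2)
print(len(app["executors"]))  # 2
```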

Receivers

  • Tasks which collect input data from different sources
  • Spark allocates a receiver for each input source
  • Special tasks that run on the executors


  1. Receivers run as tasks within executors
  2. They collect data from the input source and save it as RDDs in memory, so that Spark can replicate the collected data to another executor for fault tolerance
  3. If we have 2 sources, then 2 executors would be assigned for the streaming data
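The receiver behaviour above can be sketched as a toy model (hypothetical classes, not Spark Streaming's real API): one receiver per input source, each storing records in its executor's memory and replicating them to a second executor for fault tolerance.

```python
class ExecutorStore:
    """Toy in-memory block store on one executor."""
    def __init__(self):
        self.blocks = []

    def store(self, record):
        self.blocks.append(record)

class Receiver:
    """Toy receiver: one per input source, running inside an executor."""
    def __init__(self, source, executor, replica):
        self.source, self.executor, self.replica = source, executor, replica

    def collect(self):
        for record in self.source:
            self.executor.store(record)   # keep in this executor's memory
            self.replica.store(record)    # replicate for fault tolerance

sources = [["a1", "a2"], ["b1"]]          # two input sources -> two receivers
executors = [ExecutorStore() for _ in range(3)]
for i, src in enumerate(sources):
    Receiver(src, executors[i], executors[(i + 1) % 3]).collect()

print(executors[1].blocks)  # ['a1', 'a2', 'b1'] (replica of source 1 + its own source 2)
```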