Steps explained
1) We write the program in scala, java or python and submit the application to the spark cluster2) Spark cluster is made up of multiple systems
3) One of these machine is assigned as the co-ordinator
Co-Ordinator
4) Coordinator recieves the task and breaks it into discrete tasks.
5) These tasks along with the data that they work on are assigned to worker machines to be executed
7) Every worker machine might be assigned the task of performing the same transformation on diffrent set of data.
Architecture consists of 2 components
Driver
Coordinator process that executes the user program
- Analyses the program and breaks it into discrete tasks that can be performed in a distributed manner
- Verifies the clusters and assigns or schedules task on these nodes (executors)
- Runs in its own Java process
- Only one coordinator for spark application
Executor
The worker process responsible for running individual jobs
- Receives the tasks they have to execute from the driver program and runs the job in a Spark job in different nodes in the cluster
- Provide in-memory storage for RDD so that tasks run close to the data
- Each executor runs in its own Java process
Cluster Manager
- Launches the driver and executor programs
- Pluggable and the built-in one is called the standalone cluster manager
Ex: Yarn can be plugged in
Driver and Executor together makes up the spark applicationReceiver
- Task which collects input data from different sources
- Spark allocates a receiver for each input source
- Special task that run on the executors
Note
- Task run within executors
- Collect data from input source and save them as RDDs in memory, so that spark can replicate the collected data to another executor for fault tolerance
- If we have 2 sources then 2 exectors would be assigned for the streaming data
Awesome blog with lots of information. It is really helpful for all.
ReplyDeleteAngularJS Training in Chennai
AngularJS course in Chennai
Angular 6 Training in Chennai
ReactJS Training in Chennai
PHP Training in Chennai
Web Designing course in Chennai
Tally course in Chennai
AngularJS Training in Velachery
AngularJS Training in T Nagar
AngularJS Training in OMR
it ends up being significantly less complex to make exact future checks using gigantic proportions of data Data Analytics Course in Bangalore
ReplyDelete
ReplyDeleteReally awesome blog. Your blog is really useful for me. Thanks for sharing this informative blog.
Software Testing Training in Chennai
Software Testing Training in Bangalore
Software Testing Course in Coimbatore
Software Testing Training in Madurai
Software Testing Training Institute in Bangalore
Software Testing Course in Bangalore
Testing Course in Bangalore
Ethical hacking course in bangalore