Steps explained
1) We write the program in Scala, Java or Python and submit the application to the Spark cluster.
2) The Spark cluster is made up of multiple machines.
3) One of these machines is assigned as the coordinator.
4) The coordinator receives the application and breaks it into discrete tasks.
5) These tasks, along with the data they work on, are assigned to worker machines to be executed.
6) Every worker machine might be assigned the task of performing the same transformation on a different set of data.
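The flow in the steps above can be sketched in plain Python. This is not Spark code: it is a toy model using only the standard library, where threads stand in for worker machines and the names `split_into_partitions`, `square_partition` and `run_job` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Coordinator side: cut the input into roughly equal chunks."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def square_partition(partition):
    """Worker side: the same transformation, applied to one chunk of data."""
    return [x * x for x in partition]


def run_job(data, num_workers=3):
    """Toy coordinator: split the data, run one task per partition, collect."""
    partitions = split_into_partitions(data, num_workers)
    # Threads stand in for worker machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(square_partition, partitions))
    # Flatten the per-partition results back into one list, in partition order.
    return [y for part in results for y in part]
```

Calling `run_job(list(range(10)))` splits the ten numbers into three partitions, squares each partition in a separate worker, and returns the squares in their original order.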
The architecture consists of 2 components, plus a cluster manager that launches them
Driver
Coordinator process that executes the user program
- Analyses the program and breaks it into discrete tasks that can be performed in a distributed manner
- Checks the cluster and assigns (schedules) tasks to its nodes (executors)
- Runs in its own Java process
- Only one driver (coordinator) per Spark application
Executor
The worker process responsible for running individual tasks
- Receives the tasks it has to execute from the driver program and runs them; the executors of a Spark job are spread across different nodes in the cluster
- Provides in-memory storage for RDDs so that tasks run close to the data
- Each executor runs in its own Java process
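The idea of an executor keeping data in memory so that later tasks run close to it can be illustrated with a small plain-Python model. This is only a sketch, not how Spark implements storage: `ToyExecutor`, `run_task` and `load_partition` are hypothetical names, and the partition contents are made-up data.

```python
class ToyExecutor:
    """Hypothetical stand-in for an executor: runs tasks and caches partitions."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # partition id -> data held in this executor's memory
        self.loads = 0   # how many times data had to be fetched from outside

    def run_task(self, partition_id, load_partition, transform):
        # Reuse the in-memory copy when this executor already holds the partition.
        if partition_id not in self.cache:
            self.cache[partition_id] = load_partition(partition_id)
            self.loads += 1
        return transform(self.cache[partition_id])


def load_partition(partition_id):
    """Pretend to read a partition from external storage (made-up data)."""
    return list(range(partition_id * 3, partition_id * 3 + 3))


executor = ToyExecutor("executor-1")
total = executor.run_task(0, load_partition, sum)    # first task loads the data
largest = executor.run_task(0, load_partition, max)  # second task reuses the cache
```

After both tasks, `executor.loads` is 1: the second task found partition 0 already in memory and did not fetch it again.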
Cluster Manager
- Launches the driver and executor programs
- Pluggable; the built-in one is called the standalone cluster manager
Ex: YARN can be plugged in instead
The driver and the executors together make up the Spark application
Receiver
- Task that collects input data from an input source
- Spark allocates a receiver for each input source
- Special tasks that run on the executors
Note
- Tasks run within executors
- Receivers collect data from the input source and save it as RDDs in memory, and Spark replicates the collected data to another executor for fault tolerance
- If we have 2 sources, then 2 receivers are allocated, each occupying a slot on an executor
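The replication described in the note can be modelled with a few lines of plain Python. This is a simplified sketch under stated assumptions, not the Spark Streaming API: `StreamingExecutor`, `receive` and `read_block` are hypothetical helpers, and the source names `clicks` and `logs` are invented for the example.

```python
class StreamingExecutor:
    """Hypothetical executor that holds received blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> received records


def receive(source_name, records, primary, replica):
    """Toy receiver: store a block on its own executor and copy it to another."""
    block_id = f"{source_name}-block-0"
    primary.blocks[block_id] = list(records)
    replica.blocks[block_id] = list(records)  # second copy for fault tolerance
    return block_id


def read_block(block_id, executors):
    """Return the block from any surviving executor that still holds it."""
    for ex in executors:
        if block_id in ex.blocks:
            return ex.blocks[block_id]
    raise KeyError(block_id)


exec_a, exec_b = StreamingExecutor("A"), StreamingExecutor("B")
# Two sources, so two receivers, placed here on different executors.
clicks_id = receive("clicks", ["c1", "c2"], primary=exec_a, replica=exec_b)
logs_id = receive("logs", ["l1"], primary=exec_b, replica=exec_a)

# Simulate losing executor A: the clicks data is still readable from B.
survivors = [exec_b]
recovered = read_block(clicks_id, survivors)
```

Because every block was written to two executors, losing executor A does not lose the records its receiver collected.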