Friday, February 9, 2018

Spark Architecture


Steps explained

1) We write the program in Scala, Java or Python and submit the application to the Spark cluster (for example with spark-submit, sketched after this list)
2) The Spark cluster is made up of multiple machines
3) One of these machines is assigned as the coordinator
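
A minimal sketch of a submission command, assuming a hypothetical main class, jar path and master URL:

    spark-submit \
      --class com.example.WordCount \
      --master spark://master-host:7077 \
      target/wordcount.jar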



Coordinator
4) The coordinator receives the job and breaks it into discrete tasks
5) These tasks, along with the data they work on, are assigned to worker machines to be executed


6) Every RDD has the potential to have millions of records
7) Every worker machine might be assigned the task of performing the same transformation on a different set of data (see the sketch below)
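
A minimal Scala sketch of this idea, assuming a SparkContext named sc already exists (created by the driver): the same transformation logic runs in parallel, with each worker handling its own partition of the data.

    // Assumes an existing SparkContext `sc`.
    // 1 million numbers split across 4 partitions; each worker applies
    // the same map function to a different slice of the data.
    val numbers = sc.parallelize(1 to 1000000, numSlices = 4)
    val doubled = numbers.map(_ * 2)
    println(doubled.take(5).mkString(", "))  // 2, 4, 6, 8, 10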

The architecture consists of 2 main components




Driver
The coordinator process that executes the user program
  • Analyses the program and breaks it into discrete tasks that can be performed in a distributed manner
  • Checks the cluster and assigns or schedules tasks on the nodes (executors)
  • Runs in its own Java process
  • Only one driver per Spark application
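
A minimal sketch of a driver program in Scala; the app name, master URL and input path are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverExample {
      def main(args: Array[String]): Unit = {
        // The driver starts here: it builds the SparkContext, analyses
        // the RDD operations below, and schedules the resulting tasks
        // on the executors.
        val conf = new SparkConf()
          .setAppName("DriverExample")  // placeholder app name
          .setMaster("local[2]")        // placeholder master URL
        val sc = new SparkContext(conf)

        val counts = sc.textFile("input.txt")  // placeholder path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        sc.stop()
      }
    }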

Executor
The worker process responsible for running individual tasks
  • Receives the tasks it has to execute from the driver program and runs them on different nodes in the cluster
  • Provides in-memory storage for RDDs so that tasks run close to the data
  • Each executor runs in its own Java process
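
A small Scala sketch of that in-memory storage, again assuming an existing SparkContext sc and a placeholder file path; cache() keeps the RDD's partitions in executor memory so later tasks reuse them instead of re-reading the file:

    val errors = sc.textFile("events.log")  // placeholder path
      .filter(_.contains("ERROR"))
      .cache()                              // keep partitions in executor memory

    // Both actions reuse the cached partitions on the executors.
    println(errors.count())
    errors.take(5).foreach(println)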

Cluster Manager

  • Launches the driver and executor processes
  • Pluggable; the built-in one is called the Standalone cluster manager
    Ex: YARN can be plugged in instead
The Driver and Executors together make up the Spark application
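
The cluster manager is typically chosen at submission time through the --master flag; the URLs below are placeholders:

    spark-submit --master spark://master-host:7077 app.jar   # built-in Standalone manager
    spark-submit --master yarn app.jar                       # YARN plugged in instead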


Receiver
  • A task which collects input data from different sources
  • Spark allocates a receiver for each input source
  • A special long-running task that runs on an executor

Note

  1. Tasks run within executors
  2. Receivers collect data from the input source and save it as RDDs in memory, so that Spark can replicate the collected data to another executor for fault tolerance
  3. If we have 2 sources then 2 receivers would be assigned for the streaming data, each occupying a core on an executor (sketched below)
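
A minimal Spark Streaming sketch with two input sources, using placeholder hosts and ports; each socketTextStream call creates its own receiver task on an executor, so local[4] leaves cores free for processing:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TwoReceivers {
      def main(args: Array[String]): Unit = {
        // At least one core per receiver plus cores for processing
        val conf = new SparkConf().setAppName("TwoReceivers").setMaster("local[4]")
        val ssc = new StreamingContext(conf, Seconds(5))

        // Each call below creates one receiver task on an executor
        val source1 = ssc.socketTextStream("host1", 9999)  // placeholder host/port
        val source2 = ssc.socketTextStream("host2", 9998)  // placeholder host/port

        source1.union(source2).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }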







