Friday, February 9, 2018

Lineage of RDD


RDD tracking 

Every RDD keeps track of:

  1. Where it came from
  2. All the transformations it took to reach its current state

These steps are called the lineage (or DAG) of an RDD.



Data Visualization

  • In Spark, a job is associated with a chain of RDD dependencies organized in a directed acyclic graph (DAG).
  • The DAG is a dependency graph in which every RDD knows its parent RDD and the transformation that produced it from that parent.
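
To make the parent-pointer idea concrete, here is a minimal sketch in plain Python. It is not Spark code: ToyRDD, its methods, and the file name input.txt are hypothetical names chosen for illustration. Each node stores only its parent and the transformation that produced it, so walking the parent pointers recovers the lineage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores only metadata, no data."""
    name: str                          # label of the source or transformation
    parent: Optional["ToyRDD"] = None  # None means this node is a source
    func: Optional[Callable] = None    # function the transformation would apply

    def map(self, func: Callable, label: str) -> "ToyRDD":
        # Returns a new node that points back to this one; nothing is computed.
        return ToyRDD(name=f"map({label})", parent=self, func=func)

    def filter(self, func: Callable, label: str) -> "ToyRDD":
        return ToyRDD(name=f"filter({label})", parent=self, func=func)

    def lineage(self) -> List[str]:
        # Walk the parent pointers from this node back to the source.
        steps = []
        node = self
        while node is not None:
            steps.append(node.name)
            node = node.parent
        return list(reversed(steps))


# "input.txt" is a made-up source name used only for this sketch.
source = ToyRDD(name="textFile(input.txt)")
lengths = source.map(len, "len")
long_lines = lengths.filter(lambda n: n > 80, "n > 80")

print(" -> ".join(long_lines.lineage()))
# textFile(input.txt) -> map(len) -> filter(n > 80)
```

In Spark itself, the lineage of a real RDD can be inspected with its toDebugString method.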


Note: All transformations are recorded in memory, and none of them is
applied until we access the results.
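
The note above can be illustrated with a small plain-Python sketch. It only imitates the behavior and is not Spark's implementation: LazyDataset, its map and collect methods, and the double helper are hypothetical names. The function passed to map is merely remembered, and it runs only when collect asks for the results.

```python
class LazyDataset:
    """Hypothetical toy: records transformations, applies them only on collect()."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)  # recorded transformations, not yet applied

    def map(self, func):
        # Transformation: only remembers func and returns a new dataset.
        return LazyDataset(self._source, self._steps + (func,))

    def collect(self):
        # Action: every recorded function is applied now, in order.
        result = self._source
        for func in self._steps:
            result = [func(x) for x in result]
        return result


calls = []


def double(x):
    calls.append(x)  # side effect lets us observe when the function runs
    return x * 2


ds = LazyDataset([1, 2, 3]).map(double)
print(len(calls))    # 0: nothing has run yet
print(ds.collect())  # [2, 4, 6]
print(len(calls))    # 3: accessing the results triggered the computation
```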


Advantage of Lineage

  • Allows RDDs to be reconstructed when nodes crash.
  • Reconstruction starts from the source file: the stored transformations are reapplied to recreate the RDD.
  • Allows RDDs to be lazily instantiated (materialized) only when the results are accessed.
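
The recovery idea in the list above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not how Spark is implemented: RecoverableDataset, simulate_node_crash, and load_lines are hypothetical names, and the "source file" is replaced by a function returning a fixed list. Because the dataset remembers its source and its transformations, a lost result can be rebuilt by replaying them.

```python
class RecoverableDataset:
    """Hypothetical toy: keeps a source loader plus the recorded transformations."""

    def __init__(self, load_source, steps=()):
        self._load_source = load_source  # how to re-read the original input
        self._steps = tuple(steps)       # the stored transformations
        self._cached = None              # materialized result, if any

    def map(self, func):
        return RecoverableDataset(self._load_source, self._steps + (func,))

    def collect(self):
        if self._cached is None:
            # Replay the lineage: start from the source, reapply every step.
            data = self._load_source()
            for func in self._steps:
                data = [func(x) for x in data]
            self._cached = data
        return list(self._cached)

    def simulate_node_crash(self):
        # Pretend the node holding the materialized data was lost.
        self._cached = None


def load_lines():
    # Stands in for reading the source file.
    return ["spark", "rdd", "lineage"]


word_lengths = RecoverableDataset(load_lines).map(str.upper).map(len)

before = word_lengths.collect()
word_lengths.simulate_node_crash()
after = word_lengths.collect()  # rebuilt from the source and stored steps

print(before)           # [5, 3, 7]
print(before == after)  # True
```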



