Friday, February 9, 2018

Lineage of RDD

RDD tracking 

Every RDD keeps track of:

  1. Where it came from
  2. All the transformations it took to reach its current state

Together, these steps are called the lineage (or DAG) of an RDD.

Data Visualization

  • In Spark, a job is associated with a chain of RDD dependencies organized in a directed acyclic graph (DAG)
  • The DAG is a dependency graph in which every RDD knows its parent RDD and the transformation that produced it

Note: Transformations are only recorded in memory as part of the lineage; none
of them is applied until we access the results
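
To make the idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class `ToyRDD` and its methods are hypothetical names invented for illustration. Each object remembers only its parent and the transformation that produced it, and nothing is computed until `collect()` is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not a Spark class)."""

    def __init__(self, parent=None, op="source", fn=None, data=None):
        self.parent = parent  # where this dataset came from
        self.op = op          # label of the transformation that produced it
        self.fn = fn          # function to apply to the parent's rows
        self.data = data      # only the source node holds data

    def map(self, f):
        # Records the step; nothing is computed yet.
        return ToyRDD(parent=self, op="map",
                      fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        # Records the step; nothing is computed yet.
        return ToyRDD(parent=self, op="filter",
                      fn=lambda rows: [r for r in rows if pred(r)])

    def lineage(self):
        # Walk the parent links back to the source.
        steps, node = [], self
        while node is not None:
            steps.append(node.op)
            node = node.parent
        return list(reversed(steps))

    def collect(self):
        # "Action": replay the recorded steps starting from the source.
        if self.parent is None:
            return list(self.data)
        return self.fn(self.parent.collect())


calls = []  # records every time the map function actually runs


def double(x):
    calls.append(x)
    return x * 2


source = ToyRDD(data=[1, 2, 3, 4])
result = source.map(double).filter(lambda x: x > 4)

lineage_steps = result.lineage()  # ['source', 'map', 'filter']
calls_before = len(calls)         # 0, because no transformation has run yet
collected = result.collect()      # [6, 8]
calls_after = len(calls)          # 4, the map ran only when results were requested
```

In Spark itself, an RDD's recorded lineage can be inspected with its `toDebugString` method.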

Advantage of Lineage

  • Allows RDDs to be reconstructed when nodes crash.
  • To recreate a lost RDD, Spark starts from the source file and reapplies all the stored transformations
  • Allows RDDs to be lazily instantiated (materialized) only when the results are accessed
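
The recovery idea can be sketched in plain Python as well. This is a simplified illustration under stated assumptions, not how Spark is implemented: the names `source_partitions`, `lineage`, and `recompute` are hypothetical, and a dictionary stands in for partitions held on worker nodes.

```python
# Hypothetical source data, split into two partitions.
source_partitions = {0: [1, 2, 3], 1: [4, 5, 6]}

# The recorded lineage: an ordered list of (label, transformation) steps.
lineage = [
    ("map: x * 10", lambda rows: [x * 10 for x in rows]),
    ("filter: x > 20", lambda rows: [x for x in rows if x > 20]),
]


def recompute(partition_id):
    """Rebuild one partition by replaying the lineage from the source."""
    rows = source_partitions[partition_id]
    for _label, step in lineage:
        rows = step(rows)
    return rows


# Materialize every partition once.
materialized = {pid: recompute(pid) for pid in source_partitions}
original = dict(materialized)

# Simulate a node crash that loses partition 1.
del materialized[1]

# Only the lost partition is rebuilt; partition 0 is left untouched.
materialized[1] = recompute(1)

recovered = materialized == original
```

Because the lineage is a small recipe rather than a copy of the data, storing it is cheap, and only the partitions that were actually lost need to be recomputed.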

