Friday, February 9, 2018

Apache Spark RDD


RDD (RESILIENT DISTRIBUTED DATASETS)

  • The basic programming abstraction in Spark
  • All operations are performed on in-memory objects
  • A distributed collection of elements
  • An RDD can be assigned to a variable, and methods can be invoked on it. Methods either return values or apply transformations that produce new RDDs
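The usage pattern above can be sketched without a Spark cluster. The class below is a toy stand-in for an RDD (the names `MiniRDD`, `map`, `filter`, and `collect` mirror Spark's API, but this is plain Python, not PySpark): transformation methods return a new object, and an action materializes a value.

```python
# Illustrative sketch only -- a toy stand-in for Spark's RDD API,
# not the real PySpark implementation.
class MiniRDD:
    def __init__(self, data):
        self._data = list(data)  # the underlying collection of elements

    def map(self, fn):
        # Transformation: returns a NEW MiniRDD; the original is untouched
        return MiniRDD(fn(x) for x in self._data)

    def filter(self, pred):
        # Transformation: keeps only elements where pred(x) is True
        return MiniRDD(x for x in self._data if pred(x))

    def collect(self):
        # Action: returns a value to the caller (the "driver")
        return list(self._data)

rdd = MiniRDD([1, 2, 3, 4])
doubled = rdd.map(lambda x: x * 2)  # transformation -> new object
print(doubled.collect())            # action -> [2, 4, 6, 8]
print(rdd.collect())                # original unchanged -> [1, 2, 3, 4]
```

In real PySpark the same chain would start from a `SparkContext` (e.g. `sc.parallelize([1, 2, 3, 4]).map(lambda x: x * 2).collect()`), but the method-chaining shape is the same.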

Characteristics of RDDs

Partitioned:
  • An individual RDD is split into multiple partitions distributed across the cluster
  • Partitioning allows the elements of an RDD to be processed in parallel
  • Each node in the cluster holds its partitions in memory
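The partitioning idea can be sketched in plain Python (an analogy only; Spark's scheduler distributes partitions across cluster nodes, not local threads, and the helper names here are hypothetical): split the data into chunks and process each chunk in parallel.

```python
# Sketch of partitioned, parallel processing (analogy only; in Spark,
# partitions live on different cluster nodes rather than local threads).
from concurrent.futures import ThreadPoolExecutor

def partition(data, num_partitions):
    # Split the collection into roughly equal chunks
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_partition(chunk):
    # Each "node" applies the same function to its own partition
    return [x * x for x in chunk]

data = list(range(10))
parts = partition(data, 3)  # [[0,1,2,3], [4,5,6,7], [8,9]]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(process_partition, parts))

# Flatten the per-partition results back into one collection
squared = [x for chunk in results for x in chunk]
print(squared)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```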
Immutable:
  • Once created, an RDD cannot be changed
  • Only two kinds of operations can be performed on an RDD: transformations, which produce new RDDs, and actions, which return values
Resilient:
  • An RDD can be reconstructed even if a node crashes; the data held in the RDD is not lost
  • This recovery mechanism makes RDDs fault tolerant
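Resilience comes from lineage: an RDD remembers the chain of transformations used to build it, so a lost result can be recomputed from the source data. A minimal plain-Python sketch of that idea (the class and method names are hypothetical, not Spark internals):

```python
# Toy sketch of lineage-based recovery -- hypothetical names,
# not Spark's actual internals.
class LineageRDD:
    def __init__(self, source, transformations=()):
        self._source = list(source)               # durable source data
        self._transforms = list(transformations)  # lineage: how to rebuild

    def map(self, fn):
        # Record the transformation in the lineage instead of
        # mutating or eagerly storing results
        return LineageRDD(self._source, self._transforms + [fn])

    def compute(self):
        # (Re)compute the dataset by replaying the lineage over the
        # source; Spark rebuilds a lost partition in the same spirit.
        data = self._source
        for fn in self._transforms:
            data = [fn(x) for x in data]
        return data

rdd = LineageRDD([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)
print(rdd.compute())  # [20, 30, 40]
# Simulate losing the cached result: calling compute() again simply
# replays the lineage, so nothing is lost.
print(rdd.compute())  # [20, 30, 40]
```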

