Friday, February 9, 2018

Apache Spark RDD


RDD (RESILIENT DISTRIBUTED DATASETS)

  • The basic programming abstraction in Spark
  • All operations are performed on in-memory objects
  • A collection of entities
  • It can be assigned to a variable, and methods can be invoked on it. Methods either return values or apply transformations to the RDD
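
To make the last point concrete, here is a minimal sketch in plain Python. ToyRDD is a hypothetical stand-in written for this post, not Spark's API. It only shows the shape of the idea: the collection is held in a variable, some methods (map, filter) hand back a new collection, and others (collect, count) hand back a plain value. One simplification to keep in mind: real Spark evaluates transformations lazily, while this toy runs them right away.

```python
# Toy stand-in for an RDD, written in plain Python (hypothetical, not Spark's API).
class ToyRDD:
    def __init__(self, items):
        # A tuple, so the contents cannot be modified in place.
        self._items = tuple(items)

    # Methods that apply a transformation: each returns a NEW ToyRDD.
    def map(self, fn):
        return ToyRDD(fn(x) for x in self._items)

    def filter(self, pred):
        return ToyRDD(x for x in self._items if pred(x))

    # Methods that return a value to the caller.
    def collect(self):
        return list(self._items)

    def count(self):
        return len(self._items)


numbers = ToyRDD([1, 2, 3, 4, 5])  # assigned to a variable
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(evens_squared.collect())  # [4, 16]
print(numbers.count())          # 5 (the original is untouched)
```

In real Spark code the same pattern appears with an RDD created from a SparkContext, but the calls above belong to the toy class only.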

Characteristics of RDDs

Partitioned:
  • Each RDD is split into multiple partitions that are spread across the nodes of the cluster
  • Allows us to process elements of an RDD in parallel
  • Each node in the cluster holds its share of the data in memory
Immutable:
  • Once created, an RDD cannot be changed
  • Only two kinds of operations can be performed on an RDD: transformations, which produce a new RDD, and actions, which return a result
Resilient:
  • Can be reconstructed even if a node crashes
  • Data held in an RDD is not lost
  • Fault tolerant
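
The three characteristics above can be sketched together in a small plain-Python toy. Everything here (ToyPartitionedRDD, split_into_partitions, lose_partition) is hypothetical and written only for illustration; none of it is Spark's API. The assumption is that the source data can always be read again, so a lost partition can be rebuilt by replaying the recorded functions on that partition's slice of the source, which is roughly how Spark uses an RDD's lineage.

```python
# Toy model of "partitioned, immutable, resilient" (hypothetical, not Spark's API).

def split_into_partitions(items, num_partitions):
    """Deal items round-robin into num_partitions immutable chunks."""
    return tuple(tuple(items[i::num_partitions]) for i in range(num_partitions))


class ToyPartitionedRDD:
    def __init__(self, source, num_partitions, lineage=()):
        self._source = tuple(source)      # stands in for re-readable input data
        self._num_partitions = num_partitions
        self._lineage = tuple(lineage)    # functions applied so far, in order
        # partition id -> computed data, as if held in a node's memory
        self._cache = {i: self._compute(i) for i in range(num_partitions)}

    def _compute(self, pid):
        # Rebuild one partition from the source by replaying the lineage.
        part = split_into_partitions(self._source, self._num_partitions)[pid]
        for fn in self._lineage:
            part = tuple(fn(x) for x in part)
        return part

    def map(self, fn):
        # Immutable: never edit in place, return a new object with a longer lineage.
        return ToyPartitionedRDD(
            self._source, self._num_partitions, self._lineage + (fn,)
        )

    def lose_partition(self, pid):
        # Simulate a node crash: the in-memory copy of one partition disappears.
        del self._cache[pid]

    def get_partition(self, pid):
        # Resilient: a missing partition is recomputed, not lost.
        if pid not in self._cache:
            self._cache[pid] = self._compute(pid)
        return self._cache[pid]

    def collect(self):
        out = []
        for pid in range(self._num_partitions):
            out.extend(self.get_partition(pid))
        return out


base = ToyPartitionedRDD(range(6), num_partitions=3)
doubled = base.map(lambda x: x * 2)

doubled.lose_partition(1)        # pretend the node holding partition 1 crashed
print(doubled.get_partition(1))  # (2, 8), rebuilt from source + lineage
print(doubled.collect())         # [0, 6, 2, 8, 4, 10]
print(base.collect())            # [0, 3, 1, 4, 2, 5], unchanged by map
```

The toy keeps all partitions in one process, so it only models the bookkeeping. In a real cluster the partitions live on different nodes and are processed in parallel.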

