Web Snippets: Apache Spark RDD

Friday, February 9, 2018

Apache Spark RDD

RDD (RESILIENT DISTRIBUTED DATASET)

The basic programming abstraction in Spark
Operations are performed on in-memory objects
A collection of entities (records)
It can be assigned to a variable, and methods can be invoked on it. Methods either return values (actions) or apply transformations that produce new RDDs
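
The difference between the two kinds of methods can be sketched with a toy class in plain Python. This is not Spark's implementation: ToyRDD is a hypothetical single-machine stand-in, and only its method names (map, filter, collect, count) are borrowed from the RDD API.

```python
from typing import Callable, Iterable, List, Tuple


class ToyRDD:
    """A toy, single-machine stand-in for an RDD (hypothetical, not Spark code)."""

    def __init__(self, items: Iterable):
        # Stored as a tuple so the contents cannot be modified in place.
        self._items: Tuple = tuple(items)

    # Transformations: return a new ToyRDD and leave this one untouched.
    def map(self, fn: Callable) -> "ToyRDD":
        return ToyRDD(fn(x) for x in self._items)

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(x for x in self._items if pred(x))

    # Actions: return a plain Python value instead of another ToyRDD.
    def collect(self) -> List:
        return list(self._items)

    def count(self) -> int:
        return len(self._items)


numbers = ToyRDD([1, 2, 3, 4, 5])                      # assigned to a variable
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())                         # action: returns a list
print(numbers.count())                                 # the original is unchanged
```

One simplification to keep in mind: this toy runs each transformation immediately, whereas Spark records transformations and only computes them when an action is called.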

Characteristics of RDDs

Partitioned:

An individual RDD is split into multiple partitions that are distributed across the nodes in the cluster
This allows us to process the elements of an RDD in parallel
Each node in the cluster holds its partitions in memory
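
A rough single-machine sketch of the idea, in plain Python rather than Spark: the data is split into partitions and each partition is processed by its own worker. The helper names split_into_partitions and map_partitions_parallel are hypothetical, the round-robin split is just one simple choice, and threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_partitions(data: List[int], num_partitions: int) -> List[List[int]]:
    """Deal elements round-robin into num_partitions lists (illustrative choice)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def map_partitions_parallel(
    partitions: List[List[int]], fn: Callable[[int], int]
) -> List[List[int]]:
    """Apply fn to every element, one worker thread per partition."""

    def run(partition: List[int]) -> List[int]:
        return [fn(x) for x in partition]

    # Each thread plays the role of a node working on its own partition.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(run, partitions))


partitions = split_into_partitions(list(range(10)), 3)
doubled = map_partitions_parallel(partitions, lambda x: x * 2)
print(partitions)
print(doubled)
```

Because each partition is processed independently, no worker needs to see another worker's data, which is what makes the parallelism straightforward.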

Immutable:

Once created, an RDD cannot be changed
Only two kinds of operations can be performed on an RDD: transformations, which produce a new RDD, and actions, which return a value

Resilient:

Can be reconstructed even if a node crashes, because Spark tracks the lineage of transformations that produced each RDD
Data held in an RDD is not lost
Fault tolerant
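
The recovery idea can be illustrated with a small plain-Python model. It assumes the source data survives in durable storage and that the dataset remembers the ordered steps that produced it, so a lost partition can be recomputed rather than restored from a copy. LineageDataset and its methods are hypothetical names for this sketch, not part of Spark.

```python
from typing import Callable, Dict, List


class LineageDataset:
    """Toy model: cached partitions plus the recipe (lineage) that produced them."""

    def __init__(
        self,
        source_partitions: List[List[int]],
        steps: List[Callable[[int], int]],
    ):
        self._source = source_partitions       # assumed to live in durable storage
        self._steps = steps                    # lineage: ordered element-wise functions
        self._cache: Dict[int, List[int]] = {}  # in-memory results per partition

    def _compute(self, index: int) -> List[int]:
        # Replay every recorded step on the source partition.
        values = self._source[index]
        for step in self._steps:
            values = [step(v) for v in values]
        return values

    def get_partition(self, index: int) -> List[int]:
        # If the partition is missing (never computed, or lost), rebuild it.
        if index not in self._cache:
            self._cache[index] = self._compute(index)
        return self._cache[index]

    def lose_partition(self, index: int) -> None:
        """Simulate a node crash by dropping that partition's in-memory copy."""
        self._cache.pop(index, None)


dataset = LineageDataset([[1, 2], [3, 4]], [lambda x: x + 1, lambda x: x * 10])
before = dataset.get_partition(1)
dataset.lose_partition(1)            # the "node" holding partition 1 crashes
after = dataset.get_partition(1)     # rebuilt from the source and the lineage
print(before, after)
```

Only the lost partition is recomputed; partition 0 is never touched, which is why keeping the recipe is cheaper than replicating every intermediate result.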
