Web Snippets: Data Representation in RDD

Friday, February 9, 2018

Data Representation in RDD

Spark has three data representations:

1. RDD (Resilient Distributed Dataset):

An RDD is a collection of elements that can be partitioned across multiple nodes in a cluster for parallel processing.
It is also fault tolerant, which means it can automatically recover from failures by recomputing lost partitions.
It is immutable: once an RDD is created it cannot be changed, and every transformation returns a new RDD.
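
The points above can be sketched in a few lines of Scala. This is only a minimal illustration, assuming a local Spark installation; the object name RddSketch, the helper evenSquares, and the choice of 4 partitions are made up for this example.

```scala
import org.apache.spark.sql.SparkSession

object RddSketch {
  // Hypothetical helper: squares of the even numbers in 1..10, computed on an RDD.
  def evenSquares(spark: SparkSession): Array[Int] = {
    // parallelize splits the local collection into 4 partitions (an arbitrary choice)
    val numbers = spark.sparkContext.parallelize(1 to 10, numSlices = 4)
    // filter and map each return a new RDD; `numbers` itself is never modified
    val squares = numbers.filter(_ % 2 == 0).map(n => n * n)
    // collect brings the results back to the driver as a local array
    squares.collect()
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM using all available cores
    val spark = SparkSession.builder().appName("rdd-sketch").master("local[*]").getOrCreate()
    try println(evenSquares(spark).mkString(", "))
    finally spark.stop()
  }
}
```

Because numbers is immutable, it can be reused for other transformations after squares has been derived from it.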

2. Dataset:

It is also a distributed collection of data.
A Dataset can be constructed from JVM objects and then manipulated using functional transformations (map, flatMap, filter, etc.).
The Dataset API is available only in Scala and Java; it is not available in Python or R.
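
As a rough sketch of building a Dataset from JVM objects and applying functional transformations to it, the example below assumes a local Spark installation. The Person case class, the DatasetSketch object, the adultNames helper, and the sample data are all hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type used only for this illustration
case class Person(name: String, age: Int)

object DatasetSketch {
  // Hypothetical helper: names of the people aged 18 or over.
  def adultNames(spark: SparkSession): Dataset[String] = {
    // brings in the encoders for case classes and primitives, plus toDS()
    import spark.implicits._
    // construct a typed Dataset[Person] from ordinary JVM objects
    val people = Seq(Person("Ann", 34), Person("Bob", 15), Person("Cy", 52)).toDS()
    // filter and map take plain Scala functions and are checked at compile time
    people.filter(_.age >= 18).map(_.name)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataset-sketch").master("local[*]").getOrCreate()
    try adultNames(spark).show()
    finally spark.stop()
  }
}
```

A typo such as _.agee would be rejected by the compiler here, which is the main practical difference from working with untyped rows.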

3. DataFrame:

A DataFrame is a distributed collection of data organized into named columns.
It is conceptually equivalent to a table in a relational database or a data frame in R or Python.
It is mostly used for structured data processing.
In Scala, a DataFrame is represented by a Dataset of Rows.
A DataFrame can be constructed from a wide range of sources, for example existing RDDs, Hive tables, and database tables.
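
One way to see these points together is to build a DataFrame from an existing RDD. The sketch below assumes a local Spark installation; the DataFrameSketch object, the adults helper, the column names, and the sample data are invented for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DataFrameSketch {
  // Hypothetical helper: a one-column DataFrame with the names of people aged 18 or over.
  def adults(spark: SparkSession): DataFrame = {
    // provides toDF on RDDs and the $"column" syntax
    import spark.implicits._
    // start from an existing RDD of (name, age) tuples
    val rdd = spark.sparkContext.parallelize(Seq(("Ann", 34), ("Bob", 15), ("Cy", 52)))
    // attach column names to turn the RDD into a DataFrame
    val df = rdd.toDF("name", "age")
    // in Scala, DataFrame is a type alias for Dataset[Row], so this assignment compiles
    val rows: Dataset[Row] = df
    // columns are addressed by name, as in a relational table
    rows.filter($"age" >= 18).select("name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataframe-sketch").master("local[*]").getOrCreate()
    try adults(spark).show()
    finally spark.stop()
  }
}
```

Unlike the typed Dataset API, column names such as "age" are resolved only at runtime, so a misspelled column surfaces as an analysis error rather than a compile error.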

History of Spark API

The snapshot shows the history of dataframes.
