Friday, February 9, 2018

Data Representation in RDD


Spark has 3 data representation


  1. RDD(Resilient Distributed Database) 

    • Is a collection of elements, that can be divided across multiple nodes in a cluster for parallel processing. 
    • It is also fault tolerant collection of elements, which means it can automatically recover from failures. 
    • Is immutable, we can create RDD once but can’t change it.



       2.Dataset: 
    • It is also a distributed collection of data. 
    • A Dataset can be constructed from JVM objects and then manipulated using functional transformations (map, flatMap, filter, etc.). 
    • Dataset API is only available in Scala and Java. It is not available in Python and R.
  • DataFrame: 
    • Is a distributed collection of data organized into named columns. 
    • It is conceptually equivalent to a table in a relational database or a data frame. 
    • It is mostly used for structured data processing. 
    • In Scala, a DataFrame is represented by a Dataset of Rows. 
    • A DataFrame can be constructed by wide range of arrays for example, existing RDDs, Hive tables, database tables.

History of Spark API

The snapshot shows the history of dataframes.



11 comments:

  1. Thank you a lot for providing individuals with a very spectacular possibility to read critical reviews from this site.

    Data Science Training in Bangalore

    ReplyDelete

  2. Thanks for your extraordinary blog. Your idea for this was so brilliant. This would provide people with an excellent tally resource from someone who has experienced such issues. You would be coming at the subject from a different angle and people would appreciate your honesty and frankness. Good luck for your next blog!
    Tally ERP 9 Training
    tally classes
    Tally Training institute in Chennai
    Tally course in Chennai
    seo training classes
    seo training course
    seo training institute in chennai
    seo training institutes
    seo courses in chennai
    seo institutes in chennai
    seo classes in chennai
    seo training center in chennai

    ReplyDelete
  3. Nice blog, it’s so knowledgeable, informative, and good looking site. I appreciate your hard work. Good job. Thank you for this wonderful sharing with us. Keep Sharing.
    Digital Marketing Course In Kolkata
    Web Design Course In Kolkata

    ReplyDelete
  4. Get real time project based and job oriented Salesforce training India course materials for Salesforce Certification with securing a practice org, database terminology, admin and user interface navigation and custom fields creation, reports & analytics, security, customization, automation and web to lead forms.  

    ReplyDelete
  5. I am so happy to found your blog post because it's really very informative. Please keep writing this kind of blogs and I regularly visit this blog. Have a look at my services.
    I have found this Salesforce training in India worth joining course. Try this Salesforce training in Hyderabad with job assistance. Join Salesforce training institutes in ameerpet with certification. Enroll for Salesforce online training in hyderabad with hands on course.  

    ReplyDelete
  6. Great post! I am actually getting ready to across this information, It’s very helpful for this blog. Also great with all of the valuable information you have Keep up the good work you are doing well.
    CRS Info Solutions Salesforce training for beginners   

    ReplyDelete
  7. Myself so glad to establish your blog entry since it's actually quite instructive. If it's not too much trouble continue composing this sort of web journals and I normally visit this blog. Examine my administrations.
    Go through these Salesforce Lightning Features course. Found this Salesforce CRM Using Apex And Visualforce Training worth joining. Enroll for SalesForce CRM Integration Training Program and practice well.   

    ReplyDelete