Showing posts with label Spark MLlib. Show all posts

Friday, February 9, 2018

Modules in Apache Spark





Spark SQL

  • Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine.
    • DataFrames 
      • A DataFrame is a distributed collection of data organized into named columns.
      • It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. 
      • DataFrames can be constructed from a wide array of sources, such as structured data files, tables in Hive, external databases, or existing RDDs.
      • The DataFrame API is available in Scala, Java, Python, and R.
