Showing posts with label Spark Sql. Show all posts

Tuesday, July 10, 2018

Spark samples (RDD, DataFrames, DSL)



SHARK: THE BEGINNING OF THE API
  • SQL using Spark execution engine
  • Evolved into Spark SQL in 1.0
SCHEMA RDD
  • RDD with schema information
  • For unit testing and debugging Spark SQL
  • Drew the attention of Spark developers
  • Released as DataFrame API in 1.3

Monday, July 9, 2018

Spark samples (Spark SQL, window functions, persist)


WRITE AS CSV
df_sample.write.csv("./spark-warehouse/SAMPLE.csv")

WRITE AS CSV WITH HEADER
df_sample.write.csv("./spark-warehouse/SAMPLE_5.csv",header=True) 


DISPLAY ALL COLUMNS
# Load the CSV file as a DataFrame
data = spark.read.csv("./spark-warehouse/LOADS.csv", header=True)

# Register a temp view so the data can be queried with SQL
data.createOrReplaceTempView("vw_data")

# Select all columns, limited to the first 5 rows
load = spark.sql("SELECT * FROM vw_data LIMIT 5")
load.show()


Friday, February 9, 2018

Modules in Apache Spark





Spark SQL

  • Is a Spark module for structured data processing. It provides a programming abstraction called the DataFrame and can also act as a distributed SQL query engine.
    • DataFrames 
      • Is a distributed collection of data organized into named columns. 
      • It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. 
      • DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs.
      • The DataFrame API is available in Scala, Java, Python, and R.
