Web Snippets: About Big data

Big Data

A term used to describe large volumes of data, both structured and unstructured, that inundate a business on a day-to-day basis.
Can be analyzed for insights that lead to better decisions and strategic business moves.
Has existed for many years. It is becoming popular thanks to cheap hardware, open source solutions, and active communities.

Use cases:

Predicting machine breakdowns before failure occurs
Analyzing data for healthcare studies
Preventing fraudulent credit card activity

Hadoop

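Open source implementation of Google's MapReduce, using HDFS (the Hadoop Distributed File System) for storage
Data can be stored or appended, but cannot be updated in place
Each data block is stored as 3 copies by default, spread across different nodes (2 of the copies are backups)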
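
The replication idea above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `place_replicas`, the block ids, and the node names are all made up, and the round-robin placement is a simplification. Real HDFS chooses replica locations with a rack-aware policy.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical helper for illustration; not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for start, block in enumerate(block_ids):
        # Pick consecutive nodes, wrapping around, so copies never share a node.
        placement[block] = [
            nodes[(start + i) % len(nodes)] for i in range(replication)
        ]
    return placement


placement = place_replicas(
    ["blk_1", "blk_2"],
    ["node-a", "node-b", "node-c", "node-d"],
)
for block, holders in placement.items():
    print(block, "->", holders)
```

If any single node is lost, every block still has two surviving copies on other nodes, which is the point of keeping the backups.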

Apache Spark

Speed

Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk (Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.)
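
To get a rough feel for what "acyclic data flow" means, here is a toy sketch in plain Python (not Spark code). The class `LazyDataset` is hypothetical: it only records the transformations as a chain of steps and runs them in one pass when `collect()` is called. Spark's real engine is far more elaborate, so treat this as an analogy only.

```python
class LazyDataset:
    """Hypothetical toy class: records transformations, runs them on collect()."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Nothing is computed here; the step is only added to the plan.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Build one pipeline from the recorded steps, then run it once,
        # keeping intermediate values in memory only.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


evens_squared = (
    LazyDataset(range(6))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
)
result = evens_squared.collect()
print(result)
```

Because the whole plan is known before anything runs, the steps can be chained without writing intermediate results to disk between them.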

Ease of Use

Write applications quickly in Java, Scala, Python, or R. (Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, and R shells.)

Generality

Combine SQL, streaming, and complex analytics. (Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.)

Runs Everywhere

Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3. (You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos, and access data in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source.)

MapReduce

To understand MapReduce, let's consider the example of analyzing which cellphone providers are used in a Times Square building. Note that there could be at least 6 kinds of providers, such as Apple, Android, Windows, etc.

Map

First we start the process with a bunch of people going to each floor and collecting the data from each individual on that floor
They then drop what they collected into the message box on each floor
The message boxes are later collected for analysis on the main office floor

Reduce

We collect the data from all the message boxes and then enter it into an Excel file, adding up the totals for each provider
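
The building survey above can be sketched in plain Python. This is a minimal illustration, not Hadoop code: the survey answers in `floors` are invented, and the function names `map_floor`, `shuffle`, and `reduce_counts` are hypothetical. The grouping step in the middle corresponds to gathering the message boxes before the final tally.

```python
from collections import defaultdict

# Made-up survey data: one list of answers per floor.
floors = [
    ["apple", "android", "apple"],
    ["windows", "android"],
    ["android", "apple", "android"],
]


def map_floor(answers):
    """Map step: emit a (provider, 1) pair for every person on one floor."""
    return [(provider, 1) for provider in answers]


def shuffle(pairs_per_floor):
    """Group the pairs from all message boxes by provider."""
    grouped = defaultdict(list)
    for pairs in pairs_per_floor:
        for provider, count in pairs:
            grouped[provider].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the counts for each provider."""
    return {provider: sum(counts) for provider, counts in grouped.items()}


# Each floor is processed independently, like the people sent to each floor.
message_boxes = [map_floor(answers) for answers in floors]
totals = reduce_counts(shuffle(message_boxes))
print(totals)
```

With this sample data the tally comes out to 3 for apple, 4 for android, and 1 for windows. Since each floor is mapped on its own, the floors could be handled by different workers at the same time, which is what makes the approach scale.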

Tuesday, January 24, 2017