Tuesday, January 24, 2017

About Big Data


        
                                   


       Big Data

    • A term for the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis.
    • Can be analyzed for insights that lead to better decisions and strategic business moves.
    • Has existed for many years. It is growing in popularity thanks to cheap hardware, open source solutions, and the communities around them.
   

        Use cases:
      • Predicting machine breakdowns before a failure occurs
      • Analyzing data for healthcare studies
      • Preventing fraudulent credit card activity

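      Hadoop

      • Open source implementation of Google's MapReduce, using HDFS (Hadoop Distributed File System) for storage
      • Data can be stored or appended, but it cannot be updated in place
      • Each block of data is kept as 3 copies by default, on different nodes (2 of the copies are backups)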
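
The three-copy idea can be illustrated with a small toy model. This is only a sketch: the node names, block IDs, and the round-robin placement are assumptions made for illustration, and real HDFS uses its own rack-aware placement policy rather than anything like `place_replicas` below.

```python
from itertools import cycle, islice

REPLICATION_FACTOR = 3  # matches the HDFS default of 3 copies per block


def place_replicas(block_ids, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        # Start at a different node for each block so the copies spread out.
        start = i % len(nodes)
        placement[block] = list(islice(cycle(nodes), start, start + replication))
    return placement


def surviving_copies(placement, failed_node):
    """Return the copies of each block that remain after one node fails."""
    return {
        block: [node for node in holders if node != failed_node]
        for block, holders in placement.items()
    }


# Hypothetical cluster and blocks, for illustration only.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(["blk_1", "blk_2"], nodes)
after_failure = surviving_copies(placement, "node-b")

print(placement)
print(after_failure)
```

Because every block lives on three different nodes, losing any single node in this model still leaves at least two readable copies of each block.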

      Apache Spark

        Speed
      • Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk, according to the Spark project. Spark has an advanced DAG (directed acyclic graph) execution engine that supports acyclic data flow and in-memory computing.
        Ease of Use
      • Write applications quickly in Java, Scala, Python, or R. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, and R shells.
        Generality
      • Combine SQL, streaming, and complex analytics. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, and you can combine these libraries seamlessly in the same application.
        Runs Everywhere
      • Spark runs in its standalone cluster mode, on Hadoop YARN, on Apache Mesos, or in the cloud (for example on EC2). It can access data in HDFS, Cassandra, HBase, Hive, Tachyon, S3, and any Hadoop data source.
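
To give a feel for chaining high-level operators, here is a toy sketch in plain Python. It is not Spark's API: `ToyDataset` and its methods are hypothetical names invented for this illustration. The only idea it borrows from Spark is that transformations such as `map` and `filter` are recorded first and run in memory only when an action such as `collect` is called.

```python
class ToyDataset:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: nothing runs yet, the step is only recorded.
        return ToyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only recorded.
        return ToyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        """Action: run the recorded chain in a single pass over the source."""
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


# Keep the even numbers, then square them. Work happens only in collect().
squares_of_evens = (
    ToyDataset(range(10))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
)
result = squares_of_evens.collect()
print(result)
```

A real Spark job distributes such a chain across a cluster. This sketch runs on one machine and only shows the programming style.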

     MapReduce

     To understand MapReduce, let's consider the example of analyzing the cellphone market among the people in a Times Square building. Note that there could be at least 6 kinds of providers, such as Apple, Android, Windows, etc.


    Map
    • We start the process with a group of people going to each floor and collecting the data from every individual on that floor
    • They then drop the collected data in the message box on each floor
    • The message boxes are collected for analysis on the main office floor
    Reduce
    • We gather the data from all the message boxes and enter it into an Excel file, totaling the counts for each provider
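
The building survey above can be sketched in a few lines of plain Python. The floor data and the function names (`map_floor`, `shuffle`, `reduce_provider`) are made up for illustration, and everything runs on one machine, whereas a real MapReduce job would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict

# Hypothetical survey answers: one list of phone platforms per floor.
floors = {
    1: ["apple", "android", "android"],
    2: ["windows", "apple"],
    3: ["android", "apple", "apple"],
}


def map_floor(answers):
    """Map step: emit a (provider, 1) pair for every person on one floor."""
    return [(provider, 1) for provider in answers]


def shuffle(pairs):
    """Group the emitted values by provider, like sorting the slips from every message box."""
    grouped = defaultdict(list)
    for provider, count in pairs:
        grouped[provider].append(count)
    return grouped


def reduce_provider(provider, counts):
    """Reduce step: total the slips for a single provider."""
    return provider, sum(counts)


# Each floor fills its own message box independently of the others.
message_boxes = [map_floor(answers) for answers in floors.values()]
all_pairs = [pair for box in message_boxes for pair in box]

# The main office totals the slips per provider.
market = dict(
    reduce_provider(provider, counts)
    for provider, counts in shuffle(all_pairs).items()
)
print(market)
```

The map step never needs to know about other floors, which is what lets the work be split up. Only the reduce step sees the combined data.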


        

                     
        

                      
  
