How much data does Google deal with?
- Stores about 15 exabytes of data (1 exabyte = 1,000,000,000,000,000,000 bytes)
- Processes 100 petabytes of data per day
- Indexes 60 trillion pages
- Has 1 billion Google search users per month
Limitations of MapReduce
- An entire MapReduce job is a batch-processing job: results are available only after the whole input has been processed
- It does not allow real-time processing of the data
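To make the batch limitation concrete, here is a minimal plain-Python sketch (not MapReduce or Spark code; the function names are hypothetical). The batch version can only report word counts once the whole input has been read, while the incremental version reports an updated count after every record, which is the behaviour stream processing needs.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(records: List[str]) -> Counter:
    """Batch style: a result exists only after every record has been read."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.split())
    return counts


def incremental_word_count(records: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: yield an up-to-date snapshot after each record arrives."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.split())
        yield Counter(counts)  # copy, so earlier snapshots are not changed later


if __name__ == "__main__":
    data = ["spark streaming", "spark batch", "streaming data"]
    print(dict(batch_word_count(data)))
    for snapshot in incremental_word_count(data):
        print(dict(snapshot))
```

Both functions end with the same totals; the difference is that the incremental one has a usable answer at every step.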
Streaming data:
A continuous flow of information from one or more sources is called streaming data.
Stream processing:
The transformations that we perform on this data are called stream processing.
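A minimal sketch of stream processing in plain Python, assuming log lines of the hypothetical form `LEVEL message`: each stage is a generator, so every line is parsed, filtered and transformed as soon as it arrives, and the pipeline also works on an unbounded input. This only illustrates the idea; it is not how Spark implements it.

```python
import itertools
from typing import Iterable, Iterator, NamedTuple


class LogEvent(NamedTuple):
    level: str
    message: str


def parse(lines: Iterable[str]) -> Iterator[LogEvent]:
    # Assumed (hypothetical) format: "LEVEL message text", e.g. "ERROR disk full"
    for line in lines:
        level, _, message = line.partition(" ")
        yield LogEvent(level, message)


def only_errors(events: Iterable[LogEvent]) -> Iterator[LogEvent]:
    # Filter step: keep only the entities we care about
    for event in events:
        if event.level == "ERROR":
            yield event


def messages(events: Iterable[LogEvent]) -> Iterator[str]:
    # Transformation step: reshape each entity into the desired output form
    for event in events:
        yield event.message


def stream_pipeline(lines: Iterable[str]) -> Iterator[str]:
    # Generators are lazy, so each line flows through all stages as soon as it is read
    return messages(only_errors(parse(lines)))


if __name__ == "__main__":
    # An endless source still works: we just take the first three results
    endless = itertools.cycle(["INFO started", "ERROR disk full", "WARN slow"])
    print(list(itertools.islice(stream_pipeline(endless), 3)))
```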
Spark Streaming
- Works on streaming data and performs stream processing on it
- Deals with live data in real time
- A better alternative to Hadoop MapReduce for processing data streams
- An extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams
- Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, Kinesis or TCP sockets, and processed using complex algorithms expressed with high-level functions like map, reduce, join and window
- Processed data can be pushed out to filesystems, databases, and live dashboards
- Streams of data are made up of discrete entities
- Streams arrive at an input and need to be processed in real time
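Spark Streaming's DStream model cuts the live input into small batches at a fixed batch interval and applies transformations such as `flatMap`, `reduceByKey` and `window` to them. The sketch below imitates that model in plain Python so it can run without a cluster; it is not the Spark API, and the hashtag data and function names are made up. Records are assumed to arrive ordered by timestamp, and the window covers the last `window_length` batches and slides by one batch.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List, Optional, Tuple


def micro_batches(records: Iterable[Tuple[float, str]],
                  interval: float) -> Iterator[List[str]]:
    """Cut a timestamp-ordered stream into consecutive batches of `interval` seconds.

    Intervals with no records produce empty batches.
    """
    batch: List[str] = []
    batch_end: Optional[float] = None
    for ts, value in records:
        if batch_end is None:
            batch_end = (ts // interval + 1) * interval
        while ts >= batch_end:  # close every interval that ended before this record
            yield batch
            batch = []
            batch_end += interval
        batch.append(value)
    if batch_end is not None:
        yield batch  # flush the last, possibly partial, batch


def count_per_batch(batch: List[str]) -> Counter:
    # Roughly flatMap (split into words) + map (word -> (word, 1)) + reduceByKey (sum)
    return Counter(word for text in batch for word in text.split())


def windowed_counts(batches: Iterable[List[str]],
                    window_length: int) -> Iterator[Counter]:
    """Counts over the last `window_length` batches, sliding by one batch."""
    window: Deque[Counter] = deque(maxlen=window_length)  # oldest batch drops out
    for batch in batches:
        window.append(count_per_batch(batch))
        total: Counter = Counter()
        for counts in window:
            total.update(counts)
        yield total


if __name__ == "__main__":
    tweets = [(0.1, "#spark #kafka"), (0.6, "#spark"), (1.3, "#kafka"), (2.5, "#spark")]
    for counts in windowed_counts(micro_batches(tweets, 1.0), window_length=2):
        print(dict(counts))
```

In real Spark Streaming the batch interval is set on the `StreamingContext`, and window lengths and slide intervals must be multiples of it.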
Ex: Log messages, tweets, GPS location information (latitude and longitude)
Note:
1) We need to process individual entities or groups of entities
Ex: mood on Twitter
2) Once we have processed the entities, we transform them into the desired resultant form
3) The result might be stored in reliable storage, passed on to another application, or acted on in a certain way
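The three steps in the note can be sketched for the "mood on Twitter" example. This assumes tweets arrive as hypothetical `(topic, text)` pairs and uses a tiny made-up word list instead of a real sentiment model: each tweet is scored on its own, the scores are grouped and averaged per topic, and the result is handed to a pluggable sink. In a streaming job this would run once per micro-batch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical toy word list; a real mood/sentiment model would be far richer.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}


def mood_score(text: str) -> int:
    """Step 1, individual entity: score one tweet by summing word scores."""
    return sum(LEXICON.get(word.strip(".,!?").lower(), 0) for word in text.split())


def mood_by_topic(tweets: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Step 1, group of entities, and step 2: average mood per topic."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for topic, text in tweets:
        totals[topic] += mood_score(text)
        counts[topic] += 1
    return {topic: totals[topic] / counts[topic] for topic in totals}


def emit(result: Dict[str, float], sink: Callable[[Dict[str, float]], None]) -> None:
    """Step 3: hand the result to storage, another application, or an action."""
    sink(result)


if __name__ == "__main__":
    batch = [("spark", "I love it!"), ("spark", "I hate it"), ("kafka", "great and happy")]
    emit(mood_by_topic(batch), print)
```

Passing the sink as a function keeps step 3 open: the same result could go to a database writer, a message queue producer, or an alerting callback.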
Output
Trigger an alert, show trending graphs, display a route on the map
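For the first kind of output, triggering an alert, one common pattern is an edge-triggered threshold: fire once when a value crosses the threshold rather than on every record above it. A minimal sketch, assuming the stream is a sequence of numbers such as per-batch error counts (the function name and callback are hypothetical):

```python
from typing import Callable, Iterable


def threshold_alerts(values: Iterable[float], threshold: float,
                     on_alert: Callable[[int, float], None]) -> None:
    """Call on_alert(index, value) each time the stream rises above threshold.

    Edge-triggered: no further alerts fire while values stay above the
    threshold; the alert re-arms once a value drops back to or below it.
    """
    above = False
    for index, value in enumerate(values):
        if value > threshold and not above:
            on_alert(index, value)
        above = value > threshold


if __name__ == "__main__":
    errors_per_batch = [1, 5, 7, 2, 9, 9]  # made-up per-batch error counts
    threshold_alerts(errors_per_batch, 4,
                     lambda i, v: print(f"ALERT: batch {i} had {v} errors"))
```

Compared with alerting on every record above the threshold, edge triggering avoids flooding the receiver during a sustained spike.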