Wednesday, February 21, 2018

Temperature of Big Data



What is data temperature?


  • It is the classification of data from hot to cold based on how frequently it is accessed (a small sketch of this classification follows this list).
  • Hot data is accessed most frequently; cold data is accessed infrequently.

      Hot Data
  • Measurements in large-scale analytic environments consistently show that less than 20% of the data accounts for more than 90% of the I/Os. Such data belongs in memory so it can be retrieved very fast.

      Cold Data
  • The other 80% of the data, which is accessed less than 10% of the time, can be thought of as cold data.
  • Putting cold data in memory does not make sense from an economic point of view, especially at large volumes. If we are talking about 100 gigabytes, put it all in memory. But if we are talking about 100 terabytes, it does not make economic sense to keep everything in memory.
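
For illustration, here is a minimal Python sketch of that classification, assuming we track access counts per table or partition in a Counter; the 20% cutoff and the made-up table names are illustrative assumptions, not a prescribed implementation.

from collections import Counter

def classify_temperature(access_counts: Counter, hot_fraction: float = 0.2) -> dict:
    """Label each item 'hot' or 'cold' by access frequency.

    The most-accessed hot_fraction (about 20%) of items are treated as hot,
    mirroring the 80/20 pattern described above; the threshold is tunable.
    """
    ranked = [key for key, _ in access_counts.most_common()]
    hot_cutoff = max(1, int(len(ranked) * hot_fraction))
    return {key: ("hot" if i < hot_cutoff else "cold")
            for i, key in enumerate(ranked)}

# Example: access counts per table over some observation window (made-up numbers).
counts = Counter({"sales_2018_02": 9400, "sales_2018_01": 350,
                  "sales_2017": 40, "sales_2016": 5})
print(classify_temperature(counts))
# {'sales_2018_02': 'hot', 'sales_2018_01': 'cold', 'sales_2017': 'cold', 'sales_2016': 'cold'}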


Optimize for Both Cost and Performance


  • The goal of good engineering is to optimize for both cost and performance. 
  • Hot data, data that is accessed very frequently, like the latest sales numbers, should be in memory. While memory costs more per terabyte of storage than electromechanical disk drives, it is also much faster and has a lower cost per I/O. 
  • In contrast, relatively cold data should live in the lower cost per terabyte storage provided by disk drives, because cost per I/O matters far less for data that is accessed infrequently. Low cost per terabyte is key for cold data so that you can store lots of it economically. (A toy sketch of this tiering policy follows this list.)
  • This is a big part of the design philosophy for “data lakes” used to capture “all” data forever in a big data environment.
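
As a rough sketch of that tiering idea, the toy Python class below keeps frequently read keys in an in-memory dict and everything else in a shelve file standing in for cheap disk storage; the file name, promotion threshold, and promote-on-read policy are illustrative assumptions, not how any particular data lake actually works.

import shelve

class TieredStore:
    """Toy two-tier store: hot keys in a dict (memory), cold keys on disk."""

    def __init__(self, disk_path: str = "tiered_demo.db", hot_threshold: int = 3):
        self.memory = {}                      # fast, higher cost per terabyte
        self.disk = shelve.open(disk_path)    # slower, cheaper per terabyte
        self.access_counts = {}
        self.hot_threshold = hot_threshold

    def put(self, key: str, value) -> None:
        # New data starts cold, on the cheap tier.
        self.disk[key] = value

    def get(self, key: str):
        self.access_counts[key] = self.access_counts.get(key, 0) + 1
        if key in self.memory:
            return self.memory[key]           # hot path: served from memory
        value = self.disk[key]                # cold path: read from disk
        # Promote to memory once the key has proven itself hot.
        if self.access_counts[key] >= self.hot_threshold:
            self.memory[key] = value
        return value

    def close(self) -> None:
        self.disk.close()

The point of the sketch is the placement decision, not the mechanics: accesses are counted, and only data that keeps getting read earns a place in the expensive, fast tier.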
