Wednesday, February 21, 2018

Temperature of Big Data



What is data temperature?


  • Data temperature classifies data from hot to cold based on how frequently it is accessed.
  • Hot data is accessed most frequently, and cold data is accessed infrequently.
      Hot Data
  • Measurements in large-scale analytic environments consistently indicate that less than 20% of the data receives more than 90% of the I/Os. Such data belongs in memory so that it can be retrieved very quickly.
      Cold Data
  • The other 80% of the data, which receives less than 10% of the I/Os, can be thought of as cold data.
  • Putting cold data in memory does not make sense from an economic point of view, especially with large volumes of data. If we are talking about 100 gigabytes, then put it all in memory. But if we are talking about 100 terabytes, it does not make economic sense to put everything in memory.
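
To make the hot/cold split concrete, here is a minimal sketch in Python. It assumes we already have an access count per piece of data; the function name `split_hot_cold`, the table names, and the counts are all made up for illustration and do not come from any real system. It marks as hot the smallest set of most-accessed items that together receive at least 90% of the I/Os.

```python
def split_hot_cold(access_counts, io_share=0.90):
    """Split keys into (hot, cold).

    hot is the smallest set of the most-accessed keys that together
    receive at least io_share of all accesses; everything else is cold.
    """
    total = sum(access_counts.values())
    hot, cold = [], []
    covered = 0
    # Visit keys from most accessed to least accessed.
    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    for key, count in ranked:
        if total > 0 and covered < io_share * total:
            hot.append(key)
            covered += count
        else:
            cold.append(key)
    return hot, cold


# Hypothetical I/O counts per monthly sales partition (illustrative only).
access_counts = {
    "sales_2018_02": 700, "sales_2018_01": 220, "sales_2017_12": 20,
    "sales_2017_11": 15, "sales_2017_10": 12, "sales_2017_09": 10,
    "sales_2017_08": 9, "sales_2017_07": 7, "sales_2017_06": 4,
    "sales_2017_05": 3,
}

hot, cold = split_hot_cold(access_counts)
hot_io = sum(access_counts[k] for k in hot)
print("hot:", hot)
print("cold:", cold)
print("share of I/Os served by hot data:", hot_io / sum(access_counts.values()))
```

With these made-up numbers, the two newest partitions out of ten end up hot, which is the kind of skew described above. In practice the threshold and the unit of data (table, partition, block) are design choices.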


Optimize for Both Cost and Performance


  • The goal of good engineering is to optimize for both cost and performance. 
  • Hot data, which is accessed very frequently (the latest sales numbers, for example), should be in memory. Memory costs more per terabyte than electromechanical disk drives, but it is much faster and has the lower cost per I/O.
  • In contrast, relatively cold data should be on disk drives, which offer the lower cost per terabyte, because a low cost per I/O does not matter much for data that is accessed infrequently. A low cost per terabyte is key for cold data so that you can store lots of it economically.
  • This is a big part of the design philosophy for “data lakes” used to capture “all” data forever in a big data environment.
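
The cost-per-terabyte versus cost-per-I/O trade-off above can be sketched with a small calculation. This is only a toy model: the `Tier` class, the unit capacities, I/O rates, and prices are invented placeholders, not real hardware figures. The assumption is that you must buy enough units of a tier to cover both the data size and the required I/O rate, so whichever requirement needs more units drives the cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_gb: int   # capacity of one unit (placeholder value)
    ios_per_sec: int   # I/O rate one unit can sustain (placeholder value)
    unit_price: int    # price of one unit, arbitrary currency (placeholder)


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def tier_cost(tier, size_gb, ios_per_sec):
    """Cost of enough units to satisfy both capacity and I/O rate."""
    units = max(ceil_div(size_gb, tier.capacity_gb),
                ceil_div(ios_per_sec, tier.ios_per_sec))
    return units * tier.unit_price


def cheapest_tier(tiers, size_gb, ios_per_sec):
    """Pick the tier with the lowest total cost for this workload."""
    return min(tiers, key=lambda t: tier_cost(t, size_gb, ios_per_sec))


# Invented numbers: memory is expensive per GB but cheap per I/O,
# disk is cheap per GB but expensive per I/O.
MEMORY = Tier("memory", capacity_gb=100, ios_per_sec=1_000_000, unit_price=1_000)
DISK = Tier("disk", capacity_gb=10_000, ios_per_sec=200, unit_price=300)
TIERS = [MEMORY, DISK]

# Hot: 1 TB accessed very frequently. Cold: 100 TB accessed rarely.
hot_choice = cheapest_tier(TIERS, size_gb=1_000, ios_per_sec=500_000)
cold_choice = cheapest_tier(TIERS, size_gb=100_000, ios_per_sec=100)
print("hot data goes to:", hot_choice.name)
print("cold data goes to:", cold_choice.name)
```

Under these assumed figures the hot workload is cheaper in memory because the I/O rate dominates, while the cold workload is cheaper on disk because capacity dominates. Real prices will differ, but the shape of the decision is the same.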
