Friday, October 26, 2018

System design Twitter


Overview

Designing the architecture of Twitter and similar social networks is a popular engineering interview question asked at companies like LinkedIn, Microsoft, Google, Snapchat, NVIDIA and others. No solution is simply right or wrong; each one comes with trade-offs. Let's take the features one at a time and design a high-level solution to the problem.

Approach

When we design a system, we first need to list the features we need.
Let's build the system feature by feature, then test it for performance and improve on it.

Let us consider 3 main features of the Twitter system:

1. Tweeting
     This would hold all the tweets.
2. Timeline
    - User
       List of all the tweets that the user has posted over a given period. This list is small,
       since it contains only that one user's own tweets.
    - Home
       List of all the tweets from the users whom we are following. Each of those users might have thousands of tweets over the same period, so this list is much larger.
3. Following
      This is the list of the users that we are following. It changes rarely and would put little load on the system.



Initial design

Let's design the initial tables for the system.

Tweets:
    This table would have the tweet id, the tweet's data columns, and the primary key of the user who posted it (stored as a foreign key).
User:
    This table would hold all the Twitter users.
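
To make the two tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names, the extra follows table (standing in for the Following feature) and the sample rows are all assumptions for illustration; the design above only names the Tweets and User tables.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE tweets (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),  -- author of the tweet
    content    TEXT NOT NULL,
    created_at INTEGER NOT NULL                        -- e.g. a Unix timestamp
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO tweets (user_id, content, created_at) VALUES (?, ?, ?)",
                 [(2, "hello from bob", 100),
                  (3, "hello from carol", 200),
                  (1, "hello from alice", 300)])
conn.executemany("INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
                 [(1, 2), (1, 3)])  # alice follows bob and carol

def user_timeline(conn, user_id, limit=20):
    """Tweets posted by one user, newest first."""
    rows = conn.execute(
        "SELECT content FROM tweets WHERE user_id = ? "
        "ORDER BY created_at DESC LIMIT ?", (user_id, limit)).fetchall()
    return [row[0] for row in rows]

def home_timeline(conn, user_id, limit=20):
    """Tweets from everyone the user follows, newest first."""
    rows = conn.execute(
        "SELECT t.content FROM tweets t "
        "JOIN follows f ON f.followee_id = t.user_id "
        "WHERE f.follower_id = ? "
        "ORDER BY t.created_at DESC LIMIT ?", (user_id, limit)).fetchall()
    return [row[0] for row in rows]
```

Note that the home timeline needs a join across every followed user on each read, which is the part that gets expensive as the tables grow.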

Challenges
We would implement this using a traditional RDBMS. This would run smoothly at first; however, as the number of users and tweets increases, every home timeline read has to gather tweets from all the followed users, which puts growing load on the database and may eventually crash it.

Redesign
Let's load all the tweets of all the users that a particular user follows into memory (RAM). This can be done by using a Redis cluster. In other words, the user's home page is precomputed and kept in memory even when the user is not logged in to the system. When the user logs in, the tweets are read straight out of memory.

Whenever the user clicks the send button, the data is fed to a load balancer. The load balancer would look for the closest datacenter and load the contents into the Redis cluster. There would be 3 copies of each tweet for each user. If a user's home timeline has 100 tweets, then 300 tweet entries would be stored in the cluster for that user.
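
The write path described above is often called fan-out on write. Below is a minimal in-memory sketch of it, where plain Python dictionaries stand in for the Redis cluster. The function names, the per-timeline cap and the data layout are assumptions for illustration only; a real deployment would use Redis data structures instead.

```python
from collections import defaultdict, deque

REPLICAS = 3          # the design keeps 3 copies of each tweet
TIMELINE_LIMIT = 800  # hypothetical cap on cached tweets per home timeline

# replicas[i][user_id] -> deque of tweet ids, newest first (stand-in for Redis)
replicas = [defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))
            for _ in range(REPLICAS)]
followers = defaultdict(set)  # author_id -> ids of users following the author

def follow(follower_id, author_id):
    followers[author_id].add(follower_id)

def post_tweet(author_id, tweet_id):
    """Push the tweet into every follower's cached home timeline on every
    replica. Returns the number of cache writes performed."""
    writes = 0
    for follower_id in followers[author_id]:
        for replica in replicas:
            replica[follower_id].appendleft(tweet_id)  # oldest entry drops off when full
            writes += 1
    return writes

def home_timeline(user_id, replica_index=0):
    """Read the precomputed home timeline from one replica."""
    return list(replicas[replica_index][user_id])
```

Reading a home timeline is now a single lookup, but each tweet costs one write per follower per replica.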


Challenges
If a user who has a million followers tweets, then 3 million copies of that tweet would be stored in memory. This would slow the system down.
We also need a large amount of RAM to satisfy this design.
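
The arithmetic behind these numbers can be written down as a small sketch. The 8 bytes per cached entry is a purely hypothetical figure (for example, storing only a tweet id); the real cost depends on what is cached.

```python
REPLICAS = 3  # copies of each cached tweet, as in the design above

def cached_entries(timeline_tweets, replicas=REPLICAS):
    """Entries stored in the cluster for one user's home timeline."""
    return timeline_tweets * replicas

def fanout_writes(follower_count, replicas=REPLICAS):
    """Cache writes triggered by a single tweet from one author."""
    return follower_count * replicas

def fanout_bytes(follower_count, bytes_per_entry=8, replicas=REPLICAS):
    """Memory used by one tweet's copies; bytes_per_entry is an assumption."""
    return fanout_writes(follower_count, replicas) * bytes_per_entry
```

With these assumptions, cached_entries(100) gives the 300 entries mentioned earlier, and fanout_writes(1_000_000) gives the 3 million copies for a single tweet from a user with a million followers.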

Hash lookup
User-specific content is fetched from the cluster using a hash lookup.
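
As a rough sketch of such a lookup: Redis Cluster maps every key to one of 16384 hash slots using a CRC16 of the key, and each node owns a range of slots. The key format, the node names and the even split of slots below are assumptions for illustration, and hash tags are ignored.

```python
import binascii

NUM_SLOTS = 16384                          # fixed number of hash slots in Redis Cluster
NODES = ["redis-a", "redis-b", "redis-c"]  # hypothetical node names

def timeline_key(user_id):
    # Hypothetical key format for a user's cached home timeline.
    return f"home_timeline:{user_id}"

def key_slot(key):
    """CRC16 (XMODEM variant) of the key, modulo the number of slots."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % NUM_SLOTS

def node_for(user_id):
    """Pick the node owning the key's slot, assuming slots are split evenly."""
    slot = key_slot(timeline_key(user_id))
    return NODES[slot * len(NODES) // NUM_SLOTS]
```

Because the slot depends only on the key, any client can work out which node holds a given user's timeline without asking a central directory.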

Redesign
In this design we would keep the same Redis cluster; however, there would be a conditional check on the type of user.
If the user has a very large number of followers (for example, a million), their tweets would be stored only in the RDBMS and not copied to each follower's timeline in memory. All other tweets are stored in memory. When a user who follows such a famous account opens the home page, the famous account's tweets are fetched from the RDBMS and the rest from the Redis cluster.
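
A minimal sketch of this hybrid approach is shown below. Dictionaries stand in for both the Redis cluster and the RDBMS, and the threshold, function names and (timestamp, tweet id) layout are assumptions for illustration.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # based on the "million followers" example

follower_count = {}                   # user_id -> number of followers
following = defaultdict(set)          # user_id -> ids of the users they follow
cached_timeline = defaultdict(list)   # stand-in for Redis: user_id -> [(ts, tweet_id)]
tweets_by_author = defaultdict(list)  # stand-in for the RDBMS: author_id -> [(ts, tweet_id)]

def is_celebrity(user_id):
    return follower_count.get(user_id, 0) >= CELEBRITY_THRESHOLD

def post_tweet(author_id, ts, tweet_id, followers):
    """Always persist the tweet; fan it out to caches only for ordinary users."""
    tweets_by_author[author_id].append((ts, tweet_id))
    if not is_celebrity(author_id):
        for follower_id in followers:
            cached_timeline[follower_id].append((ts, tweet_id))

def home_timeline(user_id, limit=20):
    """Merge the cached timeline with tweets pulled from followed celebrities."""
    merged = list(cached_timeline[user_id])
    for author_id in following[user_id]:
        if is_celebrity(author_id):
            merged.extend(tweets_by_author[author_id])
    # Newest first, ordered by timestamp.
    return [tweet_id for _, tweet_id in heapq.nlargest(limit, merged)]
```

The trade-off is that reads for users who follow famous accounts do a little extra work at request time, in exchange for avoiding millions of cache writes per celebrity tweet.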
 
Note: Following and tweeting would use the RDBMS for querying.
