Overview
Designing the architecture of Twitter and similar social networks is a popular engineering interview question, asked at companies like LinkedIn, Microsoft, Google, Snapchat, NVIDIA and others. No single solution is right or wrong; each one comes with trade-offs. Let's take the main features one at a time and design a high-level solution to the problem.
Approach
When we design a system, we first need to list the features we need. Let's build the system feature by feature, then test its performance and improve on it.
Let us consider 3 main features of the Twitter system:
1. Tweeting
This would have all the tweets.
2. Timeline
- User
List of all the tweets the user has posted. This list would be small, as it holds only the user's own tweets for a particular period.
- Home
List of all the tweets of the users whom we are following. Each of those users might have thousands of tweets in a particular period.
3. Following
This is the list of users that we are following. It does not change often and would put less load on the system.
Initial design
Let's design the initial tables for the system.
User:
This table would hold all the Twitter users, with an id column as the primary key of each user.
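As a rough sketch of what the initial tables could look like, the snippet below uses Python's built-in sqlite3 module as a stand-in for the RDBMS. Only the user table comes from the description above; the tweets and follows tables, every column name, and the home_timeline helper are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical users, tweets and follows tables."""
    conn.executescript(
        """
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE tweets (
            id         INTEGER PRIMARY KEY,
            user_id    INTEGER NOT NULL REFERENCES users(id),
            body       TEXT NOT NULL,
            created_at INTEGER NOT NULL  -- assumed: Unix timestamp
        );
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL REFERENCES users(id),
            followee_id INTEGER NOT NULL REFERENCES users(id),
            PRIMARY KEY (follower_id, followee_id)
        );
        """
    )


def home_timeline(conn, user_id, limit=20):
    """Return the newest tweets written by the users that user_id follows."""
    rows = conn.execute(
        """
        SELECT t.id, t.user_id, t.body
        FROM tweets AS t
        JOIN follows AS f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC, t.id DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return rows.fetchall()
```

Note that in this sketch the home timeline is computed with a join on every read, so its cost grows with the number of followed users and their tweets.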
Challenges
We would implement this using a traditional RDBMS. This would run smoothly at first; however, as the number of users and tweets increases, the load on the database grows and it may eventually crash.
Redesign
Let's load all the tweets of everyone a particular user follows into memory (RAM). This can be done using a Redis cluster. In other words, the user's home page is kept ready in memory even when the user has not logged in to the system. When the user logs in, the tweets are served straight from memory.
Whenever the user clicks the send button, the tweet is passed to a load balancer. The load balancer would look for the closest datacenter and load the tweet into the Redis cluster there. There would be 3 copies of every tweet for each user. If a user's timeline holds 100 tweets, then 300 tweets would be stored in the cluster for that user.
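The write path described above can be sketched roughly as follows. Plain Python dictionaries stand in for the Redis cluster so the example runs without a server; the TimelineCache class, its method names, and the per-timeline cap of 800 entries are hypothetical choices for illustration, while the 3 replicas come from the text.

```python
from collections import defaultdict, deque

REPLICAS = 3       # from the text: 3 copies of every tweet per user
MAX_ENTRIES = 800  # assumption: cap on each cached home timeline


class TimelineCache:
    """In-memory stand-in for a replicated cache of home timelines."""

    def __init__(self, replicas=REPLICAS, max_entries=MAX_ENTRIES):
        # One dict per replica, mapping user id -> newest-first tweet ids.
        self.replicas = [
            defaultdict(lambda: deque(maxlen=max_entries))
            for _ in range(replicas)
        ]

    def fan_out(self, tweet_id, follower_ids):
        """On tweet: push the tweet id onto every follower's home timeline."""
        for replica in self.replicas:
            for follower_id in follower_ids:
                replica[follower_id].appendleft(tweet_id)

    def home_timeline(self, user_id, replica_index=0):
        """On read: return the precomputed timeline from one replica."""
        return list(self.replicas[replica_index].get(user_id, ()))
```

With this layout a read is a single lookup, and all of the work is paid when the tweet is written.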
Challenges
If a user who has a million followers tweets, then there would be 3 million copies of that tweet stored in memory. This would slow the system.
We also need a good amount of RAM to satisfy this design.
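The arithmetic behind these numbers is simple enough to write down. In the sketch below the function names are hypothetical, and the 8 bytes per cached entry (one tweet id) is an assumed figure used only to give a sense of scale.

```python
def fanout_copies(entries, replicas=3):
    """Number of cached copies when each entry is stored on every replica."""
    return entries * replicas


def fanout_bytes(entries, replicas=3, bytes_per_entry=8):
    """Rough memory cost, assuming each cached entry is a fixed-size id."""
    return fanout_copies(entries, replicas) * bytes_per_entry


# One tweet delivered to a million followers, replicated 3 times.
copies_for_celebrity_tweet = fanout_copies(1_000_000)

# A timeline of 100 tweets, replicated 3 times.
copies_for_one_timeline = fanout_copies(100)
```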
Hash lookup
User-specific content is fetched from the cluster using a hash lookup.
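A minimal sketch of the idea, assuming the lookup key is the user's id: a stable hash of the id picks the shard, and a dictionary lookup inside that shard returns the timeline. The names shard_for and ShardedTimelines and the modulo scheme are illustrative assumptions; Redis Cluster itself assigns keys to hash slots rather than using this exact scheme.

```python
import hashlib


def shard_for(user_id, shard_count):
    """Map a user id to a shard index with a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedTimelines:
    """Dictionaries standing in for the nodes of the cache cluster."""

    def __init__(self, shard_count=4):
        self.shards = [{} for _ in range(shard_count)]

    def _shard(self, user_id):
        return self.shards[shard_for(user_id, len(self.shards))]

    def put(self, user_id, tweet_ids):
        self._shard(user_id)[user_id] = list(tweet_ids)

    def get(self, user_id):
        # The same hash on read leads to the same shard as on write.
        return self._shard(user_id).get(user_id, [])
```

A stable hash such as SHA-256 is used here instead of Python's built-in hash(), whose value for strings can differ between processes.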
Redesign
In this design we would keep the same Redis cluster; however, there would be a conditional check on the type of user.
If the user has a very large number of followers, e.g. a million, then their tweets would be stored only in the RDBMS. All other tweets are stored in memory. When a user who follows such a famous user comes to the home page, the famous user's tweets are fetched from the RDBMS and the rest from the Redis cluster.
Note: Following and tweeting would use the RDBMS for querying.
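The read path of this hybrid design could look roughly like the sketch below. It assumes both sources return (created_at, tweet_id) pairs sorted newest first; the threshold constant, the function names, and that data shape are hypothetical and not part of the original design.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 1_000_000  # assumption, from "a million followers"


def is_celebrity(follower_count, threshold=CELEBRITY_THRESHOLD):
    """Decide at write time whether a tweet skips the in-memory fan-out."""
    return follower_count >= threshold


def merged_home_timeline(cached, celebrity_tweets, limit=20):
    """Merge the cached timeline with tweets of followed celebrities.

    cached: (created_at, tweet_id) pairs from the cache, newest first.
    celebrity_tweets: one newest-first list per followed celebrity,
        as it would be fetched from the RDBMS.
    """
    merged = heapq.merge(cached, *celebrity_tweets, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

The merge happens at read time, so a celebrity's tweet is stored once in the RDBMS instead of being copied into every follower's cached timeline.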