Friday, October 26, 2018

System design Twitter


When we design a system, we first need to list the features we need.
Let's build the system feature by feature, then test its performance and improve on it.

Let us consider three main features of the Twitter system:
1. Tweeting
    This stores all the tweets.
2. Timeline
    - User
       The list of tweets that the user has posted. It is relatively small, because it holds only
       that one user's tweets for a given period.
    - Home
       The list of tweets from all the users we follow. Each of those users might have thousands of tweets in a given period.
3. Following
    The list of users that we follow. It changes infrequently and puts less load on the system.
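
The three features above can be sketched as two tables and two queries. This is only a minimal relational sketch to make the data flow concrete, not how Twitter actually stores or serves timelines. The table names, column names, and the user id 42 are all assumptions made for illustration.

```sql
-- Hypothetical tables; names and columns are illustrative only.
CREATE TABLE tweets (
  tweet_id   BIGINT,
  user_id    BIGINT,
  body       STRING,
  created_at TIMESTAMP
);

CREATE TABLE follows (
  follower_id BIGINT,   -- the user who follows
  followee_id BIGINT    -- the user being followed
);

-- User timeline: tweets posted by one user (here the assumed id 42), newest first.
SELECT tweet_id, body, created_at
FROM tweets
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- Home timeline: tweets posted by everyone that user 42 follows.
SELECT t.tweet_id, t.user_id, t.body, t.created_at
FROM follows f
JOIN tweets t ON t.user_id = f.followee_id
WHERE f.follower_id = 42
ORDER BY t.created_at DESC
LIMIT 20;
```

The home timeline query has to touch the tweets of every followed user, which is why it is the heavier of the two timelines.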

System design Uber



Uber architecture relies on supply and demand.

  • Demand comes from the users, and supply is provided by the cars
  • It uses a flat map from Google to divide the world's regions into unique cells
  • Each cell has a unique ID
  • It draws a circle that covers one or more cells

Wednesday, July 25, 2018

File format using Hive




SEQUENCE FILE
Sequencefile
======================
-- External table over flight data stored in SequenceFile format
create external table flight_seq
 (year smallint, month tinyint, dayofmonth tinyint, dayofweek tinyint,
  lateaircraftdelay smallint)
 stored as sequencefile
 location '/user/raj_ops/rawdata/handson_train/airline_performance/flights_seq';
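
A SequenceFile table is usually filled from another table rather than by copying text files into its location, since the files must be written in the binary SequenceFile format. The statement below is a minimal sketch of that step. It assumes a plain-text staging table called flight_text with the same columns; that table is hypothetical and does not appear in the original post.

```sql
-- flight_text is a hypothetical text-format staging table with matching columns.
-- Hive rewrites the selected rows as SequenceFiles under the flight_seq location.
INSERT OVERWRITE TABLE flight_seq
SELECT year, month, dayofmonth, dayofweek, lateaircraftdelay
FROM flight_text;
```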


Hive partition

  • Partitioning reduces data access time by restricting a query to only a certain portion of the dataset.
  • Care has to be taken in choosing the partition column.
  • Once a partition has been created, you can alter some of its definitions independently of the other partitions.
  • There is no hard limit on the number of partitions that a Hive table can contain. However, we still need to be careful, because a very large number of partitions adds metadata overhead and tends to produce many small files.
  • Querying a partitioned table without filtering on the partition column can take longer than querying an equivalent non-partitioned table.
  • Prefer static partitioning to dynamic partitioning for day-to-day data ingestion.
  • Pre-empt small-file scenarios.
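
The points above can be illustrated with a small sketch built on the flight_seq table from the earlier post. The table name flight_part, the choice of year as the partition column, and the value 2008 are assumptions made only for this example.

```sql
-- Hypothetical partitioned variant of the flight table.
-- The partition column (year) is declared separately from the data columns.
CREATE TABLE flight_part (
  month TINYINT,
  dayofmonth TINYINT,
  dayofweek TINYINT,
  lateaircraftdelay SMALLINT
)
PARTITIONED BY (year SMALLINT)
STORED AS SEQUENCEFILE;

-- Static partitioning: the target partition is named explicitly in the statement.
INSERT OVERWRITE TABLE flight_part PARTITION (year = 2008)
SELECT month, dayofmonth, dayofweek, lateaircraftdelay
FROM flight_seq
WHERE year = 2008;

-- Dynamic partitioning: Hive derives the partition from the last selected column.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE flight_part PARTITION (year)
SELECT month, dayofmonth, dayofweek, lateaircraftdelay, year
FROM flight_seq;

-- Filtering on the partition column lets Hive read only the year=2008 directory.
SELECT AVG(lateaircraftdelay)
FROM flight_part
WHERE year = 2008;

-- A single partition can be altered without touching the others,
-- for example by changing its declared file format.
ALTER TABLE flight_part PARTITION (year = 2008) SET FILEFORMAT ORC;
```

The static form makes each load predictable, since every run writes exactly one known partition. The dynamic form can create many partitions in a single run, each with its own files, which is one way the small-file problem appears.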