Tuesday, April 24, 2018

Mean, Median and Mode

Mean: the "average" value, found by adding all data points and dividing by the number of data points.
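
To make the three measures in the title concrete, here is a small sketch using Python's standard `statistics` module. The data set is made up for illustration. The median is the middle value of the sorted data (or the mean of the two middle values when the count is even), and the mode is the most frequent value.

```python
from statistics import mean, median, mode

# Hypothetical sample data, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

# Mean: sum of all data points (30) divided by the number of points (6).
data_mean = mean(data)

# Median: the data has an even count, so it is the mean of the two
# middle values of the sorted data (3 and 5).
data_median = median(data)

# Mode: the value that occurs most often (3 appears twice).
data_mode = mode(data)

print(data_mean, data_median, data_mode)
```

The mean is pulled upward by the single large value 10, while the median and mode are not, which is one reason to look at all three.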

Sunday, April 22, 2018

Kafka partitions

  • Each topic has one or more partitions
  • The number of partitions in a topic depends on how Apache Kafka is intended to be used, and it is configurable
  • Partitions are the basis on which Kafka can
    • Scale
    • Become fault-tolerant
    • Achieve a higher level of throughput
  • Each partition is maintained on one or more brokers
 Note: Each partition must fit entirely on one machine. If a large and growing topic has only one partition, we are limited by that one broker node's ability to capture and retain the messages published to the topic. We would also run into I/O constraints.
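
The scaling point in the note can be sketched in a few lines of Python. This is only an illustration, not Kafka's actual partitioner: the function names `choose_partition` and `spread` are hypothetical, and the CRC32 hash is an assumption picked because it is in the standard library (Kafka's own clients use a different hash).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (illustrative hash, not Kafka's)."""
    return zlib.crc32(key) % num_partitions


def spread(keys, num_partitions):
    """Count how many messages would land in each partition."""
    counts = [0] * num_partitions
    for key in keys:
        counts[choose_partition(key, num_partitions)] += 1
    return counts


# Hypothetical message keys.
keys = [f"user-{i}".encode() for i in range(1000)]

one_partition = spread(keys, 1)    # every message lands on the single partition
four_partitions = spread(keys, 4)  # the same messages are shared across four

print(one_partition)
print(four_partitions)
```

With one partition, a single broker has to store and serve all 1000 messages. With four partitions, the load can be divided among up to four brokers.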

Overview of S3

  • An interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web
  • It is an object store, not a file system
  • Highly scalable, reliable, fast, and inexpensive data storage infrastructure
  • Uses an eventual consistency model (at the time of writing)
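
To illustrate what "object store, not a file system" means, here is a toy in-memory model in Python. It is not the S3 API: the class `ToyObjectStore` and its methods are hypothetical names used only for this sketch. The idea it shows is that objects live in a flat namespace keyed by a string, and "folders" are just shared key prefixes.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket: a flat mapping from key to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Writing a key replaces the whole object in this toy model.
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("logs/2018/04/22/app.log", b"first day")
store.put("logs/2018/04/23/app.log", b"another day")
store.put("reports/summary.csv", b"a,b\n1,2\n")

april_logs = store.list_keys(prefix="logs/2018/04/")
print(april_logs)
```

No directory is ever created here. The slashes are ordinary characters in the key, and a prefix listing is what makes them look like a folder tree.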

Markdown Cheat Sheet (Jupyter Notebook)


# H1
## H2
### H3
#### H4
##### H5
###### H6

Alternatively, for H1 and H2, an underline-ish style:

Alt-H1
======

Alt-H2
------

Overview of Pig

  • Apache Pig is a high-level platform for creating programs that run on Apache Hadoop.
  • The language for this platform is called Pig Latin.
  • Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark.
Local mode
  • In local mode, Pig runs in a single JVM and accesses the local file system. This mode is suitable only for small data sets.
  • We can select the execution type with the -x (or -exectype) option. To run in local mode, set the option to local, for example: pig -x local

Shell script

  • A shell interprets user commands, which are either entered directly by the user or read from a file called a shell script (or shell program)
  • Shell scripts are interpreted, not compiled
  • Typical operations performed by shell scripts include file manipulation, program execution, and printing text
Command to identify the shell type which the operating system supports

Overview of Flume


  • Distributed data collection service
  • Gets streaming event data from different sources
  • Moves large amounts of log data from many different sources to a centralized data store.

Note: We cannot use Flume to ingest relational data

Span vs div (CSS)

div is a block element, span is inline.
This means that to use them semantically, divs should be used to wrap sections of a document, while spans should be used to wrap small portions of text, images, etc.
For example:
<div>This is a large main division, with <span>a small bit</span> of spanned text!</div>

Display block vs inline vs inline-block

Sample HTML
<!DOCTYPE html>
<html>
<head>
<style>
/* Each box sits inline with its neighbours but keeps its own width and height */
.floating-box {
    display: inline-block;
    width: 150px;
    height: 75px;
    margin: 10px;
    border: 3px solid #73AD21;
}

/* A normal block element: it starts on a new line after the boxes */
.after-box {
    border: 3px solid red;
}
</style>
</head>
<body>

<h2>The New Way - using inline-block</h2>

<div class="floating-box">Floating box</div>
<div class="floating-box">Floating box</div>
<div class="floating-box">Floating box</div>
<div class="floating-box">Floating box</div>
<div class="floating-box">Floating box</div>
<div class="floating-box">Floating box</div>
<div class="floating-box">Floating box</div>
<div class="floating-box">Floating box</div>

<div class="after-box">Another box, after the floating boxes...</div>

</body>
</html>


CSS for td in the same line (Angular 2)


<table id="table_id"><tr><td>testtesttesttest</td>

React Function vs Class Component

React has two types of components: function components and class components.

Function Component:
  • The simplest form of React component
  • It receives an object of properties (props) and returns JSX (which looks like HTML)

Saturday, April 14, 2018

Simple demo using Kafka

Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

Check that ZooKeeper is reachable, using telnet
telnet localhost 2181

Start the Kafka broker
bin/kafka-server-start.sh config/server.properties

Create topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List Kafka topics
bin/kafka-topics.sh --list --zookeeper localhost:2181

Install Kafka

Steps for installing Kafka

  • You need to set up a Java virtual machine on your system before you can run Apache Kafka.
  • We can install OpenJDK Runtime Environment 1.8.0 using YUM:
sudo yum install java-1.8.0-openjdk.x86_64
  • Validate your installation with:
java -version