Web Snippets: Apache Kafka


Sunday, April 22, 2018

Kafka partitions

Each topic has one or more partitions
The number of partitions per topic depends on the circumstances in which Apache Kafka is intended to be used, and it is configurable.
A partition is the basis on which Kafka can:

Scale
Become fault-tolerant
Achieve a higher level of throughput
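The scale and throughput points come from spreading a topic's messages over several partitions. For keyed messages, the producer hashes the key and takes the result modulo the partition count, so the same key always lands in the same partition. The following is a minimal sketch under stated assumptions. partition_for_key is a hypothetical helper, and zlib.crc32 stands in for the murmur2 hash that Kafka's Java producer actually uses, so the partition numbers will not match a real cluster.

```python
import zlib
from collections import Counter


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a message key to a partition in the range [0, num_partitions).

    Assumption: crc32 is a stand-in hash; Kafka's Java producer uses murmur2.
    """
    if num_partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    for n in (1, 6):
        # With one partition every key lands in partition 0 (a single broker's
        # log); with six, the same keys are spread over six partitions.
        counts = Counter(partition_for_key(k, n) for k in keys)
        print(n, "partitions:", sorted(counts.items()))
```

Because the mapping depends on the partition count, adding partitions to an existing topic changes where existing keys land.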

Each partition is maintained on one or more brokers.

Note: Each partition must fit entirely on a single machine, because a partition cannot be split across brokers. If a large and growing topic has only one partition, we are limited by that one broker's ability to capture and retain the messages published to the topic, and we also run into that machine's I/O limits.
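The note above explains why topics usually get several partitions whose leaders sit on different brokers. The following is a simplified sketch that spreads partition replicas round-robin over brokers. assign_replicas and leaders_per_broker are hypothetical helpers, and the broker ids are made up. Kafka's real assignment adds randomization and rack awareness on top of this basic idea.

```python
from collections import Counter
from typing import Dict, List


def assign_replicas(num_partitions: int, replication_factor: int,
                    brokers: List[int]) -> Dict[int, List[int]]:
    """Return {partition: [leader, follower, ...]} using plain round-robin."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > len(brokers):
        raise ValueError(f"replication factor {replication_factor} larger than "
                         f"available brokers {len(brokers)}")
    n = len(brokers)
    assignment: Dict[int, List[int]] = {}
    for p in range(num_partitions):
        # The leader (first replica) rotates over the brokers; followers are the
        # next brokers in the ring, so no broker holds two copies of a partition.
        assignment[p] = [brokers[(p + r) % n] for r in range(replication_factor)]
    return assignment


def leaders_per_broker(assignment: Dict[int, List[int]]) -> Counter:
    """Count how many partitions each broker leads."""
    return Counter(replicas[0] for replicas in assignment.values())


if __name__ == "__main__":
    brokers = [101, 102, 103]
    # One partition: a single broker takes every write for the topic.
    print(leaders_per_broker(assign_replicas(1, 1, brokers)))
    # Six partitions: leadership (and write load) is shared by all three.
    print(leaders_per_broker(assign_replicas(6, 2, brokers)))
```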

Simple demo using Kafka

Start ZooKeeper

bin/zookeeper-server-start.sh config/zookeeper.properties

Check that ZooKeeper is reachable with telnet

telnet localhost 2181
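Once you are connected with telnet, you can type one of ZooKeeper's four-letter commands, such as ruok (a healthy server replies imok) or srvr (server details). The same check can be scripted. The following is a minimal sketch that assumes ZooKeeper listens on localhost:2181; zk_four_letter_word is a hypothetical helper. Newer ZooKeeper releases (3.5 and later) only answer commands listed in their 4lw.commands.whitelist setting.

```python
import socket


def zk_four_letter_word(cmd: str, host: str = "localhost", port: int = 2181,
                        timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command (e.g. "ruok") and return the reply."""
    if len(cmd) != 4:
        raise ValueError("four-letter commands are exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server writes its reply and then closes the connection,
        # so read until recv() returns b"".
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        print(zk_four_letter_word("ruok"))
    except OSError as exc:
        print(f"could not reach ZooKeeper: {exc}")
```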

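Start a Kafka broker (in a separate terminal, once ZooKeeper is running)

bin/kafka-server-start.sh config/server.properties

Create a topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test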
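Creating the topic only works if its name is legal and at least as many brokers are alive as the requested replication factor. With --replication-factor 1, that means a broker must already be running. The following sketch shows those pre-checks. validate_create_topic is a hypothetical helper. Its naming rules follow the limits Kafka enforces: ASCII letters, digits, '.', '_' and '-', at most 249 characters, and not '.' or '..'. It is not Kafka's own validation code.

```python
import re

# Assumed rules: legal characters and length limit for Kafka topic names.
LEGAL_TOPIC = re.compile(r"[a-zA-Z0-9._-]+")
MAX_TOPIC_LENGTH = 249


def validate_create_topic(name: str, partitions: int, replication_factor: int,
                          live_brokers: int) -> None:
    """Raise ValueError if a create-topic request could not succeed."""
    if name in (".", ".."):
        raise ValueError("topic name cannot be '.' or '..'")
    if len(name) > MAX_TOPIC_LENGTH:
        raise ValueError(f"topic name longer than {MAX_TOPIC_LENGTH} characters")
    if not LEGAL_TOPIC.fullmatch(name):
        raise ValueError("topic name may only contain [a-zA-Z0-9._-]")
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > live_brokers:
        raise ValueError(f"replication factor {replication_factor} larger than "
                         f"available brokers {live_brokers}")
```

For example, validate_create_topic("test", 1, 1, live_brokers=0) raises. This mirrors the replication-factor error Kafka reports when no broker is up.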

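List Kafka topics

bin/kafka-topics.sh --list --zookeeper localhost:2181

These commands use the --zookeeper option from Kafka releases of that time. Kafka 2.2 and later also accept --bootstrap-server localhost:9092, and Kafka 3.0 removed --zookeeper from kafka-topics.sh.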

Steps for installing Kafka

You need to set up a Java runtime on your system before you can run Apache Kafka.
We can install the OpenJDK Runtime Environment 1.8.0 using YUM:

sudo yum install java-1.8.0-openjdk.x86_64

Validate your installation with:

java -version
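You may want to script the java -version check, for example in a provisioning script. If so, note that the command prints to stderr and that the version format changed after Java 8 ("1.8.0_161" versus "11.0.2"). The following sketch uses hypothetical helpers parse_java_major and installed_java_major. Comparing the result against 8 assumes that Java 8, installed above, is the minimum you want to accept.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if not match:
        raise ValueError("no version string found")
    parts = match.group(1).split(".")
    # Java 8 and earlier report "1.<major>..."; Java 9+ report "<major>...".
    major = parts[1] if parts[0] == "1" and len(parts) > 1 else parts[0]
    digits = re.match(r"\d+", major)
    if not digits:
        raise ValueError(f"unrecognised version: {match.group(1)}")
    return int(digits.group())


def installed_java_major() -> int:
    """Run `java -version` (it writes to stderr) and parse the result."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)
```

A provisioning script could then stop early when installed_java_major() < 8.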

Apache ZooKeeper

APACHE KAFKA DISTRIBUTED ARCHITECTURE

At the heart of Apache Kafka is a cluster, which can consist of hundreds of independent brokers.
Closely associated with the Kafka cluster is a ZooKeeper environment, which provides the brokers within the cluster with the metadata they need to operate reliably at scale. Because this metadata is constantly changing, continuous connectivity and chatter between the cluster members and ZooKeeper are required.
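Much of the constantly changing metadata is simply the list of brokers that are alive. In ZooKeeper-based Kafka, each broker registers an ephemeral node under /brokers/ids, and ZooKeeper deletes that node when the broker's session expires. The following is a toy in-memory model of that behavior. BrokerRegistry, the integer clock ticks, and the timeout value are illustrative assumptions, not ZooKeeper's client API.

```python
from typing import Dict, List


class BrokerRegistry:
    """Toy model of ephemeral broker nodes under /brokers/ids.

    A broker keeps its entry by heartbeating; if it stays silent for longer
    than session_timeout ticks, the entry is removed, the way an expired
    ZooKeeper session removes its ephemeral nodes.
    """

    def __init__(self, session_timeout: int) -> None:
        self.session_timeout = session_timeout
        self._last_seen: Dict[int, int] = {}

    def register(self, broker_id: int, now: int) -> None:
        self._last_seen[broker_id] = now

    def heartbeat(self, broker_id: int, now: int) -> None:
        # An expired broker cannot revive its old entry; it must register again.
        if broker_id in self._last_seen:
            self._last_seen[broker_id] = now

    def expire_sessions(self, now: int) -> List[int]:
        """Remove brokers whose sessions timed out and return their ids."""
        expired = sorted(b for b, seen in self._last_seen.items()
                         if now - seen > self.session_timeout)
        for b in expired:
            del self._last_seen[b]
        return expired

    def live_brokers(self) -> List[int]:
        return sorted(self._last_seen)


if __name__ == "__main__":
    registry = BrokerRegistry(session_timeout=6)
    for broker in (1, 2, 3):
        registry.register(broker, now=0)
    registry.heartbeat(1, now=5)
    registry.heartbeat(2, now=5)  # broker 3 has gone silent
    print(registry.expire_sessions(now=8), registry.live_brokers())  # [3] [1, 2]
```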

CONTROLLER ELECTION

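The hierarchy starts with a controller (supervisor).
It is a worker node (broker) elected by its peers to serve in the administrative role of controller.
The worker node selected as controller is typically the one that has been around the longest. In Kafka's ZooKeeper-based design, the controller is whichever broker first succeeds in creating the ephemeral /controller node in ZooKeeper; in a freshly started cluster, that is usually the first broker to start.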
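The election itself is a race. Every broker tries to create the ephemeral /controller node, and only the first create succeeds. When the controller's session ends, the node disappears and the remaining brokers race again. The following is a toy model of that race. ControllerZnode and run_election are hypothetical names, and real brokers race concurrently rather than in a fixed order.

```python
from typing import Iterable, Optional


class ControllerZnode:
    """Toy model of the ephemeral /controller node used for election."""

    def __init__(self) -> None:
        self.owner: Optional[int] = None

    def create(self, broker_id: int) -> bool:
        # Like a ZooKeeper create, this only succeeds if the node does not exist.
        if self.owner is None:
            self.owner = broker_id
            return True
        return False

    def session_lost(self, broker_id: int) -> None:
        # An ephemeral node is deleted when its owner's session ends.
        if self.owner == broker_id:
            self.owner = None


def run_election(znode: ControllerZnode, brokers_in_order: Iterable[int]) -> Optional[int]:
    """Each broker tries to create the node in turn; return the controller."""
    for broker_id in brokers_in_order:
        znode.create(broker_id)
    return znode.owner


if __name__ == "__main__":
    znode = ControllerZnode()
    print(run_election(znode, [2, 1, 3]))  # broker 2 started first and wins
    znode.session_lost(2)                  # controller fails
    print(run_election(znode, [3, 1]))     # survivors race again; 3 wins here
```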

RESPONSIBILITIES OF THE CONTROLLER

Maintain an inventory of which workers are available to take on work.
Maintain a list of work items that have been committed to and assigned to workers.
Maintain the active status of the staff and their progress on assigned tasks.
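In Kafka terms, the controller's work items are partitions and their leaders. When a broker fails, the controller picks a new leader for each partition that broker was leading, normally from the partition's in-sync replicas (ISR). The following is a simplified sketch. reassign_leaders and its "first live ISR member" rule are assumptions, since Kafka also considers the assigned replica order and the unclean.leader.election.enable setting.

```python
from typing import Dict, List, Optional, Set


def reassign_leaders(leaders: Dict[str, int], isr: Dict[str, List[int]],
                     failed_broker: int, live: Set[int]) -> Dict[str, Optional[int]]:
    """Return partition leaders after failed_broker dies.

    Partitions it led get the first live in-sync replica; None means no
    eligible replica is left, so the partition goes offline.
    """
    new_leaders: Dict[str, Optional[int]] = {}
    for partition, leader in leaders.items():
        if leader != failed_broker:
            new_leaders[partition] = leader  # unaffected partitions keep their leader
            continue
        candidates = [b for b in isr[partition] if b != failed_broker and b in live]
        new_leaders[partition] = candidates[0] if candidates else None
    return new_leaders


if __name__ == "__main__":
    leaders = {"test-0": 1, "test-1": 2, "test-2": 1}
    isr = {"test-0": [1, 2], "test-1": [2, 3], "test-2": [1]}
    print(reassign_leaders(leaders, isr, failed_broker=1, live={2, 3}))
```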

Apache Kafka is a distributed commit log service
Functions much like a publish/subscribe messaging system
Higher throughput than many traditional messaging systems
Built-in partitioning, replication, and fault tolerance.
Increasingly popular for log collection and stream processing.
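Here is the commit log idea in miniature. Producers only ever append, each record gets the next offset, and every subscriber keeps its own read position. That last property is what lets one log serve many publish/subscribe consumers. The following is a minimal in-memory sketch. CommitLog and Subscriber are hypothetical classes, whereas real Kafka persists each partition's log on disk and tracks committed offsets per consumer group.

```python
from typing import List, Tuple


class CommitLog:
    """Append-only log: records get increasing offsets and are never modified."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, record) pairs starting at offset."""
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


class Subscriber:
    """Each subscriber remembers its own position, so many can share one log."""

    def __init__(self, log: CommitLog) -> None:
        self.log = log
        self.position = 0

    def poll(self, max_records: int = 10) -> List[bytes]:
        batch = self.log.read(self.position, max_records)
        if batch:
            self.position = batch[-1][0] + 1  # next offset to read
        return [record for _, record in batch]
```

A fast and a slow subscriber can read the same records at different paces without affecting each other or the producer.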

Why Stream Storage?

Need for stream storage

Decouple producers & consumers
Persistent buffer
Collect multiple streams
Preserve client ordering
Parallel consumption
Streaming MapReduce
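Parallel consumption and preserved ordering both come from giving each partition to exactly one consumer within a consumer group. The following sketch shows a range-style split, similar in spirit to Kafka's range assignor. range_assign is a hypothetical helper that handles a single topic, and the consumer names are made up.

```python
from typing import Dict, List


def range_assign(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Split partitions 0..partitions-1 into contiguous ranges, one per consumer.

    Every partition goes to exactly one consumer, so the group reads in
    parallel while each partition is still read in order.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    per_member, extra = divmod(partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members take one additional partition each.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    print(range_assign(7, ["c", "a", "b"]))
    print(range_assign(2, ["a", "b", "c"]))  # more consumers than partitions
```

When a group has more consumers than partitions, the extra consumers receive nothing. A topic's partition count therefore caps the parallelism of one group.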

Blog Archive

Sunday, April 22, 2018: Kafka partitions

Saturday, April 14, 2018: Simple demo using Kafka; Install kafka; Steps for installing Kafka

Tuesday, February 27, 2018: Apache ZooKeeper; Team formation in Kafka; Overview of kafka

Wednesday, February 21, 2018: Why Stream Storage?; Need for stream storage