Showing posts with label Apache Kafka. Show all posts

Sunday, April 22, 2018

Kafka partitions

  • Each topic has one or more partitions
  • The number of partitions per topic depends on the circumstances in which Apache Kafka is intended to be used, and it is configurable.
  • A partition is the basis on which Kafka can:
    • Scale
    • Become fault-tolerant
    • Achieve higher level of throughput
  • Each partition is maintained on at least one broker.
 Note: Each partition must fit entirely on one machine. If a large and growing topic has only one partition, we are limited by that single broker's ability to capture and retain the messages published to the topic, and we also run into I/O constraints.
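To make the partitioning idea concrete, here is a minimal Python sketch under simplifying assumptions. `partition_for_key` and `assign_replicas` are hypothetical helpers, not Kafka APIs: CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to record keys, and replicas are placed round-robin starting from the partition number, whereas Kafka's own assignment logic is more involved (it also varies the starting broker).

```python
import zlib

# Hypothetical helpers illustrating partitioning; this is not Kafka's code.


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a record key to a partition number.

    Assumption: CRC32 replaces Kafka's murmur2-based default partitioner.
    The property is the same (same key -> same partition), but the actual
    partition numbers will differ from a real cluster.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions: int, brokers: list[int],
                    replication_factor: int) -> dict[int, list[int]]:
    """Spread each partition's replicas across brokers round-robin.

    The first broker in each list plays the role of the partition leader.
    Each partition needs at least one broker, and no broker may hold two
    copies of the same partition.
    """
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication factor must be between 1 and the broker count")
    assignment: dict[int, list[int]] = {}
    for p in range(num_partitions):
        # Shift the starting broker per partition so leadership is spread out.
        assignment[p] = [brokers[(p + i) % len(brokers)]
                         for i in range(replication_factor)]
    return assignment
```

For example, `assign_replicas(3, [101, 102, 103], 2)` gives `{0: [101, 102], 1: [102, 103], 2: [103, 101]}`: each broker leads one partition, so no single machine has to capture and retain the whole topic.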

Saturday, April 14, 2018

Simple demo using Kafka



Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

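Check that ZooKeeper is listening, using telnet
telnet localhost 2181

Start the Kafka broker
bin/kafka-server-start.sh config/server.properties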

Create topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

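List Kafka topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
 Note: Kafka 3.0 and later removed the --zookeeper option from kafka-topics.sh; on those versions, use --bootstrap-server localhost:9092 in this command and in the create command above.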

Install Kafka


Steps for installing Kafka

  • You need to set up a Java Virtual Machine (JVM) on your system before you can run Apache Kafka.
  • We can install OpenJDK Runtime Environment 1.8.0 using YUM:
sudo yum install java-1.8.0-openjdk.x86_64
  • Validate your installation with:
java -version

Tuesday, February 27, 2018

Apache ZooKeeper


APACHE KAFKA DISTRIBUTED ARCHITECTURE

  • At the heart of Apache Kafka is a cluster, which can consist of hundreds of independent brokers.
  • Closely associated with the Kafka cluster is a ZooKeeper environment, which provides the brokers within the cluster with the metadata they need to operate reliably at scale. Because this metadata is constantly changing, continuous connectivity and chatter between the cluster members and ZooKeeper are required.
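Much of that chatter is about liveness: a broker that stops talking to ZooKeeper is eventually treated as gone. The sketch below models this with a hypothetical in-memory `MetadataStore` (not a ZooKeeper client). Timestamps are passed in explicitly so the behavior is easy to follow, and `session_timeout` is an assumed parameter standing in for the ZooKeeper session timeout.

```python
class MetadataStore:
    """Hypothetical in-memory stand-in for ZooKeeper (not a real client API).

    Brokers register by heartbeating. If a broker stays silent for longer
    than session_timeout, its entry disappears, much like the ephemeral
    node a broker keeps under /brokers/ids in ZooKeeper-based Kafka.
    """

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self._last_heartbeat: dict[int, float] = {}

    def heartbeat(self, broker_id: int, now: float) -> None:
        # Registering and heartbeating are the same operation in this sketch.
        self._last_heartbeat[broker_id] = now

    def live_brokers(self, now: float) -> list[int]:
        # Expire sessions that have been silent too long, then report the rest.
        expired = [b for b, t in self._last_heartbeat.items()
                   if now - t > self.session_timeout]
        for b in expired:
            del self._last_heartbeat[b]
        return sorted(self._last_heartbeat)
```

If brokers 1 and 2 both heartbeat at time 0 and only broker 1 heartbeats again at time 5, then with a 6-second timeout `live_brokers(now=8.0)` reports only `[1]`: the metadata changed without anyone explicitly removing broker 2.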


Team formation in Kafka


CONTROLLER ELECTION
  • Hierarchy starts with a controller/supervisor 
  • It is a worker node (broker) elected by its peers to serve in the administrative capacity of a controller
  • The worker node selected as controller is the one that has been around the longest
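In ZooKeeper-based Kafka this election is a race: every broker tries to create an ephemeral `/controller` node, and the first one to succeed becomes the controller. Below is a minimal simulation, assuming brokers attempt in the order they started (which is why the longest-running broker tends to win). `ControllerRegistry` and `elect` are hypothetical names, not Kafka or ZooKeeper APIs.

```python
from typing import Optional


class ControllerRegistry:
    """Hypothetical stand-in for ZooKeeper's /controller node (not a real API)."""

    def __init__(self) -> None:
        self.controller_id: Optional[int] = None

    def create_if_absent(self, broker_id: int) -> bool:
        # Only the first caller succeeds, like creating a node that must not exist yet.
        if self.controller_id is None:
            self.controller_id = broker_id
            return True
        return False

    def delete(self) -> None:
        # Models the ephemeral node vanishing when the controller's session ends.
        self.controller_id = None


def elect(registry: ControllerRegistry,
          brokers_by_start_time: list[int]) -> Optional[int]:
    # Assumption: live brokers attempt in the order they started, so the one
    # that has been around the longest wins the race.
    for broker in brokers_by_start_time:
        if registry.create_if_absent(broker):
            break
    return registry.controller_id
```

With `elect(registry, [3, 1, 2])`, broker 3 becomes controller and later attempts by 1 and 2 fail. If broker 3 crashes, `registry.delete()` models its node disappearing, and `elect(registry, [1, 2])` makes broker 1 the new controller.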

RESPONSIBILITIES OF THE CONTROLLER
  • Maintain an inventory of which workers are available to take on work.
  • Maintain a list of work items that have been committed to and assigned to workers.
  • Maintain the active status of the workers and their progress on assigned tasks.
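These responsibilities are essentially bookkeeping. Here is a hypothetical `Controller` class that keeps the inventory of available workers and the list of assigned work items, and reassigns a failed worker's items. The least-loaded assignment policy is an assumption for illustration; in Kafka the work items would be partitions, and the real controller logic is considerably richer.

```python
class Controller:
    """Hypothetical sketch of a controller's bookkeeping (not Kafka code)."""

    def __init__(self) -> None:
        self.available: set[int] = set()       # workers able to take on work
        self.assignments: dict[str, int] = {}  # work item -> assigned worker

    def register(self, worker: int) -> None:
        self.available.add(worker)

    def assign(self, item: str) -> int:
        # Assumed policy: give the item to the least-loaded available worker,
        # breaking ties by the lowest worker id.
        if not self.available:
            raise RuntimeError("no workers available")
        load = {w: 0 for w in self.available}
        for w in self.assignments.values():
            load[w] += 1
        worker = min(sorted(load), key=lambda w: load[w])
        self.assignments[item] = worker
        return worker

    def mark_failed(self, worker: int) -> list[str]:
        # Drop the worker from the inventory and hand its items to the others.
        self.available.discard(worker)
        orphaned = [i for i, w in self.assignments.items() if w == worker]
        for item in orphaned:
            del self.assignments[item]
        for item in orphaned:
            self.assign(item)
        return orphaned
```

With workers 1 and 2, assigning `orders-0`, `orders-1` and `orders-2` alternates between them; `mark_failed(1)` then returns `["orders-0", "orders-2"]` and moves both items to worker 2.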

Overview of Kafka



  • Apache Kafka is a distributed commit log service
  • Functions much like a publish/subscribe messaging system
  • Better throughput than many traditional messaging systems
  • Built-in partitioning, replication, and fault tolerance. 
  • Increasingly popular for log collection and stream processing.
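A commit log is an append-only sequence in which every record gets a sequential offset, and reading never removes data. That is what lets Kafka behave like publish/subscribe: each subscriber simply tracks its own offset. A minimal in-memory sketch of a single partition; the `CommitLog` class is hypothetical and ignores persistence, replication and retention.

```python
class CommitLog:
    """Hypothetical single-partition, in-memory commit log (illustration only)."""

    def __init__(self) -> None:
        self._records: list[bytes] = []

    def append(self, record: bytes) -> int:
        # Records are only ever appended; return the new record's offset.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list[bytes]:
        # Reading is non-destructive, so any number of subscribers can
        # read the same records starting from their own offsets.
        return self._records[offset:offset + max_records]
```

After appending `b"a"`, `b"b"` and `b"c"` (offsets 0, 1 and 2), a subscriber starting at offset 0 sees all three records while one starting at offset 1 sees only `b"b"` and `b"c"`; neither affects the other.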

Wednesday, February 21, 2018

Why Stream Storage?


Need for stream storage

  • Decouple producers & consumers
  • Persistent buffer
  • Collect multiple streams
  • Preserve client ordering
  • Parallel consumption
  • Streaming Map Reduce
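Parallel consumption and client ordering can coexist because, within a consumer group, each partition is read by exactly one consumer at a time: order is preserved inside a partition, while different partitions are processed in parallel. The sketch below is a hypothetical round-robin assignment in the spirit of Kafka's round-robin assignor, not its actual implementation.

```python
def assign_partitions(partitions: list[int],
                      consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions over the consumers of one group (hypothetical).

    Every partition goes to exactly one consumer, so per-partition order
    is kept while separate partitions are consumed in parallel.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    members = sorted(consumers)  # sort for a deterministic assignment
    assignment: dict[str, list[int]] = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment
```

Five partitions over consumers `c1` and `c2` split as `[0, 2, 4]` and `[1, 3]`. If a group has more consumers than partitions, the extra consumers sit idle, which is why the partition count caps a group's parallelism.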


