Tuesday, February 27, 2018

Message Consumption in Apache Kafka



MESSAGE OFFSET

  • A critical concept to understand, because offsets are how consumers read messages at their own pace and process them independently.
  • A placeholder: like a bookmark, it maintains the last read position.
  • For a Kafka topic, it marks the last message read.
  • The offset is established and maintained by the consumer, since the consumer alone is responsible for reading messages and processing them.
  • It lets the consumer keep track of what it has and has not read.
  • An offset is a message identifier: a sequential number giving the message's position within a topic partition.
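
To make the bookmark idea concrete, here is a minimal in-memory sketch, assuming a single-partition topic modeled as a list whose index is the offset. `TopicLog` and `OffsetConsumer` are hypothetical names, not Kafka client classes. Two consumers read the same log, each advancing only its own offset, so a slow consumer never holds back a fast one.

```python
from typing import List


class TopicLog:
    """An append-only list of messages; a message's offset is its list index."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def append(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message


class OffsetConsumer:
    """Keeps its own bookmark: the offset of the next message to read."""

    def __init__(self, name: str, log: TopicLog) -> None:
        self.name = name
        self.log = log
        self.offset = 0  # a new consumer starts at the beginning of the topic

    def poll(self, max_messages: int) -> List[str]:
        # Read up to max_messages from this consumer's own position.
        batch = self.log.messages[self.offset:self.offset + max_messages]
        self.offset += len(batch)  # advance the bookmark past what was read
        return batch


log = TopicLog()
for text in ["m0", "m1", "m2", "m3"]:
    log.append(text)

fast = OffsetConsumer("fast", log)
slow = OffsetConsumer("slow", log)
print(fast.poll(3))              # ['m0', 'm1', 'm2']
print(slow.poll(1))              # ['m0']
print(fast.offset, slow.offset)  # 3 1
```

In this sketch the stored offset points at the next message to read (one past the last one read); Kafka's committed offsets follow the same convention.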

STEPS INVOLVED

  1. When a consumer wishes to read from a topic, it must establish a connection with a broker.
  2. After establishing the connection, the consumer decides which messages it wants to consume.
  3. If the consumer has not previously read from the topic, or has to start over, it issues a request to read from the beginning of the topic (that is, the consumer establishes that its message offset for the topic is 0).
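
The three steps can be sketched as a hypothetical in-memory model: `Broker`, `Consumer`, `connect`, `subscribe` and `poll` are names invented for illustration, not the Kafka client API. A consumer with no recorded position, or one asked to start over, begins at offset 0.

```python
from typing import Dict, List, Optional


class Broker:
    """Holds topics as lists of messages; the list index is the offset."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = {}

    def produce(self, topic: str, message: str) -> None:
        self.topics.setdefault(topic, []).append(message)

    def fetch(self, topic: str, offset: int, max_messages: int) -> List[str]:
        return self.topics.get(topic, [])[offset:offset + max_messages]


class Consumer:
    def __init__(self) -> None:
        self.broker: Optional[Broker] = None
        self.positions: Dict[str, int] = {}  # topic -> next offset to read

    def connect(self, broker: Broker) -> None:
        # Step 1: establish a connection with a broker.
        self.broker = broker

    def subscribe(self, topic: str, from_beginning: bool = False) -> int:
        # Step 2: decide which messages to consume.
        # Step 3: with no previous position, or when starting over, use offset 0.
        if from_beginning or topic not in self.positions:
            self.positions[topic] = 0
        return self.positions[topic]

    def poll(self, topic: str, max_messages: int = 10) -> List[str]:
        if self.broker is None:
            raise RuntimeError("connect() to a broker before polling")
        start = self.positions[topic]
        batch = self.broker.fetch(topic, start, max_messages)
        self.positions[topic] = start + len(batch)
        return batch


broker = Broker()
for i in range(3):
    broker.produce("orders", f"order-{i}")

consumer = Consumer()
consumer.connect(broker)
consumer.subscribe("orders")                        # new consumer: offset 0
print(consumer.poll("orders", 2))                   # ['order-0', 'order-1']
consumer.subscribe("orders")                        # keeps its position (2)
print(consumer.poll("orders"))                      # ['order-2']
consumer.subscribe("orders", from_beginning=True)   # start over
print(consumer.poll("orders"))                      # ['order-0', 'order-1', 'order-2']
```

In the real Java client, where a consumer with no committed offset starts is governed by the `auto.offset.reset` setting (`earliest` or `latest`, with `latest` as the default), and `KafkaConsumer.seekToBeginning` rewinds explicitly.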

Apache ZooKeeper


APACHE KAFKA DISTRIBUTED ARCHITECTURE

  • At the heart of Apache Kafka is a cluster, which can consist of anywhere from a single broker to hundreds of independent brokers.
  • Closely associated with the Kafka cluster is a ZooKeeper ensemble, which provides the brokers within the cluster with the metadata they need to operate reliably at scale. Because this metadata changes constantly, continuous connectivity and communication between the cluster members and ZooKeeper are required.
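
The dependence on constant connectivity can be illustrated with a toy metadata store. This is a sketch under stated assumptions: `MetadataStore` is hypothetical and only mimics the idea that a broker's registration disappears when its session times out; it is not the ZooKeeper API. (In ZooKeeper-based Kafka deployments, brokers register as ephemeral znodes under `/brokers/ids`.) Time is passed in explicitly so the example is deterministic.

```python
from typing import Dict, List


class MetadataStore:
    """Toy registry: a broker stays listed only while it keeps heartbeating."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self.last_heartbeat: Dict[int, float] = {}  # broker id -> last heartbeat time

    def register(self, broker_id: int, now: float) -> None:
        self.last_heartbeat[broker_id] = now

    def heartbeat(self, broker_id: int, now: float) -> None:
        # Only registered brokers can renew their session.
        if broker_id in self.last_heartbeat:
            self.last_heartbeat[broker_id] = now

    def live_brokers(self, now: float) -> List[int]:
        # Drop brokers whose session timed out, as an ephemeral entry would vanish.
        expired = [b for b, t in self.last_heartbeat.items()
                   if now - t > self.session_timeout]
        for broker_id in expired:
            del self.last_heartbeat[broker_id]
        return sorted(self.last_heartbeat)


store = MetadataStore(session_timeout=6.0)
for broker_id in (1, 2, 3):
    store.register(broker_id, now=0.0)
store.heartbeat(1, now=5.0)
store.heartbeat(2, now=5.0)         # broker 3 has gone silent
print(store.live_brokers(now=8.0))  # [1, 2]
```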


Team formation in Kafka


CONTROLLER ELECTION
  • The hierarchy starts with a controller (supervisor).
  • The controller is a worker node elected by its peers to serve in an administrative capacity.
  • The worker node selected as controller is the one that has been around the longest.
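
The election rule stated above, where the longest-serving worker wins, can be sketched as a minimum over join times. `Worker` and `elect_controller` are hypothetical, and breaking ties by the lower worker id is an assumption added to keep the result deterministic. In ZooKeeper-based Kafka the mechanism is usually described as a race to create an ephemeral `/controller` znode, so the first broker to start normally ends up as controller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Worker:
    worker_id: int
    joined_at: float  # time the worker joined the cluster


def elect_controller(workers: List[Worker]) -> Optional[Worker]:
    """Pick the worker that has been around the longest (earliest join time)."""
    if not workers:
        return None
    # Assumed tie-break: lower worker_id wins when join times are equal.
    return min(workers, key=lambda w: (w.joined_at, w.worker_id))


workers = [Worker(1, 30.0), Worker(2, 10.0), Worker(3, 10.0)]
controller = elect_controller(workers)
print(controller.worker_id)  # 2

# If the controller leaves, the remaining peers elect a new one.
survivors = [w for w in workers if w != controller]
print(elect_controller(survivors).worker_id)  # 3
```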

RESPONSIBILITIES OF THE CONTROLLER
  • Maintain an inventory of which workers are available to take on work.
  • Maintain a list of work items that have been committed to and assigned to workers.
  • Maintain the active status of the staff and their progress on assigned tasks.
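
The controller's three duties map naturally onto three pieces of state. The sketch below is a hypothetical model of the team analogy (`Controller`, `assign`, `report_progress` and `worker_left` are invented names); a real Kafka controller tracks brokers, partitions and partition leaders rather than generic tasks. Assigning to the least-loaded worker and reassigning unfinished items when a worker leaves are illustrative choices, not Kafka's algorithm.

```python
from typing import Dict, List, Set


class Controller:
    def __init__(self) -> None:
        self.available: Set[str] = set()       # inventory of workers able to take work
        self.assignments: Dict[str, str] = {}  # work item -> assigned worker
        self.progress: Dict[str, int] = {}     # work item -> percent complete

    def worker_joined(self, worker: str) -> None:
        self.available.add(worker)

    def assign(self, item: str) -> str:
        if not self.available:
            raise RuntimeError("no workers available")
        # Count current items per available worker; pick the least loaded (ties by name).
        load = {w: 0 for w in self.available}
        for w in self.assignments.values():
            if w in load:
                load[w] += 1
        worker = min(load, key=lambda w: (load[w], w))
        self.assignments[item] = worker
        self.progress[item] = 0
        return worker

    def report_progress(self, item: str, percent: int) -> None:
        self.progress[item] = percent

    def worker_left(self, worker: str) -> List[str]:
        # Remove the worker from the inventory and reassign its unfinished items.
        self.available.discard(worker)
        orphaned = [i for i, w in self.assignments.items()
                    if w == worker and self.progress[i] < 100]
        for item in orphaned:
            self.assign(item)
        return orphaned


c = Controller()
for name in ["worker-a", "worker-b"]:
    c.worker_joined(name)
print(c.assign("task-1"))         # worker-a
print(c.assign("task-2"))         # worker-b
c.report_progress("task-1", 100)
c.report_progress("task-2", 40)
print(c.worker_left("worker-b"))  # ['task-2']
print(c.assignments)              # {'task-1': 'worker-a', 'task-2': 'worker-a'}
```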

Overview of Kafka



  • Apache Kafka is a distributed commit log service
  • Functions much like a publish/subscribe messaging system
  • Better throughput than most traditional messaging systems
  • Built-in partitioning, replication, and fault tolerance.
  • Increasingly popular for log collection and stream processing.
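
Partitioning in a commit log can be sketched as routing each message by a hash of its key, so all messages with the same key land in the same partition and keep their relative order. This is a simplified model: `PartitionedLog` is hypothetical, and `zlib.crc32` is swapped in for the murmur2 hash that Kafka's Java client uses by default for keyed messages.

```python
import zlib
from typing import List, Tuple


class PartitionedLog:
    """Append-only partitions; each message is routed by a hash of its key."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stand-in hash (CRC32), not Kafka's murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        # Returns (partition, offset); offsets are numbered per partition.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


log = PartitionedLog(num_partitions=3)
for key, value in [("user-1", "login"), ("user-2", "login"),
                   ("user-1", "click"), ("user-1", "logout")]:
    print(key, value, "->", log.append(key, value))
```

Because the exact partition numbers depend on the hash, the output is not shown; what matters is that every `user-1` event lands in the same partition in arrival order.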

Wednesday, February 21, 2018

Why Stream Storage?


Need for stream storage

  • Decouple producers & consumers
  • Persistent buffer
  • Collect multiple streams
  • Preserve client ordering
  • Parallel consumption
  • Streaming MapReduce
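
Several of these points (preserving per-client order, parallel consumption, streaming MapReduce) can be shown together in a small sketch. Assume a page-view stream already split into three partitions by user, so each user's events stay in order within one partition; the partition contents and `consume_partition` are hypothetical. Each partition is consumed in parallel by its own worker, which maps records to page names and reduces them to counts, and the partial counts are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical page-view stream, partitioned by user ("user:page" records).
partitions: List[List[str]] = [
    ["u1:home", "u1:cart", "u1:home"],
    ["u2:search", "u2:home"],
    ["u3:checkout"],
]


def consume_partition(records: List[str]) -> Counter:
    # Map: extract the page from each record. Reduce: count pages in this partition.
    return Counter(record.split(":", 1)[1] for record in records)


# Parallel consumption: one worker per partition.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(consume_partition, partitions))

# Final reduce: merge the per-partition counts.
totals = sum(partials, Counter())
print(totals.most_common(1))  # [('home', 3)]
```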



Message and Stream Storage



Amazon SQS

  • Amazon Simple Queue Service (SQS) is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications.
  • Building applications from individual components that each perform a discrete function improves scalability and reliability, and is a best-practice design for modern applications.
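
Decoupling through a queue changes what "consumed" means compared with stream storage: in SQS a received message is hidden for a visibility timeout and must be deleted once processed, otherwise it becomes visible again for another consumer. The sketch below models that behavior in memory; `ToyQueue` and its methods are hypothetical and use an explicit clock, while the real service is reached through SDK calls such as boto3's `send_message`, `receive_message` and `delete_message`. Unlike this toy, a standard SQS queue does not guarantee ordering.

```python
import itertools
from typing import Dict, Optional, Tuple


class ToyQueue:
    """In-memory queue with an SQS-style visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count()
        # message id -> (body, time at which it becomes visible again)
        self._messages: Dict[int, Tuple[str, float]] = {}

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, 0.0)
        return msg_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                # Hide the message from other consumers for the timeout period.
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        # Processing finished: remove the message for good.
        self._messages.pop(msg_id, None)


q = ToyQueue(visibility_timeout=30.0)
q.send("resize image 1")
q.send("resize image 2")
a = q.receive(now=0.0)      # consumer A gets (0, 'resize image 1')
b = q.receive(now=1.0)      # consumer B gets (1, 'resize image 2')
q.delete(b[0])              # B finished its work and deletes the message
print(q.receive(now=5.0))   # None: message 0 is still hidden
print(q.receive(now=31.0))  # (0, 'resize image 1'): A never deleted it
```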

Types of data store


After collecting data, we need to store it in a data store. There are different types of data stores.


  • In-memory: caches, data structure servers
  • Database: SQL and NoSQL databases
  • Search: search engines
  • File store: file systems
  • Queue: message queues
  • Stream storage: pub/sub message queues