Tuesday, February 27, 2018

Message Consumption in Apache Kafka


  • The message offset is a critical concept to understand because it is how consumers can read messages at their own pace and process them independently.
  • It is a placeholder, like a bookmark that maintains the last read position.
  • In the case of a Kafka topic, it is the last read message.
  • The offset is entirely established and maintained by the consumer, since the consumer is entirely responsible for reading the messages and processing them on its own.
  • It lets the consumer keep track of what it has read and has not read.
  • Offset refers to a message identifier.
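To make the bookmark idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka client API: the `ToyConsumer` class and its `read_next` method are hypothetical names, and the offset is assumed to mean "the next position to read", so a fresh consumer starts at 0.

```python
# Toy in-memory model; hypothetical names, not the Kafka client API.
topic = ["m0", "m1", "m2", "m3", "m4"]   # a topic as an ordered log; list index = offset


class ToyConsumer:
    """A consumer that owns its own offset (its bookmark into the log)."""

    def __init__(self, log):
        self.log = log
        self.offset = 0          # assumption: offset = next position to read

    def read_next(self):
        """Return the message at the current offset and advance, or None if caught up."""
        if self.offset >= len(self.log):
            return None
        message = self.log[self.offset]
        self.offset += 1         # only the consumer moves its bookmark
        return message


consumer = ToyConsumer(topic)
first_two = [consumer.read_next(), consumer.read_next()]
print(first_two, consumer.offset)   # ['m0', 'm1'] 2
```

The log never records who has read what; the only state that changes is the consumer's own `offset`.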


  1. When a consumer wishes to read from a topic, it must establish a connection with a broker.
  2. After establishing the connection, the consumer decides which messages it wants to consume.
  3. If the consumer has not previously read from the topic, or it has to start over, it issues a request to read from the beginning of the topic (the consumer establishes that its message offset for the topic is 0).
     Offsets: 0 1 2 3 4
  4. As it reads through the sequence of messages, it will eventually come to the last message in the topic and move its offset accordingly.
  5. If another consumer is interested in the messages from the topic, it could have already read the messages from the beginning and simply be waiting for more messages to arrive so it can read and process them.
     Note: Each consumer knows where it left off and can choose to advance from that position, stay put, or go back and reread a previously read message, all without the producer, brokers or other consumers needing to know or care.
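The steps above can be sketched with two consumers reading the same log independently. Again this is only a toy model under stated assumptions: `ToyConsumer`, `read_available` and `seek` are hypothetical names chosen for illustration, and a Python list stands in for the topic.

```python
# Toy in-memory model of the steps above; hypothetical names, not the Kafka client API.

class ToyConsumer:
    """Each consumer keeps its own offset; the log itself stores no consumer state."""

    def __init__(self, log):
        self.log = log
        self.offset = 0                      # step 3: start from the beginning

    def read_available(self):
        """Read every message from the current offset to the end of the log."""
        messages = self.log[self.offset:]
        self.offset = len(self.log)          # step 4: move the offset accordingly
        return messages

    def seek(self, offset):
        """Move to any position; no other consumer is affected."""
        self.offset = offset


log = ["m0", "m1", "m2", "m3", "m4"]         # messages at offsets 0..4

a = ToyConsumer(log)
b = ToyConsumer(log)

b.read_available()                           # step 5: b has already read everything
first_pass = a.read_available()              # a reads from offset 0 to the end

a.seek(2)                                    # a chooses to go back and reread
reread = a.read_available()

print(first_pass)   # ['m0', 'm1', 'm2', 'm3', 'm4']
print(reread)       # ['m2', 'm3', 'm4']
print(b.offset)     # 5, unchanged by anything a did
```

Rewinding `a` changes nothing for `b` or for the log, which is the point of the note in step 5.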


  •   When new messages arrive, a connected consumer picks them up the next time it polls the broker (Kafka consumers pull messages rather than having them pushed); it retrieves the new message and advances its position by one.


  •  When the last message in the topic is read and processed, the consumer can set its offset, and at that point it is caught up.
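One way to picture "caught up" is as the gap between the end of the log and the consumer's offset. The sketch below is a toy model, assuming the offset means the next position to read; `lag` and `poll` are hypothetical helper functions, not calls from a Kafka library.

```python
# Toy model; `lag` is simply the log length minus the consumer's offset.

log = ["m0", "m1", "m2"]
offset = 3                      # the consumer has read offsets 0..2


def lag(log, offset):
    """Number of messages the consumer has not read yet."""
    return len(log) - offset


def poll(log, offset):
    """Return (new_messages, new_offset) for everything past the current offset."""
    return log[offset:], len(log)


lag_before = lag(log, offset)           # 0: caught up
log.append("m3")                        # a producer appends a new message
lag_after_append = lag(log, offset)     # 1: one unread message
new_messages, offset = poll(log, offset)
lag_after_poll = lag(log, offset)       # 0: caught up again

print(lag_before, lag_after_append, new_messages, lag_after_poll)
```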

  •  The time Kafka can retain messages is configurable and is known as the message retention policy.
  • All messages are retained by the Kafka cluster regardless of whether any consumer has consumed them.
  • The length of time for which messages are retained is configurable in hours. The default retention period is 168 hours, or 7 days. Beyond that, the oldest messages start to fall off.

Note: The retention period is set on a per-topic basis, which means that within a cluster we could have hundreds of retention policies.
The ability to retain messages is limited by the available storage.
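As a rough illustration of time-based retention, the sketch below drops messages older than a retention window, independent of whether anyone has consumed them. It is a simplified toy: `purge_expired` is a hypothetical function, the timestamps are made up for the example, and a real Kafka broker applies retention to log segments rather than to individual messages.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION_HOURS = 168   # 7 days, the default mentioned above


def purge_expired(messages, now, retention_hours=DEFAULT_RETENTION_HOURS):
    """Keep only (timestamp, payload) pairs that are inside the retention window.

    Consumers are not consulted: retention depends on age only.
    """
    cutoff = now - timedelta(hours=retention_hours)
    return [(ts, payload) for ts, payload in messages if ts >= cutoff]


now = datetime(2018, 2, 27, 12, 0)          # illustrative "current time"
log = [
    (now - timedelta(days=8), "old"),
    (now - timedelta(days=6), "recent"),
    (now - timedelta(hours=1), "new"),
]

kept_default = [p for _, p in purge_expired(log, now)]
kept_one_day = [p for _, p in purge_expired(log, now, retention_hours=24)]

print(kept_default)   # ['recent', 'new']
print(kept_one_day)   # ['new']
```

Passing a different `retention_hours` per call mirrors the idea that each topic can carry its own retention policy.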

1 comment:

  1. https://kafka.apache.org/documentation.html#kafka_mq - Check the last para.

    I believe you are missing 3 more partitions in your picture. In your case only one consumer will be used.