Sunday, April 22, 2018

Kafka partitions

  • Each topic has one or more partitions
  • The number of partitions in a topic depends on how Apache Kafka is intended to be used; it is configurable.
  • Partitions are the basis on which Kafka can:
    • Scale
    • Become fault-tolerant
    • Achieve a higher level of throughput
  • Each partition is maintained on at least one broker
Note: Each partition must fit entirely on one machine. If a large and growing topic has only one partition, we are limited by that single broker's ability to capture and retain the messages published to the topic, and we would also run into I/O constraints.
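
To make the scaling point concrete, here is a minimal in-memory sketch (not Kafka code). It assumes a topic can be modelled as a list of partitions, each hosted on one hypothetical broker, and that messages are spread round-robin across partitions. The names `Partition`, `publish_round_robin` and the broker IDs are illustrative only. With one partition every message lands on a single broker; with three partitions on three brokers the same load is shared.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One partition: an append-only log hosted on a single (hypothetical) broker."""
    broker_id: int
    log: list = field(default_factory=list)


def publish_round_robin(partitions, messages):
    """Spread messages over partitions in turn; return how many each broker stored."""
    for i, msg in enumerate(messages):
        partitions[i % len(partitions)].log.append(msg)
    per_broker = Counter()
    for p in partitions:
        per_broker[p.broker_id] += len(p.log)
    return per_broker


messages = [f"msg-{n}" for n in range(9)]

# One partition: a single broker must capture and retain everything.
single = publish_round_robin([Partition(broker_id=1)], messages)

# Three partitions on three brokers: the same load is split.
spread = publish_round_robin([Partition(broker_id=b) for b in (1, 2, 3)], messages)

print(dict(single))  # {1: 9}
print(dict(spread))  # {1: 3, 2: 3, 3: 3}
```

Round-robin is just the simplest way to show the effect; the point is only that more partitions on more brokers means no single machine has to absorb the whole topic.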



Partition management in Kafka
  • When the command to create a topic with 3 partitions is issued, it is handled by ZooKeeper (which maintains metadata about the cluster).
  • ZooKeeper looks at the available brokers and decides which broker will be the leader responsible for managing each partition of the topic.
  • Once that assignment is made, each assigned broker creates a log for its newly assigned partition.
  • As partition assignments are broadcast, each broker maintains a subset of the metadata that ZooKeeper holds, in particular the mapping of which partitions are managed by which brokers. This lets any broker direct a producer client to the appropriate broker when producing messages to a specific partition.
  • Each broker reports its status to ZooKeeper.
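
The assignment steps above can be sketched as a simplified model. It assumes leaders are assigned round-robin over the available brokers (the real cluster's placement logic is more involved, for example it also places replicas), and `Broker`, `assign_leaders` and the broker IDs are hypothetical. Each leader creates a log for the partitions it manages, and every broker keeps a copy of the partition-to-leader map so it can point clients to the right place.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    broker_id: int
    logs: dict = field(default_factory=dict)      # partition id -> list of messages
    metadata: dict = field(default_factory=dict)  # partition id -> leader broker id


def assign_leaders(num_partitions, brokers):
    """Hypothetical round-robin leadership assignment (a simplification)."""
    assignment = {p: brokers[p % len(brokers)].broker_id for p in range(num_partitions)}
    for broker in brokers:
        # Each leader creates an empty log for every partition it now manages.
        for partition, leader in assignment.items():
            if leader == broker.broker_id:
                broker.logs[partition] = []
        # The assignment is broadcast, so every broker keeps a copy of the mapping.
        broker.metadata = dict(assignment)
    return assignment


brokers = [Broker(101), Broker(102), Broker(103)]
assignment = assign_leaders(3, brokers)
print(assignment)                          # {0: 101, 1: 102, 2: 103}
print(sorted(brokers[1].logs))             # [1]
print(brokers[2].metadata == assignment)   # True
```

Status reporting back to ZooKeeper is left out to keep the sketch focused on who leads which partition.
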
Producer Messages
1) When a producer is ready to send messages to a topic, it must know of at least one broker in the cluster so that it can find the leaders of the topic's partitions.
2) Each broker knows which partitions are owned by which leader. 
3) The metadata related to the topic is sent back to the producer so it can send messages to the individual brokers participating in managing the topic.
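
The producer's side of this exchange can be sketched the same way. The sketch assumes the producer is configured with just one bootstrap broker, asks it for the topic's metadata, and then sends each message directly to the leader of the target partition. `ClusterStub`, `fetch_metadata` and `send` are hypothetical in-memory stand-ins for the network calls a real Kafka client makes.

```python
class ClusterStub:
    """Hypothetical in-memory cluster: every broker holds the same metadata."""

    def __init__(self, leaders):
        self.leaders = leaders  # partition id -> leader broker id
        self.received = {b: [] for b in set(leaders.values())}  # broker id -> messages

    def fetch_metadata(self, bootstrap_broker):
        # Any broker can answer, because each keeps a copy of the partition map.
        if bootstrap_broker not in self.received:
            raise ValueError(f"unknown broker {bootstrap_broker}")
        return dict(self.leaders)

    def send(self, broker_id, partition, message):
        self.received[broker_id].append((partition, message))


class Producer:
    def __init__(self, cluster, bootstrap_broker):
        self.cluster = cluster
        # Step 1: learn the partition leaders from the single known broker.
        self.leaders = cluster.fetch_metadata(bootstrap_broker)

    def send(self, partition, message):
        # Step 2: route the message straight to that partition's leader.
        self.cluster.send(self.leaders[partition], partition, message)


cluster = ClusterStub({0: 101, 1: 102, 2: 103})
producer = Producer(cluster, bootstrap_broker=101)
producer.send(2, "hello")
print(cluster.received[103])  # [(2, 'hello')]
```

Note that the producer only ever talked to broker 101 for metadata, yet the message went to broker 103, the leader of partition 2.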

Consumer Inquiries
1) When consuming messages from the cluster, the consumer asks ZooKeeper which brokers own which partitions, and receives additional metadata that affects its consumption behavior.
2) Once the consumer knows which brokers hold the partitions that make up the topic, it pulls messages from those brokers based on the message offset for each partition.
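
Offset-based pulling can be sketched as below, assuming each partition's log is a plain list in which a message's offset is simply its index. `Consumer`, `poll` and `max_records` are hypothetical names. The idea it shows is that the consumer tracks how far it has read in each partition and asks for messages from that offset onward.

```python
class Consumer:
    def __init__(self, partition_logs):
        self.logs = partition_logs                     # partition id -> list of messages
        self.offsets = {p: 0 for p in partition_logs}  # next offset to read, per partition

    def poll(self, partition, max_records=2):
        """Pull up to max_records messages starting at this partition's current offset."""
        start = self.offsets[partition]
        batch = self.logs[partition][start:start + max_records]
        self.offsets[partition] = start + len(batch)   # advance past what was read
        return batch


logs = {0: ["a0", "a1", "a2"], 1: ["b0"]}
consumer = Consumer(logs)
print(consumer.poll(0))   # ['a0', 'a1']
print(consumer.poll(0))   # ['a2']
print(consumer.poll(1))   # ['b0']
print(consumer.offsets)   # {0: 3, 1: 1}
```

Because offsets are kept per partition, reading partition 1 does not disturb the position already reached in partition 0.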

Partitioning tradeoffs
  • The more partitions, the greater the ZooKeeper overhead
    • With a large number of partitions, make sure ZooKeeper has adequate capacity
  • Message ordering can become complex
    • Use a single partition if global ordering is required
    • Otherwise, ordering must be handled by the consumer
  • The more partitions, the longer the leader fail-over time
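
The ordering trade-off can be illustrated with a small sketch. It assumes messages are routed by a hypothetical `key_to_partition` function (a plain modulo here, not Kafka's actual partitioner) and that the consumer reads partitions one after another. Order is preserved within each partition, but the combined stream only matches publish order when there is a single partition.

```python
def key_to_partition(key, num_partitions):
    """Hypothetical deterministic routing: the same key always maps to the same partition."""
    return key % num_partitions


def publish(events, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for key, seq in events:
        partitions[key_to_partition(key, num_partitions)].append((key, seq))
    return partitions


def consume_all(partitions):
    # Read each partition fully before moving on to the next one.
    return [event for partition in partitions for event in partition]


# (key, sequence number) in the order they were published
events = [(1, 0), (2, 1), (1, 2), (2, 3)]

print(consume_all(publish(events, 1)) == events)  # True: one partition keeps global order
print(consume_all(publish(events, 2)))            # [(2, 1), (2, 3), (1, 0), (1, 2)]
```

With two partitions the sequence numbers still increase for each key, but not across the whole stream; restoring global order at that point is up to the consumer, for example by sorting on a sequence number like `seq`.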





