Tuesday, February 27, 2018

Overview of Kafka

  • Apache Kafka is a distributed commit log service.
  • It functions much like a publish/subscribe messaging system.
  • It is designed for higher throughput than traditional messaging systems.
  • It has built-in partitioning, replication, and fault tolerance.
  • It is increasingly popular for log collection and stream processing.

Note: It is often (but not exclusively) used in tandem with Apache Hadoop, Apache Storm, and Spark Streaming.
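
To make the "commit log" idea from the overview concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's API: the class name CommitLog and its methods are made up for illustration. It only shows the core property that records are appended, never modified, and are read back by offset.

```python
class CommitLog:
    """Toy append-only log (hypothetical, not a Kafka class)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = CommitLog()
first = log.append("user-42 logged in")
second = log.append("user-42 viewed /home")
```

Because existing records never change, any number of readers can re-read the log from any offset without interfering with each other.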



Producer

  • Creates data and sends it to a specific location, where an interested and authorized subscriber can retrieve and process it.
  • Needs to know the topic name and must have permission to send to that topic.


Consumer

  • Retrieves messages and processes them.
  • Retrieves messages based on the topic it is interested in.

Producers and consumers are simply applications that you write against the producing and consuming APIs.
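
The producer and consumer roles described above can be sketched with a small in-memory model. This is not the real Kafka client API; ToyBroker, ToyConsumer, and the topic names are hypothetical and exist only to show that a producer sends to a named topic, while a consumer reads from the topic it cares about and remembers how far it has read.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical stand-in for a broker: one message list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def send(self, topic, message):
        # Producer side: append the message to the named topic.
        self._topics[topic].append(message)

    def fetch(self, topic, offset):
        # Consumer side: return every message at or after the offset.
        return self._topics[topic][offset:]


class ToyConsumer:
    """Hypothetical consumer that tracks its own read position."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        messages = self.broker.fetch(self.topic, self.offset)
        self.offset += len(messages)
        return messages


broker = ToyBroker()
broker.send("page-views", "/home")
broker.send("page-views", "/pricing")
broker.send("signups", "user-42")

consumer = ToyConsumer(broker, "page-views")
first_batch = consumer.poll()   # messages from "page-views" only
second_batch = consumer.poll()  # nothing new since the last poll
```

The producer never needs to know who consumes the data; both sides only agree on the topic name.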


Topic

  • Producers send messages to a specific location referred to as a topic.
  • A topic is a collection, or grouping, of messages.
  • A topic has a specific name that can be defined upfront or on demand, as long as producers know it.
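
The difference between defining a topic upfront and creating it on demand can be illustrated with a toy registry. The class TopicRegistry and its auto_create flag are assumptions made for this sketch; in a real cluster, on-demand creation is governed by the broker setting auto.create.topics.enable.

```python
class TopicRegistry:
    """Hypothetical topic store supporting upfront or on-demand creation."""

    def __init__(self, auto_create=False):
        self.auto_create = auto_create
        self._topics = {}

    def create(self, name):
        # Upfront creation: register the topic before anything is sent.
        self._topics.setdefault(name, [])

    def append(self, name, message):
        if name not in self._topics:
            if not self.auto_create:
                raise KeyError(f"unknown topic: {name}")
            # On-demand creation: the first send brings the topic into being.
            self.create(name)
        self._topics[name].append(message)

    def names(self):
        return sorted(self._topics)


strict = TopicRegistry()
strict.create("orders")
strict.append("orders", "order-1")

lenient = TopicRegistry(auto_create=True)
lenient.append("clicks", "click-1")
```

With the strict registry, sending to a topic nobody created raises an error, which catches typos in topic names early. The lenient one trades that safety for convenience.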


Broker

  • Messages and topics need to be kept in physical containers of data.
  • The place where Kafka keeps and maintains topics is called a broker.
  • A broker is an executable or daemon service that runs on a physical or virtual machine.
  • How brokers handle messages in their topics is what gives Kafka its high throughput.
  • The number of brokers can be scaled out as needed to reach the required throughput, without affecting existing producer and consumer applications.
  • The worker nodes of a Kafka cluster are the brokers.
  • Example: LinkedIn has 1,400 brokers handling 2 petabytes per week.
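
To put the LinkedIn figures in perspective, the following back-of-the-envelope calculation works out the average load per broker. It assumes decimal units (1 petabyte = 10^15 bytes) and a perfectly even spread of data across brokers, neither of which is stated in the original figures, so treat the result as a rough average rather than a measured number.

```python
PETABYTE = 10**15                      # assumption: decimal petabytes
SECONDS_PER_WEEK = 7 * 24 * 60 * 60    # 604,800 seconds

brokers = 1400
weekly_bytes = 2 * PETABYTE

# Assumption: data is spread evenly across all brokers.
per_broker_weekly = weekly_bytes / brokers
per_broker_per_second = per_broker_weekly / SECONDS_PER_WEEK

print(f"{per_broker_weekly / 10**12:.2f} TB per broker per week")          # 1.43
print(f"{per_broker_per_second / 10**6:.2f} MB/s per broker on average")   # 2.36
```

The per-broker average is modest. The large total comes from scaling out the number of brokers, not from any single machine doing an extreme amount of work.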
