Web Snippets: Overview of Kafka

Tuesday, February 27, 2018

Overview of Kafka

Apache Kafka is a distributed commit log service.
It functions much like a publish/subscribe messaging system, but with better throughput.
It offers built-in partitioning, replication, and fault tolerance.
It is increasingly popular for log collection and stream processing.

Note: It is often (but not exclusively) used in tandem with Apache Hadoop, Apache Storm, and Spark Streaming.

OBJECTS IN KAFKA

PUBLISHERS/PRODUCERS

A producer creates some data and sends it to a specific location, where an interested and authorized subscriber can retrieve the message and process it.
Producers need to know the topic name and must have permission to send to that topic.

SUBSCRIBERS/CONSUMERS

A consumer retrieves messages and processes them.
It retrieves messages based on the topics it is interested in.

NOTE:
Producers and consumers are simply applications that you write against Kafka's producing and consuming APIs.
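
To make the producer and consumer roles concrete, here is a minimal in-memory sketch. It does not use Kafka or any Kafka client library; the class names (ToyBroker, ToyProducer, ToyConsumer) and the topic names are made up for illustration only.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: topic name -> list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, message):
        self.topics[topic].append(message)

    def read(self, topic, start):
        # Return every message at or after position `start`.
        return self.topics[topic][start:]


class ToyProducer:
    """Hypothetical producer: it only needs the broker and a topic name."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        self.broker.append(topic, message)


class ToyConsumer:
    """Hypothetical consumer: reads one topic and remembers its position."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.position = 0  # index of the next unread message

    def poll(self):
        messages = self.broker.read(self.topic, self.position)
        self.position += len(messages)
        return messages


broker = ToyBroker()
producer = ToyProducer(broker)
consumer = ToyConsumer(broker, "page-views")

producer.send("page-views", "user=1 path=/home")
producer.send("page-views", "user=2 path=/cart")
producer.send("signups", "user=3")  # a topic this consumer ignores

first_batch = consumer.poll()   # the two page-views messages
second_batch = consumer.poll()  # nothing new since the last poll
print(first_batch)
print(second_batch)
```

The producer and the consumer never talk to each other directly; the only thing they share is the topic name.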

TOPICS

Producers send messages to a specific location referred to as a topic.
A topic is a collection or grouping of messages.
Each topic has a specific name that can be defined upfront or on demand, as long as producers know the topic name.
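
The difference between defining a topic upfront and creating it on demand can be sketched with a small in-memory registry. This is a toy model, not Kafka's implementation; TopicRegistry and its auto_create flag are hypothetical names chosen for this example.

```python
class TopicRegistry:
    """Toy registry of named topics (hypothetical, not a Kafka API)."""

    def __init__(self, auto_create):
        self.auto_create = auto_create
        self.topics = {}  # topic name -> list of messages

    def create(self, name):
        # Define a topic upfront; creating it twice is harmless.
        self.topics.setdefault(name, [])

    def publish(self, name, message):
        if name not in self.topics:
            if not self.auto_create:
                raise KeyError(f"unknown topic: {name}")
            self.create(name)  # on-demand creation
        self.topics[name].append(message)


# Upfront: the topic must exist before anyone publishes to it.
strict = TopicRegistry(auto_create=False)
strict.create("orders")
strict.publish("orders", "order-1")

try:
    strict.publish("payments", "payment-1")
    rejected = False
except KeyError:
    rejected = True

# On demand: the first publish creates the topic.
lenient = TopicRegistry(auto_create=True)
lenient.publish("payments", "payment-1")

print(rejected)
print(sorted(lenient.topics))
```

Either way, the producer has to know the name of the topic it is sending to.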

BROKER

Messages and topics need to be kept in a physical container of data.
The place where Kafka keeps and maintains topics is called a broker.
A broker is an executable or daemon service that runs on a machine, either physical or virtual.
The way brokers handle messages in their topics gives Kafka its high-throughput capabilities.
The number of brokers can be scaled out as much as needed to reach the required throughput, all without affecting existing producer and consumer applications.

In a Kafka cluster, the worker nodes are the brokers.
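
The idea that brokers can be added without touching producer or consumer code can be sketched with a toy cluster. This is an illustration under simplifying assumptions: ToyCluster is a made-up class, and its rule of placing each new topic on the least-loaded broker is not how Kafka actually assigns data to brokers.

```python
class ToyCluster:
    """Toy cluster: each broker is a dict of topic name -> messages."""

    def __init__(self, broker_count):
        self.brokers = [{} for _ in range(broker_count)]
        self.placement = {}  # topic name -> index of the broker holding it

    def add_broker(self):
        # Scaling out: a new, empty broker joins the cluster.
        self.brokers.append({})

    def send(self, topic, message):
        if topic not in self.placement:
            # Assumed rule: put a new topic on the broker with the fewest topics.
            index = min(range(len(self.brokers)),
                        key=lambda i: len(self.brokers[i]))
            self.placement[topic] = index
            self.brokers[index][topic] = []
        self.brokers[self.placement[topic]][topic].append(message)

    def read(self, topic):
        return list(self.brokers[self.placement[topic]][topic])


cluster = ToyCluster(broker_count=2)
cluster.send("a", "a-1")
cluster.send("b", "b-1")

cluster.add_broker()        # grow the cluster from 2 to 3 brokers
cluster.send("c", "c-1")    # the new topic lands on the new broker
cluster.send("a", "a-2")    # the existing topic stays where it was

print(cluster.placement)
print(cluster.read("a"))
```

Callers only ever use send and read with a topic name, so growing the cluster does not change their code.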

Example: LinkedIn has 1,400 brokers handling 2 petabytes per week.
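
To put those figures in perspective, the weekly volume can be converted into an average rate. The calculation below is a rough sketch that assumes decimal petabytes (10^15 bytes) and a perfectly even load across brokers, neither of which the original figures state.

```python
PETABYTE = 10**15                      # assumption: decimal petabyte
SECONDS_PER_WEEK = 7 * 24 * 60 * 60    # 604,800 seconds

brokers = 1400
weekly_bytes = 2 * PETABYTE

# Average rate for the whole cluster, then per broker (assumes even load).
cluster_rate = weekly_bytes / SECONDS_PER_WEEK
per_broker_rate = cluster_rate / brokers

print(f"cluster: {cluster_rate / 1e9:.2f} GB/s")        # cluster: 3.31 GB/s
print(f"per broker: {per_broker_rate / 1e6:.2f} MB/s")  # per broker: 2.36 MB/s
```

These are averages over a full week; peak traffic would be higher.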
