
What is Kafka?

29 Jun 2019, CPOL, 4 min read
This article explains what Kafka is, its architecture, how to set it up, and how it differs from Akka.

Kafka

Kafka is a distributed publish-subscribe messaging system. Kafka is fast, scalable, and durable. It keeps feeds of messages in topics. Producers write data to topics and consumers read from topics.
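To make the producer/topic/consumer vocabulary concrete, here is a toy in-memory model in Python. It is not Kafka's API: the `Topic` class and its `produce`/`consume` methods are hypothetical names chosen for illustration, and a real topic is split into partitions stored on brokers. What the sketch does show is the core idea: a topic behaves like an append-only log, producers add to its end, and each consumer reads from an offset of its own choosing without removing anything.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy in-memory topic (hypothetical): an append-only list of messages."""
    name: str
    log: list = field(default_factory=list)

    def produce(self, message):
        """Append a message and return the offset it was written at."""
        self.log.append(message)
        return len(self.log) - 1

    def consume(self, offset, max_messages=10):
        """Read up to max_messages starting at offset; the log is not modified."""
        return self.log[offset:offset + max_messages]


topic = Topic("Topic-IoT")
for reading in ["temp=21", "temp=22", "temp=23"]:
    topic.produce(reading)

# Two consumers can read from different offsets; reading deletes nothing.
print(topic.consume(0))  # ['temp=21', 'temp=22', 'temp=23']
print(topic.consume(2))  # ['temp=23']
```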

Kafka depends on Zookeeper, so Zookeeper has to be downloaded, its properties changed, and its environment set up first. Once Zookeeper is running, Kafka can be downloaded, and a developer can then create a broker, a cluster, and a topic with a few commands.

What Is a Messaging System?

One of the most challenging parts of data engineering is collecting large volumes of data from many different sources and transmitting it to distributed systems for processing and analysis. The data sources and the processing systems need to be decoupled through message queuing, so that if one part of the pipeline fails, the data is not lost and can be transmitted and analyzed once the system has recovered. Two messaging patterns, both reliable and asynchronous, serve this purpose: point-to-point and publish-subscribe.
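A minimal sketch of this decoupling, using Python's standard `queue.Queue` as a stand-in for a message broker. That substitution is an assumption for illustration only: `queue.Queue` lives in one process's memory, whereas a real broker runs as a separate service and persists messages. The point shown is that the producer keeps writing while the consumer is offline, and nothing is lost when the consumer comes back.

```python
import queue


def drain(q):
    """Return every message currently waiting in the queue, in order."""
    received = []
    while True:
        try:
            received.append(q.get_nowait())
        except queue.Empty:
            return received


# The queue sits between the data sources and the processing system,
# so the two sides do not have to be available at the same time.
buffer = queue.Queue()

# The producer keeps sending while the consumer is down (e.g. crashed).
for i in range(5):
    buffer.put(f"event-{i}")

# When the consumer recovers, it drains everything that was buffered.
received = drain(buffer)
print(received)  # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```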

Point to Point

In the point-to-point (one-to-one) model, a sender puts messages on a queue and one or more consumers listen to it. When a consumer receives a message from the queue, that message disappears from the queue, so no other consumer can get it.
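The point-to-point rule can be sketched in a few lines of Python. The `point_to_point` function is a hypothetical helper, and handing messages out round-robin is an assumption made to keep the output predictable; a real queue simply delivers each message to whichever consumer asks next. What matters is that a message is removed when it is delivered, so each one reaches exactly one consumer.

```python
from collections import deque


def point_to_point(messages, consumers):
    """Deliver each message to exactly one consumer, taking turns (round-robin).

    A message is removed from the queue when it is handed out, so no other
    consumer can receive it.
    """
    q = deque(messages)
    received = {name: [] for name in consumers}
    turn = 0
    while q:
        msg = q.popleft()  # removing the message makes it unavailable to others
        name = consumers[turn % len(consumers)]
        received[name].append(msg)
        turn += 1
    return received


result = point_to_point(["m1", "m2", "m3", "m4"], ["consumer-A", "consumer-B"])
print(result)  # {'consumer-A': ['m1', 'm3'], 'consumer-B': ['m2', 'm4']}
```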

Publish and Subscribe System

In the publish-subscribe model, by contrast, the publisher sends each message to all the consumers (subscribers) that are listening to it, so every subscriber gets the same message. Data is transmitted through a data pipeline that is responsible for consolidating data from the sources.
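For contrast with point-to-point delivery, here is a toy publish-subscribe hub in Python. The `Publisher` class and its methods are hypothetical and exist only to illustrate fan-out: publishing once delivers the message to every subscriber of the topic.

```python
from collections import defaultdict


class Publisher:
    """Toy publish-subscribe hub: every subscriber of a topic gets every message."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of subscriber inboxes

    def subscribe(self, topic):
        """Register a new subscriber and return its inbox."""
        inbox = []
        self.subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # Fan-out: the message is appended to every subscriber's inbox.
        for inbox in self.subscribers[topic]:
            inbox.append(message)


hub = Publisher()
sub1 = hub.subscribe("Topic-IoT")
sub2 = hub.subscribe("Topic-IoT")
hub.publish("Topic-IoT", "temp=21")
print(sub1, sub2)  # ['temp=21'] ['temp=21']
```

In Kafka itself, both patterns come from consumer groups: consumers in the same group split a topic's partitions between them (point-to-point style), while separate groups each receive every message (publish-subscribe style).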

What Is the Kafka Architecture?

Kafka is a distributed publish-subscribe system with high throughput that can handle large volumes of data. It supports real-time data streaming and has been benchmarked at about 2 million writes per second.

Image 1

The Kafka architecture consists of the following components:

Topics and Publisher

A publisher (producer) sends messages. Messages are categorized into topics; each topic has one or more partitions, and each message within a partition has its own offset. For example, if a topic is created with a replication factor of 2, Kafka keeps two identical replicas of each partition and places them on different brokers in the cluster.
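The replication example can be sketched as a small placement function. This is a simplified round-robin scheme assumed for illustration, not Kafka's actual assignment code (Kafka's own logic differs, for example by randomizing the starting broker). It does show the two rules from the text: each partition gets as many replicas as the replication factor, and the replicas of one partition land on different brokers. `assign_replicas` and the broker names are hypothetical.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Place each partition's replicas on distinct brokers, round-robin.

    Simplified scheme for illustration only; Kafka's real assignment differs.
    """
    if replication_factor > len(brokers):
        # Two replicas of the same partition on one broker would not protect anything.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # The first replica listed is treated as the leader in this sketch.
        assignment[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return assignment


print(assign_replicas(3, 2, ["broker-0", "broker-1", "broker-2"]))
# {0: ['broker-0', 'broker-1'], 1: ['broker-1', 'broker-2'], 2: ['broker-2', 'broker-0']}
```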

Clusters and Brokers

A Kafka cluster consists of brokers (servers or nodes). Each broker can run on a different machine and serves messages to subscribers. Replicas act as backups of each partition, and because Kafka also persists messages, the cluster is fault tolerant.
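Why replicas give fault tolerance can be shown with a small Python sketch: when one broker goes down, every partition that still has a replica on a live broker gets a new leader. This is a deliberate simplification. In Kafka, the controller elects the new leader from the in-sync replicas, whereas this hypothetical `surviving_leaders` function just takes the first surviving replica and ignores whether it is in sync. The replica placement below is also chosen only for illustration.

```python
def surviving_leaders(assignment, failed_broker):
    """Pick a new leader for each partition after one broker fails.

    assignment maps partition -> list of brokers holding a replica, with the
    current leader first. A partition stays available as long as at least one
    replica lives on a broker that is still up.
    """
    leaders = {}
    for partition, replicas in assignment.items():
        alive = [b for b in replicas if b != failed_broker]
        leaders[partition] = alive[0] if alive else None  # None: partition offline
    return leaders


# Replication factor 2 over three brokers (placement chosen for illustration).
assignment = {
    0: ["broker-0", "broker-1"],
    1: ["broker-1", "broker-2"],
    2: ["broker-2", "broker-0"],
}
print(surviving_leaders(assignment, "broker-1"))
# {0: 'broker-0', 1: 'broker-2', 2: 'broker-2'}
```

With a replication factor of 1, the same failure would leave any partition stored only on `broker-1` with no leader.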

Zookeeper

In the Kafka version used here, the brokers do not keep the cluster metadata themselves; Kafka depends on Zookeeper to track it (for example, which brokers are alive and which broker leads each partition). For this reason, Zookeeper must be started first. Zookeeper acts as the coordination layer between brokers and consumers, and it is necessary for fault tolerance. Kafka brokers share the load: when a topic has multiple partitions, each partition has a leader broker, and consumers periodically commit the offset they have reached to Zookeeper. If a consumer or broker fails, processing can continue from the last offset recorded in Zookeeper, so Zookeeper plays a vital role in recovering Kafka after a crash.
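The recovery idea, resuming from the last committed offset, can be sketched in Python. `OffsetStore` is a hypothetical stand-in for wherever offsets are committed (Zookeeper in the setup described here; newer Kafka consumers use an internal Kafka topic, `__consumer_offsets`, instead). The key is the combination of consumer group, topic, and partition, and a restarted consumer that shares only the store picks up where the previous one stopped.

```python
class OffsetStore:
    """Hypothetical stand-in for the place consumer offsets are committed.

    Keeps (group, topic, partition) -> next offset to read, in a dictionary.
    """

    def __init__(self):
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        # Start from the beginning if nothing has been committed yet.
        return self._offsets.get((group, topic, partition), 0)


def consume(log, store, group, topic, partition, max_messages):
    """Read from the last committed offset, then commit the new position."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()
print(consume(log, store, "group-1", "Topic-IoT", 0, 2))   # ['m0', 'm1']
# Simulate a crash and restart: the new consumer instance shares only the store.
print(consume(log, store, "group-1", "Topic-IoT", 0, 10))  # ['m2', 'm3', 'm4']
```

Committing right after reading, as this sketch does, means a crash before the batch is processed would skip it (at-most-once); committing after processing instead gives at-least-once delivery.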

Running Kafka

  1. Download Kafka from this link: https://kafka.apache.org/downloads
  2. Configure Kafka. In C:\kafka_2.11-0.10.2.0\config, set the following values (use forward slashes in Windows paths, because a backslash is an escape character in .properties files):
    1. server.properties:
       log.dirs=C:/kafka_2.11-0.10.2.0/kafka-logs
       listeners=PLAINTEXT://127.0.0.1:9092
    2. zookeeper.properties:
       dataDir=C:/kafka_2.11-0.10.2.0/zookeeper
    3. consumer.properties:
       zookeeper.connect=127.0.0.1:2181
  3. Download Zookeeper (zookeeper-3.4.13.tar.gz) from https://archive.apache.org/dist/zookeeper/zookeeper-3.4.13/
  4. Start Zookeeper:
    1. Open a command prompt.
    2. cd C:\zookeeper-3.4.13\bin
    3. zkServer
  5. Start Kafka:
    1. Open a command prompt.
    2. cd C:\kafka_2.11-0.10.2.0
    3. Run: .\bin\windows\kafka-server-start.bat .\config\server.properties
  6. Create a topic:
    1. Open a command prompt.
    2. cd C:\kafka_2.11-0.10.2.0\bin\windows
    3. Run: kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Topic-IoT
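  7. Delete a topic:
    1. Open a command prompt.
    2. cd C:\kafka_2.11-0.10.2.0\bin\windows
    3. Run: kafka-topics.bat --zookeeper localhost:2181 --delete --topic <TopicName>
       (In this Kafka version, the broker must have delete.topic.enable=true in server.properties; otherwise the topic is only marked for deletion.)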
  8. Start a console producer:
    1. Open a command prompt.
    2. cd C:\kafka_2.11-0.10.2.0\bin\windows
    3. Run: kafka-console-producer.bat --broker-list localhost:9092 --topic Topic-IoT
  9. Start a console consumer:
    1. Open a command prompt.
    2. cd C:\kafka_2.11-0.10.2.0\bin\windows
    3. Run: kafka-console-consumer.bat --zookeeper localhost:2181 --topic Topic-IoT
  10. Now whatever you type in the producer window appears in the consumer window.
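When one of these steps fails, a common cause is that Zookeeper or the broker is not actually listening yet. The following Python sketch is an optional helper, not part of the Kafka distribution: it only checks whether something accepts TCP connections on the ports used in the configuration (2181 for Zookeeper, 9092 for the Kafka listener). It does not confirm that the service on that port is really Zookeeper or Kafka.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


# Ports from the configuration: Zookeeper on 2181, Kafka listener on 9092.
for name, port in [("Zookeeper", 2181), ("Kafka broker", 9092)]:
    status = "up" if is_listening("127.0.0.1", port) else "not reachable"
    print(f"{name} on 127.0.0.1:{port}: {status}")
```

Run it after step 4 (expect Zookeeper to be up) and again after step 5 (expect both to be up).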

Image 2

One producer (publisher) and two consumers (subscribers)

Image 3

Kafka Management: Creating a Cluster and Topic

You can also use Kafka Manager (https://github.com/yahoo/kafka-manager) to configure clusters and topics visually. On Windows, you only need to add a hosts entry. Open C:\Windows\System32\drivers\etc\hosts and append this line:

127.0.0.1 kafkaserver

Image 4

Image 5

Image 6

Image 7

Image 8

Kafka vs Akka

Image 9

History

  • 20th June, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Doctoral candidate, Technische Universität Berlin
Iran (Islamic Republic of)
I have been working with different technologies and data for more than 10 years.
I like to tackle complex problems and then make them easy for everyone to use. That is the greatest joy.

Master's in ICT, Norway, 2013
Doctoral candidate in data science at Technische Universität Berlin (currently)
-------------------------------------------------------------
A diamond is nothing but a piece of coal that kept going until it finally became a diamond.

http://www.repocomp.com/
