Kafka Internals Flashcards

1
Q

What does Kafka use to keep track of all the brokers in the cluster?

A

Kafka uses ZooKeeper to keep track of all the brokers that are part of the cluster.

2
Q

If there are two ZooKeepers, how many clusters are there?

A

Two. There is one Kafka cluster per ZooKeeper.

3
Q

At which ZooKeeper path are brokers stored?

A

Brokers are registered at the /brokers/ids path in ZooKeeper, as a list of ephemeral nodes. This is the path where producers and consumers subscribe to receive notifications about brokers.
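A minimal sketch of reading this path, assuming the Apache ZooKeeper Java client and an ensemble at localhost:2181 (both illustrative):

```java
import org.apache.zookeeper.ZooKeeper;

import java.util.List;

public class ListBrokerIds {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble backing the Kafka cluster.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
        // Each child of /brokers/ids is an ephemeral node named after a broker id
        // (e.g. "0", "1", "2"); it disappears when that broker's session ends.
        List<String> brokerIds = zk.getChildren("/brokers/ids", false);
        System.out.println("Registered broker ids: " + brokerIds);
        zk.close();
    }
}
```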

4
Q

What is the process of choosing the controller of the cluster?

A

1) Whenever a broker comes up, it first registers itself with ZooKeeper at the /brokers/ids path to become part of the cluster.
2) Once that is done, every broker tries to register itself as the controller of the cluster. The first broker to register becomes the controller (a sketch of this race appears below).
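A simplified sketch of that race using the ZooKeeper client directly (illustrative, not Kafka's actual controller code): every broker attempts to create the same ephemeral node, and only one create can succeed.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElection {
    static boolean tryBecomeController(ZooKeeper zk, int brokerId) throws Exception {
        try {
            // Ephemeral: the node vanishes automatically if this broker's session ends.
            zk.create("/controller",
                      String.valueOf(brokerId).getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.EPHEMERAL);
            return true;  // we won the election
        } catch (KeeperException.NodeExistsException e) {
            return false; // another broker is already the controller
        }
    }
}
```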

5
Q

What happens if one broker with an id is registered and a duplicate broker comes up with the same id?

A

The other broker will try to register itself with ZooKeeper using the same id and will fail. So there can be only one broker with a particular id in the cluster.

6
Q

What happens if an existing broker in the cluster goes down?

A

If a broker goes down, the ephemeral node it created under /brokers/ids is removed as well.

7
Q

What happens if an existing broker with id 0 goes down or crashes and another broker comes up with the same id 0?

A

Because the old broker with that id is no longer running, the new broker will take that id and register itself with the cluster successfully.
All the partitions and topics that were held by the old broker 0 will be given to the new broker with id 0.

8
Q

What are the responsibilities of the controller of the cluster?

A

Apart from the usual broker responsibilities, the controller is responsible for electing partition leaders.

9
Q

Which ephemeral node path is used in ZooKeeper to store the controller?

A

The /controller node stores the controller of the cluster.

10
Q

How is a new controller chosen when the current controller goes down?

A

All the brokers watch the /controller path. As soon as the controller goes down, all the brokers are notified and they all try to become the controller themselves; as with the initial election, the first to register the /controller node wins.
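A sketch of that watch with the ZooKeeper client (illustrative; note that a ZooKeeper watch fires only once, so real code re-registers it after each notification):

```java
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ControllerWatch {
    static void watchController(ZooKeeper zk, Runnable tryBecomeController) throws Exception {
        // The watcher is notified when /controller changes; on deletion,
        // every watching broker races to create the node again, and one wins.
        zk.exists("/controller", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                tryBecomeController.run();
            }
        });
    }
}
```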

11
Q

Where do all the read and the write requests go for a topic partition?

A

The leader of the partition.

12
Q

What is ISR (In-Sync Replica) in Kafka?

A

All the replicas that have caught up with the leader are called in-sync replicas. Even a follower that is a single message behind is out of sync and not considered an in-sync replica.
So when the leader of a partition goes down, any in-sync replica can take its place, because it has all the latest messages.

13
Q

When can replicas go out of sync?

A

Suppose a follower crashes and then comes back up, or for some reason the replica's server is not performing well and cannot keep up with the leader; it will fall behind from time to time. When it falls behind, it is called an out-of-sync replica. Out-of-sync replicas cannot take the place of a partition leader because they do not have all the messages.

14
Q

What are the responsibilities of the leader?

A

Apart from serving reads and writes, the leader is responsible for knowing which of the follower replicas are in sync, i.e. up to date with the leader.
Followers try to stay up to date by replicating messages from the leader as they arrive, but they can fail to stay in sync.

15
Q

Why do all the read and write requests go through the leader?

A

To guarantee consistency.

16
Q

What is the replica.lag.time.max.ms configuration property?

A

The amount of time a follower can be inactive or behind before it is considered out of sync.

17
Q

What is a preferred leader?

A

1) Each partition has a preferred leader: the replica that was the leader when the partition was first created.
2) It is preferred because, when partitions are created, the leaders are balanced between brokers.
3) As a result, we expect that when the preferred replica is indeed the leader for every partition in the cluster, load is distributed evenly across all brokers.
4) With auto.leader.balance.enabled=true, if the current leader fails and the preferred leader replica is in sync, Kafka triggers a leader election to make the preferred leader the current leader.

18
Q

What does the auto.leader.balance.enabled configuration property do?

A

With auto.leader.balance.enabled=true, if the current leader fails and the preferred leader replica is in sync, Kafka triggers a leader election to make the preferred leader the current leader.

19
Q

Who elects the preferred leader?

A

Kafka Controller

20
Q

What is the request format of a Kafka produce or consume request?

A
Request Header
    Request Type
    Correlation Id
    Request Version
    Client ID
Data
21
Q

Who initiates connections in Kafka?

A

Clients always initiate connections and send requests; the broker processes the requests and responds to them.

22
Q

What is the correlation id in the Kafka request header?

A

It is a unique number that identifies the request and is also found in error logs, which helps during troubleshooting.

23
Q

What is the request version in the Kafka request header?

A

It is provided so that brokers can handle clients of different versions and respond accordingly.

24
Q

What is the use of the client id in the request header?

A

Used to identify the client that sent the request. This information is useful for troubleshooting.

25
Q

What is the request processing flow in Kafka?

A

Kafka uses the Acceptor-Connector pattern.
The network threads take requests from client connections and place them on a request queue. Queued requests are then picked up for processing by processor threads. For a fetch request, the processor thread reads the data from disk and places the response on the response queue, from which it is sent to the consumer. For a produce request, the acknowledgment is sent back through the processor threads in the same way.

26
Q

What validations are required for a Kafka produce request?

A

1) ACL permission: the client must have write permission for the topic.
2) Acks: must be one of 0, 1, or all.
3) If acks is all, there must be enough in-sync replicas.

27
Q

Does Kafka wait till the data is written to the disk?

A

No, Kafka does not wait until the data is persisted to disk. It relies on replication for message durability.

28
Q

Where are in-flight requests stored in Kafka? (In-flight meaning that with acks=all, requests are kept on hold until a response is received from the replicas.)

A

If acks is set to all, the request is stored in a buffer called “purgatory” until the leader observes that the follower replicas have replicated the message, at which point the response is sent to the client.

29
Q

What are fetch requests?

A

Fetch requests are sent from a consumer to a broker to request data for particular partitions of a topic.
Clients also specify a limit on how much data the broker may return for each partition.
This limit is mandatory; without it, the broker could send so much data that the consumer runs out of memory.
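In the modern Java consumer, these per-partition and per-response limits correspond to the max.partition.fetch.bytes and fetch.max.bytes settings; a sketch with illustrative values:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Properties;

public class BoundedFetchConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Cap on the data returned per partition in one fetch response (1 MB here).
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1048576");
        // Cap on the total size of one fetch response (50 MB here).
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, "52428800");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe and poll as usual...
        }
    }
}
```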

30
Q

Which messages can a consumer consume, in terms of in-sync replicas?

A

The consumer can only consume messages that have been replicated to all in-sync replicas. If the in-sync replicas have not yet received the latest messages, consumers get an empty response for those messages.

31
Q

Let's say there are 10 partitions, a replication factor of 3, and 6 brokers. How many replicas do we need to allocate? How many partition replicas will each broker get? How many leaders will there be?

A

10 * 3 = 30 replicas
30 / 6 = 5 partition replicas per broker
10 partitions need 10 leaders (one per partition)
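The same arithmetic as a tiny sketch:

```java
public class ReplicaMath {
    public static void main(String[] args) {
        int partitions = 10, replicationFactor = 3, brokers = 6;
        int totalReplicas = partitions * replicationFactor; // 10 * 3 = 30 replicas
        int replicasPerBroker = totalReplicas / brokers;    // 30 / 6 = 5 replicas per broker
        int leaders = partitions;                           // one leader per partition = 10
        System.out.printf("replicas=%d, per broker=%d, leaders=%d%n",
                          totalReplicas, replicasPerBroker, leaders);
    }
}
```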

32
Q

What is retention in Kafka?

A

Kafka is not a database or persistent storage; it is a durable publish-subscribe mechanism. It does not keep data forever, nor does it wait for all consumers to read a message before deleting it.
The Kafka administrator configures a retention period for each topic:
- the amount of time to store messages before deleting them, or
- the amount of data to store before older messages are purged.
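A sketch of setting both limits per topic via AdminClient, using the retention.ms and retention.bytes topic configs (topic name and values are illustrative):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateRetentionTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("events", 3, (short) 3)
                    .configs(Map.of(
                            "retention.ms",    "604800000",    // keep messages for 7 days...
                            "retention.bytes", "1073741824")); // ...or up to 1 GB per partition
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```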

33
Q

What is a segment in Kafka?

A

Finding the messages that need deletion in one large file, and then deleting a portion of that file, is both time consuming and error prone. Therefore each partition is split into segments.
By default, a segment holds either 1 GB of data or a week's worth of data, whichever is smaller, as the Kafka broker writes to a partition. When a segment limit is reached, the file is closed and a new one is started.

34
Q

In what file format is data stored on the broker?

A

Each segment is stored in a single data file.
The format of the data on disk is identical to the format of the messages we send. Using the same message format on disk is what allows Kafka to use the zero-copy optimization when sending messages to consumers.

35
Q

Segment file record format?

A

Offset | Magic | Compression Codec | Timestamp | Key Size | Key | Value Size | Value

36
Q

What are indexes in Kafka?

A

When a consumer requests a message, the broker first goes to the index with the topic, partition, and offset number. The index gives the segment in which the data is located, so the broker can go directly to that part of the segment and return the data.

37
Q

Can a segment be deleted partially?

A

No. Either the whole segment is deleted or none of it is.

38
Q

What is compaction?

A

Compaction retains only the most recent value for each key in a topic. Setting the retention policy to compact therefore only makes sense for topics whose events contain both keys and values.
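A sketch of creating a compacted topic via AdminClient (topic name and sizing are illustrative):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // A keyed changelog topic: compaction keeps the latest value per key.
            NewTopic topic = new NewTopic("account-balances", 3, (short) 3)
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```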

39
Q

What happens in compaction when retention is set to delete?

A

It deletes events older than the retention time and compacts the rest, storing only the most recent value for each key in the topic.

40
Q

What happens during compaction if the topic contains null keys?

A

The compaction fails.

41
Q

When does a segment become eligible for compaction?

A

Messages become eligible for compaction only once their segment is inactive. Kafka starts compacting when 50% of the topic contains dirty records, i.e. records whose key also appears later with a different value.

42
Q

What does the unclean.leader.election.enable configuration property do?

A

Suppose the leader of a partition goes down and the remaining replicas are out of sync. If this property is enabled, an out-of-sync replica can also be chosen as leader. Out-of-sync replicas that become leaders are called “unclean leaders”. The default value is false.

43
Q

What are “unclean leaders”?

A

Suppose the leader of a partition goes down and the replicas are out of sync. If the “unclean.leader.election.enable” property is set, an out-of-sync replica can be chosen as leader. That out-of-sync leader is called an unclean leader.

44
Q

What happens if the leader of a partition goes down, there is no in-sync replica, and the “unclean.leader.election.enable” property is false?

A

The partition goes offline until we bring the old leader back online. We will not be able to produce any records to that partition.

45
Q

What happens to offsets that were written to an unclean leader when the old leader comes back alive?

A

Suppose we wrote offsets 11-14 to the new (unclean) leader, and now the old leader comes back alive; Kafka gives it priority and tries to make it the leader again. If the old leader already had its own offsets 11-14 from before it went down, the records written to the new unclean leader are lost.
Some consumers may have consumed the new messages from the unclean leader, and some may have consumed the old messages at the same offsets from the old leader. So it is possible for two consumers to have two different sets of messages 11-14.

46
Q

What is the use case for unclean leader election?

A

TBA

47
Q

What is the problem with Producer retries?

A

The problem that arises is duplicates. Let's say we are working on a banking system and we send the message “add $10 to the account”. The message is written to Kafka and processed by a consumer, but we do not receive the ack due to a network issue. In that case we retry and send “add $10 to the account” again, causing the account to be credited with $20. We do not want that.

48
Q

What is the solution to duplicates in Producer retries?

A

To avoid duplicates we should make messages “idempotent”: even if the same message is processed twice, there is no negative effect on correctness.
For example, “add $10 to the account” is NOT idempotent, but “the account balance is $110” is idempotent.
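Separately from application-level idempotence, Kafka's producer can deduplicate its own retries at the protocol level; a minimal sketch (broker address and topic are illustrative):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // The broker deduplicates internal retries using a producer id plus
        // per-partition sequence numbers, so a retried send is not written twice.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("accounts", "acct-42", "balance=110"));
        }
    }
}
```

Note this only guards against duplicates from producer retries; it does not make the application's messages semantically idempotent.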

49
Q

How to validate Kafka configuration (broker, etc) without application logic?

A
Some utilities prepackaged with Kafka help test whether the configuration we have chosen meets our requirements.
There are VerifiableProducer and VerifiableConsumer classes in the org.apache.kafka.tools package.
50
Q

What checks and validations does VerifiableConsumer provide?

A

1) It performs checks complementary to those of the VerifiableProducer.
2) It consumes events and prints them out in the order consumed.
3) It also prints information related to commits and rebalances.

51
Q

What checks and validations does VerifiableProducer provide?

A

1) It produces a sequence of messages numbered from 1 up to a number you provide.
2) You can configure it by setting the right number of acks, retries, and the rate at which messages are produced.

52
Q

Which tests should be performed after configuring a Kafka cluster?

A

1) Leader election: what happens if we kill the leader? How long does it take for the producer and consumer to start working again?
2) Controller election: how long does it take the system to resume after a restart of the controller?
3) Rolling restart: can we restart the brokers one by one without losing any messages?
4) Unclean leader election: what happens when we kill all the replicas of a partition one by one (to make sure each goes out of sync) and then restart the broker that was out of sync? What needs to happen in order to resume operations? Is the behavior acceptable?

53
Q

How do we measure the latency of a Kafka cluster?

A

Latency = consumer receive timestamp - producer produce timestamp.
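A sketch of measuring this per record on the consumer side, assuming the topic uses CreateTime timestamps and the producer and consumer clocks are synchronized:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class LatencyMeter {
    // record.timestamp() is the producer-assigned CreateTime (by default),
    // so the difference from the current wall clock approximates latency.
    static long latencyMs(ConsumerRecord<?, ?> record) {
        return System.currentTimeMillis() - record.timestamp();
    }
}
```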

54
Q

What is throughput in Kafka?

A

How many events arrive within a specific amount of time.

55
Q

Which tuning parameters can be checked on the producer side?

A

1) GC parameters
2) Batch size
3) Sync or Async (linger.ms) property
4) Compression

56
Q

What does the linger.ms property control in the Kafka producer?

A

It sets the maximum time to buffer data in asynchronous mode. By default the producer does not wait; it sends the buffer whenever data is available. Increasing linger.ms trades higher latency for higher throughput.
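A sketch of the relevant producer settings, including the batching and compression knobs from the previous card (values are illustrative):

```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class ThroughputTunedProps {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Wait up to 20 ms so more records can be grouped into one batch...
        props.put(ProducerConfig.LINGER_MS_CONFIG, "20");
        // ...of up to 64 KB, compressed before it is sent.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        return props;
    }
}
```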

57
Q

Which tuning parameters can be checked on the consumer side?

A

1) GC parameters

2) Fetch size - the maximum message size a consumer can read. Must be at least as large as “message.max.bytes”.

58
Q

What does the fetch.message.max.bytes configuration property control?

A

The maximum message size a consumer can read. It must be at least as large as “message.max.bytes”.

59
Q

What is the relation between throughput, consumers, and consumer groups?

A

Adding a new consumer can increase the overall throughput. Adding a consumer group does not affect performance.

60
Q

Which tuning parameters can be checked on the broker side?

A

1) message.max.bytes - the maximum message size the broker will accept.
2) log.segment.bytes - the size of the Kafka data file. Must be larger than any single message.
3) replica.fetch.max.bytes - the maximum message size a broker can replicate. Must be larger than message.max.bytes, or a broker could accept messages it cannot replicate, resulting in potential data loss.
4) num.replica.fetchers - the number of threads that replicate data from the leader to the followers. If threads are available, more replica fetchers let replication complete in parallel (see the sketch after this list).
5) num.io.threads - the number of threads that put data on disk. The right setting depends directly on how many disks are in the cluster, since these threads are used by the server to execute requests. We should have at least as many I/O threads as we have disks.
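Broker settings normally live in server.properties, but some, such as num.replica.fetchers, can also be changed dynamically; a sketch via AdminClient, assuming broker id 0 and an illustrative value:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TuneBrokerFetchers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource broker0 = new ConfigResource(ConfigResource.Type.BROKER, "0");
            // Raise the number of replica fetcher threads on broker 0 to 4.
            AlterConfigOp setFetchers = new AlterConfigOp(
                    new ConfigEntry("num.replica.fetchers", "4"),
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> changes =
                    Map.of(broker0, List.of(setFetchers));
            admin.incrementalAlterConfigs(changes).all().get();
        }
    }
}
```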