Kafka Consumer Flashcards

1
Q

What is a consumer?

A

A consumer is any application which consumes data from Kafka topic by subscribing to it.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Can Kafka consumers subscribe to multiple topics?

A

Yes a single Kafka consumer can subscribe to multiple topics.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Why do we need Consumer groups?

A

Suppose producer is producing too much data and single consumer is not able to keep up with the demand to process all data, we spin up multiple consumers and keep them in same logical group called Consumer Group. Consumer group share the same offset id. So once a record has been read by one consumer, other cosumer in the same consumer group will not get that message. This allows sharing of load.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Do consumers in Consumer Group share offset?

A

Yes, consumers in same consumer group share offset id.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How many consumers can consume from a partition?

A

Only one consumer can consume from one partition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the relation between number of partitions and number of consumers?

A

The number of consumers can be either equal to or less than number of partitions. So only one consumer will be updating the offset metadata at a time. One consumer may be handling either one or more than one partitions at a time. But two consumers cannot handle single partition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

At which level is the metadata about the offset maintained? Consumer or Consumer Group level?

A

The metadata about the offset is maintained at the Consumer group level, so if same message is to be consumed by multiple consumers then they need to be part of different consumer groups.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Which property defines the consumer group that the consumer will belong to?

A

“group.id” property defines the group that the consumer will join.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What will happen if you only have few consumers consuming from multiple partitions?

A

There will be lag in processing of messages, as the consumers cannot keep up with the rate at which messages are being published.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Is it possible to give partition number to consumer when starting?

A

Yes it is possible to give consumer partition number with topic.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What will happen if you have a topic with one partition only and the consumer group has three consumers with same consumer group name?

A

In that case, only one consumer will be able to consume the data, other consumers will not get data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the best practice for number of consumers in consumer group supposing that the topic has n partitions?

A

It is a best practice to have n + 1 or n + 2 consumers as they will act as failovers when any consumer goes down.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is rebalancing in Kafka and why is it required?

A

Moving ownership of a parition from one consumer to another is called rebalance.
Whenever a new consumer joins the group or a consumer goes dead, new partition is added, then Kafka will try to balance the load among all the consumers. This process of redistributing the load is called rebalancing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

When can rebalancing occur?

A

1) A new consumer is added
2) Consumer crashes or is DEAD
3) Consumer is not sending heartbeat response due to doing some heavy processing, it is deemed as logically DEAD
4) A new partition is added

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What affect does existing consumers have when rebalance occurs?

A

Consumers loose their current state, ie. which partition is currently assigned to them and such. During rebalance, no messages will be processed from the partitions that owned by the dead consumer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How does heartbeat work in Kafka, who monitors the heartbeat?

A

Consumers connect to group co-ordinators and send heart beat on some regular intervals. If consumer stops sending heartbeats for long enough, its session will timeout and the group co-ordinator will consider it dead and trigger a rebalance.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is the relation between Group Co-ordinator and topic?

A

For every topic a Group co-ordinator is assigned.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is consumer leader?

A

The first consumer that joins the consumer group becomes the consumer leader for that group.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is a JoinGroup request?

A

When a consumer wants to join a group, it sends a JoinGroup request to the group co-ordinator.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What are the responsibilities of consumer leader?

A

1) The consumer leader receives list of all consumers in the consumer group from group co-ordinator.
2) Leader is responsible for assigning a subset of partitions to each consumer.
3) After deciding the partition assignment the leader will send all this information to group co-ordinator
4) Group co-ordinator sends this information to all the consumers.
This process is repeated every time rebalance is triggered.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is use of interface PartitionAssignor?

A

Consumer leader uses implementation of PartitionAssignor interface to decide which partitions should be handled by which consumer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Which consumer has all the information about parition assignment regarding consumer group?

A

The consumer leader is the only client process that has full list of consumers in the group and its assignments. All the other consumers in the group only have knowledge about their assignments.

23
Q

What happens when we call poll() method for the first time in consumer code?

A

First time we call poll with a new consumer, it is responsible for finding the GroupCoordinator, joining the consumer group and receiving the partition assignment.
If rebalance is triggered it will also be handled in poll method.

24
Q

Which things are handled in poll method?

A

1) Joining consumer group
2) Contacting Group coordinator
3) Getting the assigned paritions
4) Rebalancing

So it handles co-ordination, parition rebalances, data fetching.

25
Q

Which method is used to fetch data from Kafka in Consumer?

A

poll() method is used.

26
Q

What does poll() method return?

A

poll() method returns ConsumerRecords which contains multiple records. Then we can iterate through the ConsumerRecord using foreach loop as ConsumerRecords implements Iterable interface.

27
Q

What does ConsumerRecord contain?

A

It contains offset, partition, topic, key, value.

28
Q

What is the use fetch.min.bytes configuration property?

A

When you call poll() method, it specifies how many min bytes needs to be collected before the poll method returns.

29
Q

What is the use of fetch.max.wait.ms configuration property?

A

The maximum amount to time the server will block before answering the fetch request if there isn’t sufficient data to immediately satisfy the requirement given by fetch.min.bytes.

30
Q

What is max.partition.fetch.bytes configuration property?

A

It controls the maximum number of bytes the server will return per partition. The default is 1 MB.

31
Q

What is auto.offset.reset configuration property?

A

This property controls the behavior of the consumer when it starts reading a partition for whih it doesn’t have a committed offset or if the committed offset it has is invalid.
Two possible values are earliest, latest.

32
Q

What is enable.auto.commit configuration property?

A

It controls whether the consumer will commit offsets automatically. It defaults to true.

33
Q

What is session.timeout.ms configuration property?

A

The amount of time the consumer can be out of contact with brokers while still considered alive defaults to 3 seconds.

34
Q

What is max.poll.records configuration property?

A

The maximum number of records that a single call to poll() will return. This is useful to help control the amount of data your application will need to process in the polling loop.

35
Q

What is client.id configuration property?

A

This can be any string and will be used by the brokers to identify messages sent from the client. It is sued in logging and metrics and for quotas.

36
Q

How does Kafka know the state of the consumers, as to which consumer has read how many messages from which partition, basically the offset it?

A

Consumers need to update Kafka about the progress. That is know Kafka will come to know about the status. This action of updating the current position in the partition is called commit.

37
Q

How is Kafka different from JMS in regards to offset?

A

In JMS the messaging system keeps track of received messages. But in case of Kafka consumer keeps track of received messages and just updates Kafka regarding their status.

38
Q

In how many different ways can Consumer commit?

A

1) commitSync - Till the time commit is not done, your processing logic halts - Commit current offset
2) commitAsync - Commit happens in different thread
3) Automatic commit - if we enable enable.auto.commit then every 5 seconds the consumer will commit the largest offset your client received from poll()
4) Combining sync and async commits
5) Commit specified offset

39
Q

Which problems can occur if offset is not committed properly?

A

Either the consumer will lose some messages, or the consumer will reprocess some messages.

40
Q

What does auto.commit.interval.ms configuration property do?

A

It specifies the amount of time after which commit will be done. It is driven by poll method. Everytime consumer consumes using poll method, it checks to see if it is time to commit, if so it will commit the offsets it returned in the last poll.
close() methods commits offsets automatically.

41
Q

What are drawbacks of commitSync or commit current offset technique?

A

1) Application is blocked until the broker responds
2) This limits the throughput of the application.
3) Throughput can be improved by less frequent commits but it increases the chances of potential duplicates.

42
Q

How to get feedback about async commit whether it succeeded or not?

A

In consumer.commitAsync(OffsetCommitCallback), we can provide implementation of OffsetCommitCallback. It has onComplete(Map offsets, Exception ex) where you get exception non null if there was any issue.

43
Q

What solutions do we have when async offset commit fails?

A

We can either retry to commit the same offset or commit the latest offset.

44
Q

Which is the best practice to handle shutdown when using commitAsync?

A

The best practice is to combine async and sync. Before the consumer shuts down, the sync will commit the latest offset before exiting.

45
Q

Is it possible to commit specified offset?

A

Yes using commitAsync we can also provide Map into the commit method.

46
Q

What is the problem that can occur with commits when partition rebalance takes place due to a new consumer etc.

A

When partition rebalance takes place the consumers pause for a while before the partitions are distributed among them. So we need to commit offset before that. How do we know when rebalancing is about to take place. We can impement listener called ConsumerRebalanceListener.

47
Q

Which methods are there in ConsumerRebalanceListener?

A

onPartitionsRevoked() - Called when a new consumer is added and before rebalancing starts and current consumer has stopped consuming messages.

onPartitionAssigned() - Called after the partitions have been reassigned to the broker but before the consumer starts consuming messages.

48
Q

Sample code to provide rebalance listener to consumer?

A

consumer.subscribe(topicList, new RebalanceListener());

49
Q

By default poll method starts consuming messages from where start or last?

A

poll() method starts consuming messages from the last committed offset in each partitions and process all messages in sequence.

50
Q

In how many ways can we consumer records with specific offset?

A

1) Skip to beginning - It will start reading messages from the beggining of partition.
consumer. seekToBeginning(TopicPartition tp);
2) Skip to end - If we want to skip all the way to the end of the partition and consume only the new messages.
consumer. seekToEnd(TopicPartition tp);

51
Q

If we commit the offset after processing the message but still there is chance that the application will crash after the message was processed and stored in database (i.e. side effect is done) but we are not able to update Kafka. How to solve this issue?

A

1) If the problem is due to partition rebalance then we can implement ConsumerRebalanceListener and save the offset to db. and when partition is assigned back we can read the offsets from database.

52
Q

Which method is used to change consumer offset to a particular value from which it will start processing messages?

A

consumer.seek(topicPartition, offset) can be used to change the consumer offset to a particular value.

53
Q

How to safely break out of poll loop? Consumer is waiting for new messages but now you want to shutdown how to break from that safely?

A

We can do consumer.wakeup() in the ShutdownHook. ShutdownHook is run in another thread. So when we call consumer.wakeup() from shutdown hook, then we will get WakeupException from poll() method. This is signal that someone called wakeup on consumer.

54
Q

What happens if there is single partition in topic and multiple consumers try to connect to same topic and consume messages (Either as two separate processes or two different threads)

A

At a time only one consumer is able to read messages from partition. Once one consumer disconnects, other is allowed to read messages from partition. Ensuring ordered delivery of messages.