Kafka Flashcards

1
Q

What is Kafka?

A

Kafka is a pub/sub messaging system. It can also be used as an event streaming platform.

2
Q

What are the 6 main Kafka components?
Draw a diagram

A
  1. Message: key-value pair and additional metadata
  2. Producer: client that produces messages to a topic
  3. Topic: category for messages
  4. Partition: commit log
  5. Broker: server
  6. Consumer: client that consumes messages from a topic
3
Q

What are the 5 main components of a message?

A
  • Key: byte[]
  • Value: byte[]
  • Offset: long
  • Timestamp: long
  • Headers: optional key-value pairs (String key, byte[] value)
4
Q

Although Kafka does not require a data format for the content of its messages, why is it important to declare one?

A

Declaring a data format decouples the producer from the consumer. This can be done by defining and storing a schema in a shared repository; the producer and consumer can then exchange messages without direct coordination.

5
Q

What is the purpose of a message key?

A

The purpose of a message key is to provide a way to route messages to a specific partition: with the default partitioner, all messages that share a key land on the same partition.
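A minimal sketch of sending a keyed record with the Java producer client; the topic name, key, value, and broker address are all placeholders:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // With the default partitioner, every record with key "customer-42"
    // hashes to the same partition of the "customer-events" topic
    producer.send(new ProducerRecord<>("customer-events", "customer-42", "logged-in"));
}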

6
Q

By default, are messages sent to Kafka in batches or one at a time?

A

By default, messages are sent to Kafka in batches, which reduces network overhead.
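A sketch of the two producer properties that control batching; the values shown are illustrative, not recommendations:

// linger.ms defaults to 0; raising it lets the producer wait for more
// records so batches fill up before being sent
props.put("linger.ms", "10");
// batch.size (max bytes per partition batch) defaults to 16384
props.put("batch.size", "32768");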

7
Q

Describe the flow of a message starting with the producer and ending with the consumer

A

The producer serializes the message and then uses a partitioner to decide which partition the message will be sent to. Under default settings, the producer accumulates messages into batches before sending them to partitions in the Kafka cluster. The consumer continuously polls the partitions, each poll returning a batch of messages. These messages are deserialized and then processed by the consumer.
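A condensed sketch of both ends of that flow, assuming String keys and values, a placeholder topic name, and a hypothetical process() helper:

// Producer side: the record is serialized, assigned a partition, batched, and sent
producer.send(new ProducerRecord<>("orders", "order-1", "created"));
producer.flush(); // push any open batches out immediately

// Consumer side: poll() returns a deserialized batch of records
consumer.subscribe(Collections.singletonList("orders"));
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
    process(record); // hypothetical application-specific processing
}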

8
Q

Why are partitions important?

A

Partitions are important because they enable replication and parallel processing: partitions can be distributed and replicated across separate brokers.

9
Q

Do partitions guarantee order at the partition or topic level?

A

Partitions guarantee order at the partition level, not the topic level. If you need order at the topic level, use a single partition for that topic.

10
Q

What is an offset?

A

An offset is a number that identifies the position of a message within a partition. Offsets are assigned by Kafka, and consumers commit them after processing the messages returned by poll().

11
Q

Does a producer balance messages over all partitions of a topic evenly by default?

A

Not necessarily; it depends on partitioner.class, the property that determines which partition a record is sent to.

If it is not set, the default partitioning logic is used:
1) If no partition is specified but a key is present, choose a partition based on a hash of the key.
2) If no partition or key is present, use the "sticky" partition, which changes once at least batch.size bytes have been produced to the current partition.

org.apache.kafka.clients.producer.RoundRobinPartitioner sends each record in a series of consecutive records to a different partition, whether or not a key is provided, cycling through the partitions. Note: a known issue causes uneven distribution when a new batch is created; see KAFKA-9965 for more detail.

Implementing the org.apache.kafka.clients.producer.Partitioner interface allows you to plug in a custom partitioner.
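For example, opting into round-robin distribution instead of the default logic is a single setting (a sketch; props is the producer's Properties object):

props.put("partitioner.class",
    "org.apache.kafka.clients.producer.RoundRobinPartitioner");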

12
Q

Can a partition be consumed by more than one instance of a consumer group?

A

No. While a single consumer instance in a group can consume messages from multiple partitions, a partition is consumed by at most one instance of a given consumer group.

13
Q

What is the rule of thumb when deciding how many partitions to declare?

A

Declare as many partitions as there are brokers in your cluster. This evenly distributes the message load across brokers.

14
Q

What is the primary purpose of the Admin Client?

A

The primary purpose of the Admin Client is to configure and manage Kafka topics and brokers
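A minimal sketch of creating a topic with the Java AdminClient; the topic name, partition count, and replication factor are illustrative:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    // 3 partitions, replication factor 2 (illustrative values)
    NewTopic topic = new NewTopic("customer-events", 3, (short) 2);
    admin.createTopics(Collections.singletonList(topic)).all().get();
}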

15
Q

What is disk throughput?

A

Disk throughput is the average amount of data a storage device can read or write per unit of time

16
Q

The performance of producer clients is most directly influenced by …

A

The disk throughput of the broker being used. Most producer clients wait until at least one broker has acknowledged that messages have been committed before considering the write successful, so faster disk writes mean lower producer latency. SSDs are significantly faster than HDDs.

17
Q

The performance of consumer clients is most directly influenced by …

A

The amount of memory available for the broker being used. This is because Kafka often caches messages in memory so that it doesn’t have to read from disk to provide the messages the consumer needs

18
Q

When does the producer serialize the ProducerRecord<K, V> into byte arrays?

A

Before sending the record to Kafka

19
Q

What are the 3 mandatory configuration properties that need to be applied to a producer and consumer?

A
  1. bootstrap.servers
  2. key.serializer
  3. value.serializer

The last 2 are used to serialize the key and value to byte arrays and must name a class that implements org.apache.kafka.common.serialization.Serializer. (A consumer analogously requires key.deserializer and value.deserializer.)
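A sketch of a producer configured with just the three mandatory properties; the broker addresses are placeholders:

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);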

20
Q

Should you focus on handling retriable errors or nonretriable errors?

A

Since the producer handles retriable errors automatically, there is no point in handling them in your application. Focus on handling nonretriable errors instead.

21
Q

What is the default compression type for producers?

A

None. However, setting a compression type could reduce bandwidth usage.
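Enabling compression is a one-line producer setting (a sketch; snappy is just one of the supported codecs):

// supported values: none (default), gzip, snappy, lz4, zstd
props.put("compression.type", "snappy");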

22
Q

What are the 3 different producer acknowledgement properties?

A
  1. acks=0: the producer does not wait for acknowledgement from any broker
  2. acks=1: the producer waits for acknowledgement from the leader broker
  3. acks=all: the producer waits for acknowledgement from all in-sync replicas
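Set on the producer like any other property (a sketch):

// "all" gives the strongest durability; "1" and "0" trade durability for latency
props.put("acks", "all");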
23
Q

Why is it beneficial to use an Avro schema registry for Avro records as opposed to embedding the entire schema inside every record?

A

Avro requires a schema to serialize and deserialize data. A schema registry lets each record carry a small schema identifier instead of embedding the entire schema, which would roughly double the record's size. Producers use the registry to serialize records before sending them to Kafka, and consumers use the same registry to deserialize them.
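A sketch of a producer wired to a schema registry, assuming Confluent's KafkaAvroSerializer; the registry URL is a placeholder:

props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
// the serializer registers/fetches schemas here and embeds only the schema ID
props.put("schema.registry.url", "http://localhost:8081");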

24
Q

Can Avro serialize POJOs?

A

No. Avro can only serialize Avro objects, which are generated from a schema using Avro code generation

25
Q

What’s the easiest way to create Avro classes?

A

Avro Maven plug-in

26
Q

What happens if you use keys to send messages to specific partitions and the number of partitions is increased afterwards?

A

Mapping of keys to partitions is no longer guaranteed for new messages

27
Q

Review the following statement:

You can’t have multiple consumers that belong to the same group in one thread, and you can’t have multiple threads safely use the same consumer. One consumer per thread is the rule. To run multiple consumers in the same group in one application, you will need to run each in its own thread.

A
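A sketch of that rule in practice, assuming a hypothetical consumerProps() helper that builds the consumer configuration:

int workers = 3; // illustrative thread count
ExecutorService pool = Executors.newFixedThreadPool(workers);
for (int i = 0; i < workers; i++) {
    pool.submit(() -> {
        // each thread owns exactly one consumer; KafkaConsumer is not thread-safe
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps())) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(r -> System.out.println(r.value()));
            }
        }
    });
}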
28
Q

What is an offset commit and what is its default strategy?

A

An offset commit is the action of a consumer updating its current position in a partition. By default, the consumer commits offsets automatically (enable.auto.commit=true). On the broker side, offset commits are written with an acknowledgement setting of -1, meaning a commit is only considered successful once all members of the replica set have recorded it. Note that Spring Kafka provides its own support for offset commits and acknowledgements, and sets enable.auto.commit=false.
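The plain-Kafka defaults written out as explicit configuration (a sketch; these are the values the consumer assumes when unset):

props.put("enable.auto.commit", "true");      // Kafka's default
props.put("auto.commit.interval.ms", "5000"); // default commit interval: 5 seconds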

29
Q

By default, does a consumer joining a group cause a rebalance?

A

Yes.

30
Q

What is the difference between consumer.commitSync() and consumer.commitAsync()?

A

consumer.commitSync() will block until either the commit succeeds or an unrecoverable error is encountered (in which case it is thrown to the caller). In other words, it supports retries. consumer.commitAsync() will not block and any errors encountered are either passed to the callback (if provided) or discarded. consumer.commitAsync() does not support retries.
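A sketch of commitAsync() with a callback; log is assumed to be the application's logger, as in the code card below:

consumer.commitAsync((offsets, exception) -> {
    if (exception != null) {
        // deliberately no retry: a later commit may already have succeeded,
        // and retrying could move the committed offset backwards
        log.error("async commit failed for offsets {}", offsets, exception);
    }
});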

31
Q

Review the following code:

Duration timeout = Duration.ofMillis(100);

while (true) {
    // poll() returns the next batch of records, or an empty set once the timeout expires
    ConsumerRecords<String, String> records = consumer.poll(timeout);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("topic = %s, partition = %d, offset = %d, " +
                "customer = %s, country = %s%n",
            record.topic(), record.partition(),
            record.offset(), record.key(), record.value());
    }
    try {
        // blocks until the commit succeeds or an unrecoverable error is thrown
        consumer.commitSync();
    } catch (CommitFailedException e) {
        log.error("commit failed", e);
    }
}
A