Kafka Flashcards

1
Q

What is Kafka?

A

Kafka is a pub/sub messaging system. It can also be used as an event streaming platform.

2
Q

What are the 6 main Kafka components?
Draw a diagram

A
  1. Message: key-value pair and additional metadata
  2. Producer: client that produces messages to a topic
  3. Topic: category for messages
  4. Partition: commit log
  5. Broker: server
  6. Consumer: client that consumes messages from a topic
3
Q

What are the 5 main components of a message?

A
  • Key: byte[]
  • Value: byte[]
  • Offset: long
  • Timestamp: long
  • Headers: optional key-value pairs (String key, byte[] value)
4
Q

Although Kafka does not require a data format for the content of its messages, why is it important to declare one?

A

Declaring a data format decouples the producer from the consumer. This can be done by defining and storing a schema in a shared repository; the producer and consumer can then exchange messages without direct coordination.

5
Q

What is the purpose of a message key?

A

The purpose of a message key is to provide a way to route messages to a specific partition: with the default partitioner, all messages that share a key land on the same partition.
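A minimal sketch of sending a keyed record with the Java producer client; the topic name, key, value, and broker address are all placeholders:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // With the default partitioner, every record with key "customer-42"
    // hashes to the same partition of the "customer-events" topic
    producer.send(new ProducerRecord<>("customer-events", "customer-42", "logged-in"));
}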

6
Q

By default, are messages sent to Kafka in batches or one at a time?

A

By default, messages are sent to Kafka in batches, which reduces network overhead.
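A sketch of the two producer properties that control batching; the values shown are illustrative, not recommendations:

// linger.ms defaults to 0; raising it lets the producer wait for more
// records so batches fill up before being sent
props.put("linger.ms", "10");
// batch.size (max bytes per partition batch) defaults to 16384
props.put("batch.size", "32768");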

7
Q

Describe the flow of a message starting with the producer and ending with the consumer

A

The producer serializes the message and then uses a partitioner to decide which partition the message will be sent to. Under default settings, the producer accumulates messages into batches before sending them to partitions in the Kafka cluster. The consumer continuously polls the partitions, each poll returning a batch of messages. These messages are deserialized and then processed by the consumer.
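A condensed sketch of both ends of that flow, assuming String keys and values, a placeholder topic name, and a hypothetical process() helper:

// Producer side: the record is serialized, assigned a partition, batched, and sent
producer.send(new ProducerRecord<>("orders", "order-1", "created"));
producer.flush(); // push any open batches out immediately

// Consumer side: poll() returns a deserialized batch of records
consumer.subscribe(Collections.singletonList("orders"));
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
    process(record); // hypothetical application-specific processing
}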

8
Q

Why are partitions important?

A

Partitions are important because they enable replication and parallel processing: partitions can be distributed and replicated across separate brokers.

9
Q

Do partitions guarantee order at the partition or topic level?

A

Partitions guarantee order at the partition level, not the topic level. If you need order at the topic level, use a single partition for that topic.

10
Q

What is an offset?

A

An offset is a number that identifies the position of a message within a partition. Offsets are assigned by Kafka, and consumers commit them after processing the messages returned by poll().

11
Q

Does a producer balance messages over all partitions of a topic evenly by default?

A

Not necessarily; it depends on partitioner.class, the property that determines which partition a record is sent to.

If it is not set, the default partitioning logic is used:
1) If no partition is specified but a key is present, choose a partition based on a hash of the key.
2) If no partition or key is present, use the "sticky" partition, which changes once at least batch.size bytes have been produced to the current partition.

org.apache.kafka.clients.producer.RoundRobinPartitioner sends each record in a series of consecutive records to a different partition, whether or not a key is provided, cycling through the partitions. Note: a known issue causes uneven distribution when a new batch is created; see KAFKA-9965 for more detail.

Implementing the org.apache.kafka.clients.producer.Partitioner interface allows you to plug in a custom partitioner.
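For example, opting into round-robin distribution instead of the default logic is a single setting (a sketch; props is the producer's Properties object):

props.put("partitioner.class",
    "org.apache.kafka.clients.producer.RoundRobinPartitioner");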

12
Q

Can a partition be consumed by more than one instance of a consumer group?

A

No. While a single consumer instance in a group can consume messages from multiple partitions, a partition is consumed by at most one instance of a given consumer group.

13
Q

What is the rule of thumb when deciding how many partitions to declare?

A

Declare as many partitions as there are brokers in your cluster. This evenly distributes the message load across brokers.

14
Q

What is the primary purpose of the Admin Client?

A

The primary purpose of the Admin Client is to configure and manage Kafka topics and brokers
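A minimal sketch of creating a topic with the Java AdminClient; the topic name, partition count, and replication factor are illustrative:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    // 3 partitions, replication factor 2 (illustrative values)
    NewTopic topic = new NewTopic("customer-events", 3, (short) 2);
    admin.createTopics(Collections.singletonList(topic)).all().get();
}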

15
Q

What is disk throughput?

A

Disk throughput is the average amount of data a storage device can read or write per unit of time

16
Q

The performance of producer clients is most directly influenced by …

A

The disk throughput of the broker being used. Most producer clients wait until at least one broker has acknowledged that messages have been committed before considering the write successful, so faster disk writes mean lower producer latency. SSDs are significantly faster than HDDs.

17
Q

The performance of consumer clients is most directly influenced by …

A

The amount of memory available for the broker being used. This is because Kafka often caches messages in memory so that it doesn’t have to read from disk to provide the messages the consumer needs

18
Q

When does the producer serialize the ProducerRecord<K, V> into byte arrays?

A

Before sending the record to Kafka

19
Q

What are the 3 mandatory configuration properties that need to be applied to a producer and consumer?

A
  1. bootstrap.servers
  2. key.serializer
  3. value.serializer

The last 2 are used to serialize the key and value to byte arrays and must name a class that implements org.apache.kafka.common.serialization.Serializer. (A consumer analogously requires key.deserializer and value.deserializer.)
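A sketch of a producer configured with just the three mandatory properties; the broker addresses are placeholders:

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);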

20
Q

Should you focus on handling retriable errors or nonretriable errors?

A

Since the producer handles retriable errors automatically, there is no point in handling them in your application. Focus on handling nonretriable errors instead.

21
Q

What is the default compression type for producers?

A

None. However, setting a compression type could reduce bandwidth usage.
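Enabling compression is a one-line producer setting (a sketch; snappy is just one of the supported codecs):

// supported values: none (default), gzip, snappy, lz4, zstd
props.put("compression.type", "snappy");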

22
Q

What are the 3 different producer acknowledgement properties?

A
  1. acks=0: the producer does not wait for acknowledgement from any broker
  2. acks=1: the producer waits for acknowledgement from the leader broker
  3. acks=all: the producer waits for acknowledgement from all in-sync replicas
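Set on the producer like any other property (a sketch):

// "all" gives the strongest durability; "1" and "0" trade durability for latency
props.put("acks", "all");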
23
Q

Why is it beneficial to use an Avro schema registry for Avro records as opposed to embedding the entire schema inside every record?

A

Avro requires a schema to serialize and deserialize data. A schema registry lets each record carry a small schema identifier instead of embedding the entire schema, which would roughly double the record's size. Producers use the registry to serialize records before sending them to Kafka, and consumers use the same registry to deserialize them.
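A sketch of a producer wired to a schema registry, assuming Confluent's KafkaAvroSerializer; the registry URL is a placeholder:

props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
// the serializer registers/fetches schemas here and embeds only the schema ID
props.put("schema.registry.url", "http://localhost:8081");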

24
Q

Can Avro serialize POJOs?

A

No. Avro can only serialize Avro objects, which are generated from a schema using Avro code generation

25
Q

What’s the easiest way to create Avro classes?

A

Avro Maven plug-in

26
Q

What happens if you use keys to send messages to specific partitions and the number of partitions is increased afterwards?

A

Mapping of keys to partitions is no longer guaranteed for new messages

27
Q

Review the following statement:

You can’t have multiple consumers that belong to the same group in one thread, and you can’t have multiple threads safely use the same consumer. One consumer per thread is the rule. To run multiple consumers in the same group in one application, you will need to run each in its own thread.

A
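A sketch of that rule in practice, assuming a hypothetical consumerProps() helper that builds the consumer configuration:

int workers = 3; // illustrative thread count
ExecutorService pool = Executors.newFixedThreadPool(workers);
for (int i = 0; i < workers; i++) {
    pool.submit(() -> {
        // each thread owns exactly one consumer; KafkaConsumer is not thread-safe
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps())) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(r -> System.out.println(r.value()));
            }
        }
    });
}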
28
Q

What is an offset commit and what is its default strategy?

A

An offset commit is the action of a consumer updating its current position in a partition. By default, the consumer commits offsets automatically (enable.auto.commit=true). On the broker side, offset commits are written with an acknowledgement setting of -1, meaning a commit is only considered successful once all members of the replica set have recorded it. Note that Spring Kafka provides its own support for offset commits and acknowledgements, and sets enable.auto.commit=false.
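The plain-Kafka defaults written out as explicit configuration (a sketch; these are the values the consumer assumes when unset):

props.put("enable.auto.commit", "true");      // Kafka's default
props.put("auto.commit.interval.ms", "5000"); // default commit interval: 5 seconds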

29
Q

By default, does a consumer joining a group cause a rebalance?

A

Yes.

30
Q

What is the difference between consumer.commitSync() and consumer.commitAsync()?

A

consumer.commitSync() will block until either the commit succeeds or an unrecoverable error is encountered (in which case it is thrown to the caller). In other words, it supports retries. consumer.commitAsync() will not block and any errors encountered are either passed to the callback (if provided) or discarded. consumer.commitAsync() does not support retries.
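A sketch of commitAsync() with a callback; log is assumed to be the application's logger, as in the code card below:

consumer.commitAsync((offsets, exception) -> {
    if (exception != null) {
        // deliberately no retry: a later commit may already have succeeded,
        // and retrying could move the committed offset backwards
        log.error("async commit failed for offsets {}", offsets, exception);
    }
});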

31
Q

Review the following code:

Duration timeout = Duration.ofMillis(100);

while (true) {
    // poll() returns the next batch of records, or an empty set once the timeout expires
    ConsumerRecords<String, String> records = consumer.poll(timeout);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("topic = %s, partition = %d, offset = %d, " +
                "customer = %s, country = %s%n",
            record.topic(), record.partition(),
            record.offset(), record.key(), record.value());
    }
    try {
        // blocks until the commit succeeds or an unrecoverable error is thrown
        consumer.commitSync();
    } catch (CommitFailedException e) {
        log.error("commit failed", e);
    }
}
A