Cloud Guru Practice Questions Flashcards

1
Q

The default topic retention period for your cluster is 64000 ms, but you have one topic which needs a longer retention period of 120000 ms. Which technique should you use to set the longer retention period for that topic?

  • Add a configuration override to the topic, setting the retention period to 120000
  • Create a new topic with a retention period of 120000
  • Change the cluster default retention period to 120000
  • Locate the broker that is functioning as the leader for the topic and set its retention period to 120000.
A
  • Add a configuration override to the topic, setting the retention period to 120000

You can add a configuration override to change the retention period for the specific topic that needs it.
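
As an illustration, here is a minimal sketch of applying such an override with the Java Admin client (the topic name “my-topic” and the Admin client setup are placeholders, and the calls would normally live in a method that handles the checked exceptions):

	// Sketch: set a retention.ms override on one topic via the Admin API.
	try (Admin admin = Admin.create(adminProps)) {
		ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
		AlterConfigOp setRetention = new AlterConfigOp(
			new ConfigEntry("retention.ms", "120000"), AlterConfigOp.OpType.SET);
		admin.incrementalAlterConfigs(
			Collections.singletonMap(topic, Collections.singletonList(setRetention))).all().get();
	}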

2
Q

What is the purpose of a Serde when using the Kafka Streams API in Java?

  • Combine data from multiple streams.
  • Determine the priority for stream processing operations that require a relatively large amount of resources.
  • Specify a serializer/deserializer to translate Kafka data to and from typed Java data.
  • Determine which topics to read from and write to.
A
  • Specify a serializer/deserializer to translate Kafka data to and from typed Java data.

Serde is short for “serializer/deserializer”. Serdes are used with Kafka Streams to convert Kafka data into typed Java data.
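
For example, a minimal sketch of specifying Serdes when reading a topic (the topic name and the key/value types are placeholders):

	StreamsBuilder builder = new StreamsBuilder();
	KStream<String, Long> counts = builder.stream(
		"word-counts",                                  // placeholder source topic
		Consumed.with(Serdes.String(), Serdes.Long())); // Serdes translate bytes to/from typed Java data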

3
Q

A partition has no ISRs, but there are some replicas available. Assuming that unclean.leader.election.enable is set to false, what will happen?

  • Any messages sent by the producer will be lost.
  • The topic will not accept new messages and producers will have to wait.
  • Kafka will crash.
  • An out-of-sync replica will become the new leader.
A
  • The topic will not accept new messages and producers will have to wait.

Since unclean leader election is not enabled, the topic will not accept new messages until an ISR (in-sync replica) becomes available for leader election.

4
Q

You have two streams, and you want to combine them into one stream that contains all of the records of the input streams as separate records. Which stateless transformation would you use?

  • Map
  • Merge
  • Join
  • Combine
A

Merge - it combines two streams into one new stream.
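
A minimal sketch, assuming two existing KStreams “left” and “right” with matching key/value types:

	KStream<String, String> merged = left.merge(right); // all records from both input streams
	merged.to("combined-topic");                        // placeholder output topic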

5
Q

You are using the Kafka Streams API for Java. You have a KStream called “stream”. Which of the following lines of code would ensure that the output is sent to a topic called “output-topic”?

  • KStream.setOutputTopic(stream, "output-topic");
  • stream.output("output-topic");
  • stream.to("output-topic");
  • stream.send("output-topic");
A
  • stream.to("output-topic");

The “.to()” method sends the output to the specified topic.

6
Q

Which of the following commands could you use to list all topics in a cluster?

  • ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
  • ./bin/list-topics.sh --bootstrap-server localhost:9092 --list
  • ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic
  • ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --all
A
  • ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

This command would list all topics in the cluster.

7
Q
There are two topics with the following data:

Topic A (Names):            Topic B (Emails):
0034353: J Doe              0034353: jdoe@co.com
0017654: J Smith
                            0023466: bsimpson@co.com

Select the join type which could be used to produce the following output:
“0034353: Jane Doe, jdoe@company.com”
  • Left Join
  • Outer Join
  • Special Join
  • Inner Join
A

Inner Join.

An inner join contains only records that are present in both source streams. Therefore, an inner join would be able to output Jane’s name and email address, and would not contain any record for John since he is missing from the email’s topic.

8
Q

Which of the following scenarios would allow you to successfully join two streams reading from two input topics? (Choose two)

  • Topic 1 has 4 partitions, topic 2 has 3 partitions, and your stream app is using a GlobalKTable.
  • Topic 1 has 4 partitions, topic 2 has 8 partitions, and your stream app is using a KTable.
  • Topic 1 has 4 partitions, topic 2 has 4 partitions, and your stream app is using a KTable.
  • Topic 1 has 1 partition, topic 2 has 3 partitions, and your stream app is using a KTable.
A
  • Topic 1 has 4 partitions, topic 2 has 3 partitions, and your stream app is using a GlobalKTable.
    This scenario allows a join because even though the topics are not co-partitioned, a GlobalKTable is being used.
  • Topic 1 has 4 partitions, topic 2 has 4 partitions, and your stream app is using a KTable.
    This scenario allows a join because the topics are co-partitioned.
9
Q

You have a consumer group with 6 consumers consuming from a topic with 5 partitions. What will happen to the extra consumer?

A

It will remain idle and not process messages.

If there are more consumers than partitions, any extra consumers will remain idle and only process messages if another consumer goes down.

10
Q

What Kafka Streams transformation would you use to print the value of each record to the console without modifying the stream, assuming that you still want to output the data to an output topic?

A

Peek - allows you to do arbitrary operations like printing to the console and allows further processing like outputting to an output topic.
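
A minimal sketch, assuming a KStream called “stream” and a placeholder output topic:

	stream
		.peek((key, value) -> System.out.println("value = " + value)) // side effect only; records pass through unchanged
		.to("output-topic");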

11
Q

You have a stream containing some records, and you do not need to output the stream to a topic or do any further processing. Which transformation would you use to print the value of each record to the console?

  • Stop
  • Foreach
  • Map
  • Peek
A

Foreach.

Foreach would allow you to print the values to the console. It is the best choice in this scenario because it is a terminal operation that stops any further processing.
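
A minimal sketch, assuming a KStream called “stream”:

	// foreach is terminal: nothing can be chained after it.
	stream.foreach((key, value) -> System.out.println("value = " + value));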

12
Q

You are performing a join, and you only want to join two records if their time stamps are within five minutes of one another. Which windowing strategy should you use?

A

Sliding Time Windows - because they are used for joins and are tied to the timestamps of records.
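
A minimal sketch, assuming two KStreams “orders” and “payments” keyed the same way (serdes and StreamJoined configuration omitted); newer Kafka Streams versions express the same window as JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)):

	KStream<String, String> joined = orders.join(
		payments,
		(order, payment) -> order + " / " + payment,  // placeholder value joiner
		JoinWindows.of(Duration.ofMinutes(5)));       // records join only if timestamps are within 5 minutes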

13
Q

You have two topics, one with employee names and another with employee email addresses. Both topics use an employee ID number as the key, which is unique to each employee. What kind of transformation would you use to combine these two topics into one stream of records where the keys are the employee ID numbers and the values contain both the employee name and email address?

  • Merge
  • Combine
  • flatMap
  • Join
A

A Join would work in this scenario because there is a shared key between the two topics.
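
A minimal sketch, assuming both topics are read as KTables keyed by employee ID (topic names are placeholders):

	KTable<String, String> names = builder.table("employee-names");
	KTable<String, String> emails = builder.table("employee-emails");
	KTable<String, String> combined = names.join(emails,
		(name, email) -> name + ", " + email); // value joiner builds the combined value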

14
Q

Under normal circumstances, how many leaders are there in a topic that has 3 partitions and a replication factor of 2?

A

3.

There is one leader per partition.
15
Q

You have a stream of records. Which type of window would you use to perform a count aggregation that counts the number of records for each key that appears during each hour of the day?

A

Tumbling Time Windows

Since you are counting records by each hour of the day, you should use Tumbling Time Windows to divide the records into non-overlapping, gapless buckets for each hour.
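
A minimal sketch, assuming a KStream called “stream”; newer Kafka Streams versions spell the window as TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)):

	KTable<Windowed<String>, Long> hourlyCounts = stream
		.groupByKey()
		.windowedBy(TimeWindows.of(Duration.ofHours(1))) // non-overlapping, gapless one-hour windows
		.count();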

16
Q

Kafka Broker configurations such as “background.threads” can be updated in such a way that will automatically roll out the change to the entire cluster, without requiring broker restarts. Which dynamic update mode applies to these configurations?

  • per-broker
  • read-only
  • Auto-updating
  • cluster-wide
A

Cluster-wide configurations can be updated dynamically across the whole cluster.

17
Q

Sarah has been asked to retrieve some data from a Kafka topic. She decides to use the Confluent REST Proxy. Assuming REST Proxy has not been used with this cluster before, what is the first thing she should do?

  • Subscribe the consumer to the topic.
  • Enable the topic to serve data via REST Proxy.
  • Make a GET request to retrieve the records.
  • Create a consumer and consumer instance.
A
  • Create a consumer and consumer instance.

Before proceeding, she will need to create the consumer and consumer instance.

18
Q

You have been asked to build a Kafka producer in Java. Which class can you use to handle interactions between your code and the cluster?

  • KafkaProducer
  • KafkaConsumer
  • MockProducer
  • KafkaPublisher
A

KafkaProducer handles interactions with the Kafka cluster.

19
Q

Consider the following piece of code:

	Producer producer = new KafkaProducer<>(props);
	ProducerRecord record = new ProducerRecord<>("output_topic", key, value);
	producer.send(record, (RecordMetadata metadata, Exception e) -> {
		if (e != null) {
			System.out.println("Error publishing message: " + e.getMessage());
		} else {
			System.out.println("Published message: key=" + record.key() +
				", value=" + record.value() +
				", topic=" + metadata.topic() +
				", partition=" + metadata.partition() +
				", offset=" + metadata.offset());
		}
});

In the context of this code, what will be printed to the console as a result of the expression “metadata.offset()” in the “System.out.println” statement?

A

The offset of the record after it is published to the Kafka topic.

This statement is part of a callback that is called after the record is published, and “metadata.offset()” refers to the record’s offset.

20
Q

When using the “.poll()” method on a consumer, what will happen if you then execute “consumer.commitSync();”?

A
  • consumer.commitSync() provides the ability to perform manual offset commits. So, it will commit the consumer’s offsets to the cluster.
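
A minimal sketch of a manual-commit loop (the process() call is a placeholder for your own logic):

	ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
	for (ConsumerRecord<String, String> record : records) {
		process(record); // placeholder processing
	}
	consumer.commitSync(); // blocks until the offsets returned by poll() are committed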
21
Q

Which of the following statements about Schema Registry compatibility checking is true?

  • It will automatically determine which compatibility mode you need based on the changes you want to make.
  • Compatibility checking allows you to decide what aspects of your schema can be changed.
  • Compatibility checking merely warns you if you are making a change that is not allowed.
  • It will guarantee that there are no problems as you update schemas.
A
  • Compatibility checking allows you to decide what aspects of your schema can be changed.

Compatibility checking allows you to select a compatibility type to determine what can and cannot be changed.

22
Q

What does consumer.subscribe( … ) do?

A

Determines which topic(s) the consumer will read from. This is because consumer.subscribe() sets a list of topics from which the consumer will consume records.
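
A minimal sketch (topic names are placeholders):

	consumer.subscribe(Arrays.asList("orders", "payments")); // consumer will be assigned partitions of these topics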

23
Q

You have a cluster with 5 brokers, a topic with 3 partitions, and a replication factor of 2. How many replicas, total, exist for this topic?

A

6.

Every topic partition in Kafka is replicated n times, where n is the replication factor of the topic. There are 3 partitions, each with 2 replicas, for a total of 6 replicas.

24
Q

Which of the following data sets would be best modeled as a table? (Choose 2)

  • Records of transactions at an airport restaurant.
  • Real-time records that are created whenever a plane departs.
  • The current status of which passengers have checked in for a flight.
  • Passengers on a plane and which seat they are assigned to.
A
  • The current status of which passengers have checked in for a flight.
    Since this data represents a state that can be updated (i.e. when passengers check in), it would be best represented as a table.
  • Passengers on a plane and which seat they are assigned to.
    Since this data represents a state that can be updated (i.e. a new passenger buys a ticket, or a passenger changes their seat), it would be best represented as a table.
25
Q

What limits the number of records that can be processed in parallel with Kafka Streams?

  • The number of records in the log.
  • The number of instances of the streams application.
  • The number of partitions in the topic.
  • The topic replication factor.
A
  • The number of partitions in the topic.

Streams consume records in the same way a consumer does, so it assigns a thread to each partition and can process one record per partition at a time.

A single instance can process multiple records in parallel, so the number of instances is not the limit.

26
Q

Which of the following statements are true about KSQL?

  • KSQL can perform aggregations.
  • KSQL is ANSI-SQL compliant.
  • KSQL cannot utilize windowing.
  • KSQL SELECT queries run continuously.
A

  • KSQL can perform aggregations.
  • KSQL SELECT queries run continuously.
    They run until they’re terminated.
27
Q

You have a dynamic consumer group where consumers frequently join and leave the group. How can you ensure that the consumer coordinator receives consumer status updates more frequently so that it can quickly rebalance partition assignments among the consumers as needed?

  • Increase the value of heartbeat.frequency.ms
  • Decrease the value of heartbeat.interval.ms
  • Decrease the value of consumer.heartbeat.ms
  • Increase the value of consumer.status.update.ms
A
  • Decrease the value of heartbeat.interval.ms

heartbeat.interval.ms controls the time interval between consumer heartbeats sent to the consumer coordinator.

28
Q

How can you prevent “man-in-the-middle attacks” between your brokers and Kafka clients?

  • Create an ACL to restrict access to resources within the cluster
  • Enable ACL authorization
  • Set “allow.everyone.if.no.ACL.found” to false
  • Enable and use TLS
A
  • Enable and use TLS

TLS will prevent man-in-the-middle attacks through the use of certificates.
29
Q

Which of the following KSQL statements will successfully perform an aggregation?

  • SELECT sum(clicks) FROM pageviews;
  • SELECT * FROM pageviews INNER JOIN users ON pageviews.userid = users.id;
  • SELECT clicks FROM pageviews;
  • SELECT sum(clicks) FROM pageviews GROUP BY ipaddress;
A
  • SELECT sum(clicks) FROM pageviews GROUP BY ipaddress;

This query will perform a sum aggregation on the pageviews stream or table.
30
Q

You are building a unit test for a Kafka Streams application using TopologyTestDriver. However, TopologyTestDriver only accepts test data in the form of byte[ ]. How can you easily convert your test keys and values into a
byte[ ] format?

  • Use TestDriverConverter
  • Use StreamsTestProcessor
  • Use ConsumerRecordFactory
  • You do not need to convert the data. TopologyTestDriver does it for you.
A

Use ConsumerRecordFactory. It converts test data into the format accepted by TopologyTestDriver.
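
A minimal sketch using the older kafka-streams-test-utils API (newer versions replace this pattern with TestInputTopic); the topic name, serializers, and test data are placeholders:

	ConsumerRecordFactory<String, String> factory = new ConsumerRecordFactory<>(
		"input-topic", new StringSerializer(), new StringSerializer());
	testDriver.pipeInput(factory.create("input-topic", "some-key", "some-value")); // serialized to byte[] for the driver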

31
Q

You notice a spike in your producer metrics called “io-wait-time-ms-avg”. Which of the following could be the cause?

  • You don’t have enough replicas.
  • Consumers are not able to keep up with all the records created by producers.
  • Your producers are producing more data than the cluster can handle.
  • The Kafka cluster is down.
A
  • Your producers are producing more data than the cluster can handle.

A high I/O wait time could be an indication that the cluster cannot handle the amount of data being produced.
32
Q

What class in the “kafka-streams-test-utils” library can help you verify that your output records contain the expected data?

  • The kafka-streams-test-utils library doesn’t contain such a class.
  • RecordValidator.
  • StreamValidator.
  • OutputVerifier.
A

OutputVerifier can be used to validate the content of output records when testing Stream applications.
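
A minimal sketch (newer versions of the library replace this pattern with TestOutputTopic); the topic name and expected values are placeholders:

	ProducerRecord<String, String> output = testDriver.readOutput(
		"output-topic", new StringDeserializer(), new StringDeserializer());
	OutputVerifier.compareKeyValue(output, "expected-key", "expected-value"); // fails the test on a mismatch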

33
Q

You notice that one of your consumer metrics, “records-lag-max”, is abnormally high. What does this metric mean?

  • Producer performance is slow.
  • Consumers are not keeping up with producers and are many messages behind.
  • The cluster is taking too long to commit records.
  • Consumer’s fetch requests have high latency.
A
  • Consumers are not keeping up with producers and are many messages behind.

This metric measures how far consumers are behind producers in terms of numbers of records.
34
Q

What metric would you use to measure the average amount of network traffic per second in bytes generated by a Kafka producer as it sends messages to the cluster?

  • outgoing-byte-total
  • incoming-byte-rate
  • bytes-per-second
  • outgoing-byte-rate
A
  • outgoing-byte-rate

This metric measures average outgoing message traffic in bytes per second.
35
Q

You have a stream of records. Which type of Window would you use to perform a count aggregation that counts the number of records for each key that appears during each hour of the day?

  • Hopping Time Windows
  • Sliding Time Windows
  • Session Windows
  • Tumbling Time Windows
A
  • Tumbling Time Windows

Since you are counting records by each hour of the day, you should use Tumbling Time Windows to divide the records into non-overlapping, gapless buckets for each hour.

Note: Hopping Time Windows can have overlaps and/or gaps, and therefore would not be useful in counting the number of records for each hour of the day.

36
Q

You have a cluster with 4 brokers, and a consumer group of 8 consumers ready to consume from a topic. You also want at least some level of fault tolerance in case a broker goes down. Which of the following topic configurations should you use when creating the topic?

  • 4 partitions with a replication factor of 2.
  • 16 partitions with a replication factor of 1.
  • 8 partitions with a replication factor of 2.
  • 8 partitions with a replication factor of 5.
A
  • 8 partitions with a replication factor of 2.

With 8 partitions, you can take advantage of all 8 consumers. A replication factor of 2 ensures that you will have more than one copy of the data if a broker goes down.
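
A minimal sketch of creating such a topic with the Java Admin client (the topic name and client setup are placeholders):

	admin.createTopics(Collections.singletonList(
		new NewTopic("events", 8, (short) 2))).all().get(); // 8 partitions, replication factor 2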

37
Q

You have a stream where each record represents one family and contains data about each member of the family. Which transformation would you use to process the data for each family and convert it to one or more individual records representing each family member?

  • Multiply
  • Map
  • Aggregate
  • flatMap
A
  • flatMap

flatMap can be used to process and convert a record into any number of new records.
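
A minimal sketch, assuming (purely for illustration) that each value is a comma-separated list of family members:

	KStream<String, String> members = families.flatMap((familyId, memberList) -> {
		List<KeyValue<String, String>> result = new ArrayList<>();
		for (String member : memberList.split(",")) {
			result.add(KeyValue.pair(familyId, member.trim())); // one output record per family member
		}
		return result;
	});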

38
Q

You have a stream where each record represents an employee at a company. The key of each record is the employee’s job title, and the value is their full name. i.e. Senior Engineer: Jane Smith. Which of the following transformations could you use to produce one record for each job title, where the value is a combined list of all of the names of employees with that title? (Choose Two)

  • Count
  • Reduce
  • Combine
  • Aggregate
A
  • Reduce & Aggregate

You could use Reduce to combine the records with the same job title key into one new record. Your Reduce function could then combine all of the relevant names into a list.

You could use Aggregate to combine the records with the same job title key into one new record. Your aggregation code could then combine the names into a list.
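
A minimal sketch of the Reduce approach, assuming a KStream “employees” keyed by job title:

	KTable<String, String> namesByTitle = employees
		.groupByKey()
		.reduce((names, name) -> names + ", " + name); // concatenates the names seen for each job title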

39
Q

You have a stream of records, and you need to convert the values of those records to a different datatype. Which transformation should you use to replace these records with new records that have the new datatype?

  • Convert
  • Map
  • Join
  • flatMap
A
  • Map

Map allows you to transform each record into a new record, potentially with a different datatype.
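
A minimal sketch, assuming (for illustration) that numeric string values are being converted to Long:

	KStream<String, Long> converted = stream.map(
		(key, value) -> KeyValue.pair(key, Long.parseLong(value)));

When only the value changes, mapValues is often preferred, since map allows the key to change and therefore flags the stream for repartitioning.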

40
Q

When using the Kafka Producer API for Java, which method of the Producer class would you use to publish a ProducerRecord to a topic?

  • execute
  • send
  • publish
  • close
A
  • send

You can use Producer.send to publish a ProducerRecord to a topic.

41
Q

You have two consumer groups, each with two consumers. How many consumers will process a message that is published to the topic?

  • 0
  • 4
  • 2
  • null
A
  • 2

One consumer from each consumer group will process the message.

42
Q

You have a stream containing records for each employee of a company, but you need to do some processing on only the records with a job title of Engineer. Which transformation would you use to remove the unnecessary records and obtain a stream of only the records with the Engineer job title?

  • Filter
  • Reduce
  • Remove
  • Delete
A
  • Filter

Filter can be used to filter out records that do not have the Engineer job title.
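
A minimal sketch, assuming a KStream “employees” keyed by job title:

	KStream<String, String> engineers = employees.filter(
		(jobTitle, name) -> "Engineer".equals(jobTitle)); // keeps only matching records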

43
Q

Which MBean can be used to determine the number of messages that are being received by a broker over time?

  • kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
  • kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
  • kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
  • kafka.server:type=BrokerTopicMetrics,name=MessagesOutPerSec
A

kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec

This MBean contains data about how many messages are received every second.

44
Q

You have a topic called pageviews which contains some records. What is the effect of setting the topic retention period to 0, waiting a few minutes, then setting it to 86400000?

  • This would potentially cause the topic log to become corrupted.
  • This would cause any consumer connected to the topic to disconnect.
  • This would delete any existing records in the topic without deleting the topic itself.
  • This would cause the topic to be automatically deleted.
A
  • This would delete any existing records in the topic without deleting the topic itself.

Setting the retention period to 0 temporarily would cause the existing records to be deleted.

45
Q

Which of the following statements are true about Kafka architecture? (Choose two)

  • Producers must run on a broker server.
  • A Kafka cluster is made up of one or more servers called brokers.
  • Zookeeper handles cluster management for a Kafka cluster.
  • A Zookeeper instance must run on each broker.
A
  • A Kafka cluster is made up of one or more servers called brokers.
    Kafka servers are called brokers, and multiple brokers form a cluster.
  • Zookeeper handles cluster management for a Kafka cluster.
    Zookeeper provides cluster management services within a Kafka cluster.
46
Q

Which of the following data sets would be best modeled as a stream? (Choose two)

  • A feed of notifications that are created every time the score changes in a football game.
  • Records that are created every time one car passes another in a motor race.
  • The current score in a tennis match.
  • Each racer’s best time in the 100m sprint over the course of a tournament.
A
  • A feed of notifications that are created every time the score changes in a football game.
    Since these records represent real-time events not
    intended to be updated after the fact, the data would
    be best modeled as a stream.
  • Records that are created every time one car passes another in a motor race.
    These records represent real-time events (cars
    passing each other) and would be best modeled as a
    stream.
47
Q

Which of the following statements are true about the way in which Kafka replicates data partitions? (Choose two)

  • Replicas that have the same data as the leader are known as In-Sync Replicas.
  • If the leader goes down, the partition cannot accept new records until the leader comes back up.
  • Replicas can never become a leader.
  • Partitions cannot have more than one leader.
A
  • Replicas that have the same data as the leader are known as In-Sync Replicas.
    Replicas that have the same data as the leader are
    considered In-Sync Replicas, or ISRs.
  • Partitions cannot have more than one leader.
    Each partition has exactly one leader at a time; the remaining replicas are followers.
48
Q

Consider the following code snippet:

final Properties props = new Properties();

props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");

KafkaProducer producer = new KafkaProducer(props);


What does the reference to KafkaAvroSerializer do in this context?

  • It will cause record keys to be serialized using a schema.
  • It will automatically create a schema for the data at runtime.
  • It will automatically convert the data to a specific JSON format.
  • It will cause record values to be serialized using a schema.
A
  • It will cause record values to be serialized using a schema.

KafkaAvroSerializer converts data into a format specified by an Avro Schema in preparation for sending it to the cluster.

49
Q

What will happen when the following code is executed?

Properties props = new Properties();

props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("messages", 0, 1, "Hello, World!"));

  • An error will occur.
  • The code will not compile.
  • The producer will publish a record to partition 0 of the messages topic.
  • The producer will publish a record to partition 1 of the messages topic.
A

An error will occur because the key serializer does not match the actual data type of the record’s key.

50
Q

You have multiple teams working on producers and consumers that produce and consume one another’s messages. However, it has become difficult to manage changes to the data formats of these messages in a way that does not break consumers created by other teams. What tool could help you manage the data formats of your messages more easily?

  • Kafka Connect
  • Confluent Schema Registry
  • Confluent REST Proxy
  • Confluent KSQL
A

Confluent Schema Registry

Schema Registry could help by providing consistent schemas to define your data formats across multiple producers and consumers.

51
Q

When making changes to your Avro Schemas, you want to ensure that the data format is compatible with producers and consumers that are working with the same topics. You have updated your schema, and you are preparing to deploy a new consumer that uses the new schema, but the new schema is not registered yet. All producers are still using the latest registered schema. Which compatibility type would ensure that your changes to the consumer’s schema don’t cause any issues?

  • FORWARD
  • NONE
  • FORWARD_TRANSITIVE
  • BACKWARD
A

BACKWARD

The BACKWARD compatibility type ensures that consumers using the new schema can still read data produced with the latest registered schema.

52
Q

Which of the following metrics measures the amount of raw data consumed by a consumer per second?

  • data-consumed-rate
  • bytes-consumed-rate
  • fetch-rate
  • records-consumed-rate
A

bytes-consumed-rate

This metric measures the amount of raw data consumed per second, in bytes.

53
Q

You are looking at your Producer metrics and you notice that your request-latency-avg has spiked while your batch sizes remain small. Which of the following could be the cause?

  • The producer is sending a very large message
  • Performance issues in the producer
  • Performance issues in the cluster
  • Performance issues in the consumer
A

Performance issues in the cluster

If the cluster is not able to quickly take in records, this could cause request-latency-avg to spike.

Note: It is not “The producer is sending a very large message” - while this could cause request-latency-avg to spike, it would also cause the batch size to be large.

54
Q

Which metric should you monitor on a Kafka Producer to determine how many acknowledgments it is receiving per second?

  • response-rate
  • io-wait-time-ns-avg
  • ack-rate
  • request-rate
A

response-rate - it measures the number of acknowledgments per second.

55
Q

What technology does Kafka use to expose metrics in Java clients, such as producers and consumers?

  • JMS
  • JMX
  • JConsole
  • Zookeeper
A
  • JMX

Kafka clients expose their metrics by using JMX technology.

56
Q

You have some remote Kafka clients communicating with your cluster. Recently, an attacker was able to steal some of your valuable data by sniffing network traffic between your clients and Kafka brokers. What can you do to prevent this from happening in the future?

  • Enable ACL authorization.
  • Create a certificate authority.
  • Enable TLS and ensure your clients are using a secure protocol.
  • Change the IP address of your brokers.
A
  • Enable TLS and ensure your clients are using a secure protocol.

Enabling TLS would allow communication between clients and brokers to be encrypted, preventing attackers from stealing data by sniffing network traffic.

57
Q

One of your consumer metrics, fetch-rate, suddenly falls to zero. What does this mean?

  • The consumer is not making fetch requests to the cluster.
  • There are no new records for the consumer to consume.
  • The consumer is failing to commit its offsets.
  • There are too many records for the consumer to process.
A
  • The consumer is not making fetch requests to the cluster.

The fetch-rate metric measures the number of fetch requests made to the cluster per second.

58
Q

In the context of a unit test, what class can help you simulate interactions between your consumer code and the cluster, such as consumer.poll( … )?

  • SimulatedConsumer
  • KafkaConsumer
  • ConsumerTestDriver
  • MockConsumer
A

MockConsumer simulates interactions with the cluster, allowing you to test your consumer code in isolation.
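
A minimal sketch of seeding a MockConsumer so that poll() returns a record to the code under test (topic, partition, and data are placeholders):

	MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST);
	TopicPartition tp = new TopicPartition("input-topic", 0);
	consumer.assign(Collections.singletonList(tp));
	consumer.updateBeginningOffsets(Collections.singletonMap(tp, 0L));
	consumer.addRecord(new ConsumerRecord<>("input-topic", 0, 0L, "key", "value"));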

59
Q

How would you restrict what authenticated clients can do once they connect to your cluster?

  • Use client certificates
  • Use ACLs
  • Use TLS
  • Use a Certificate Authority
A
  • Use ACLs

ACLs allow you to control what users are able to do in your cluster.

60
Q

What metric would you use to measure the average amount of network traffic per second in bytes generated by a Kafka producer as it sends messages to the cluster?

  • outgoing-byte-total
  • incoming-byte-rate
  • bytes-per-second
  • outgoing-byte-rate
A
  • outgoing-byte-rate

This metric measures average outgoing message traffic in bytes per second.

61
Q

Which of the following metrics measures the number of records consumed per second by a consumer?

  • fetch-rate
  • bytes-consumed-rate
  • records-consumed-rate
  • record-rate
A
  • records-consumed-rate

This metric measures the number of records consumed per second.

62
Q

You have been asked to write a unit test for some producer code. The producer code instantiates a KafkaProducer object and calls producer.send( … ). What class can help you test your code in isolation?

  • KafkaProducer
  • TestProducer
  • TestDriver
  • MockProducer
A

MockProducer would allow you to simulate the interactions between your code and KafkaProducer.
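
A minimal sketch (the serializers and assertions are placeholders):

	MockProducer<String, String> producer = new MockProducer<>(
		true, new StringSerializer(), new StringSerializer()); // auto-completes each send()
	// ... pass "producer" to the code under test, then inspect what it published:
	List<ProducerRecord<String, String>> sent = producer.history();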

63
Q

Assume that allow.everyone.if.no.acl.found is set to true, and that the pageviews topic currently has no ACLs associated with it. Which of the following would you need to do to ensure that kafkauser can read from the pageviews topic?

a) kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:kafkauser --operation read --topic pageviews
b) kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:kafkauser --operation all --topic pageviews
c) You do not need to do anything.
d) kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:kafkauser --operation read --operation write --topic pageviews

A

You do not need to do anything.

Since allow.everyone.if.no.acl.found is set to true, the user can already read from the topic.

NOTE: The following incorrect answer and explanation…

kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:kafkauser --operation all --topic pageviews

You do not need to add this ACL in order to provide access since the user can already read from the topic. Also, if the user were not already able to read from the topic, this ACL would provide too much access due to its use of all operations.