Q & A Flashcards
What is a batch?
A collection of messages being produced to the same topic and partition.
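As a rough illustration of the idea (not the producer's actual implementation), batching can be thought of as grouping records by their (topic, partition) pair. The function name `group_into_batches` and the sample records below are hypothetical.

```python
from collections import defaultdict


def group_into_batches(records):
    """Group (topic, partition, value) records by their (topic, partition) pair."""
    batches = defaultdict(list)
    for topic, partition, value in records:
        batches[(topic, partition)].append(value)
    return dict(batches)


# Hypothetical records waiting to be sent.
records = [
    ("orders", 0, "a"),
    ("orders", 1, "b"),
    ("orders", 0, "c"),
    ("payments", 0, "d"),
]
batches = group_into_batches(records)
# Each key is one batch: all of its messages share a topic and a partition.
```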
What is a controller?
The “lead” broker within a cluster that manages the state of partitions, replicas, and partition reassignment.
How is a controller elected?
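The first broker to start becomes the controller by creating an ephemeral /controller node in Zookeeper. When the controller goes down, that node disappears, the remaining brokers are notified, and the first one to recreate the node becomes the new controller.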
What is Zookeeper?
Zookeeper is a centralized service that keeps track of the brokers, configurations, topics, and partitions.
What is a leader?
Each partition is owned by a single broker in the cluster, and that broker is the partition's leader. By default, all produce and consume requests for the partition go through the leader.
What are the three types of replicas?
Leader, follower, and preferred.
How are new leader replicas determined?
When the existing leader goes down (or becomes unresponsive), the controller chooses a new leader from the in-sync replicas. If auto.leader.rebalance.enable=true is set (default), Kafka also periodically checks whether the preferred leader is the current leader and, if it is not but is in-sync, triggers an election to make it the leader.
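The selection rule (prefer the preferred leader when it is in-sync, otherwise fall back to another in-sync replica) can be sketched as follows. This is a simplified model, not the controller's code; `choose_leader` and its arguments are hypothetical, and it assumes the preferred leader is the first broker in the partition's replica list.

```python
def choose_leader(replicas, in_sync, failed):
    """Pick a leader for one partition.

    replicas: ordered list of broker IDs; the first entry is assumed to be
              the preferred leader.
    in_sync:  set of broker IDs currently in the in-sync replica set.
    failed:   set of broker IDs that are down or unresponsive.
    Returns the chosen broker ID, or None if no in-sync replica is available.
    """
    for broker in replicas:
        if broker in in_sync and broker not in failed:
            return broker
    return None


# Broker 1 is the preferred leader, and broker 2 (the old leader) has failed.
leader_a = choose_leader([1, 2, 3], in_sync={1, 2, 3}, failed={2})
# The preferred leader has fallen out of sync, so another in-sync replica is used.
leader_b = choose_leader([1, 2, 3], in_sync={2, 3}, failed=set())
```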
What is retention? What are the two types of retention?
Retention determines how long messages are stored within a given topic, and is configured as a time window, a size limit, or both. The two types of retention (cleanup policies) are delete and compact.
What is log compaction?
Compaction is a type of retention in which only the latest value for each key is retained; older values for the same key are eventually removed.
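The effect of compaction can be illustrated with a toy log of (offset, key, value) records. This is only a sketch of the outcome, not of how the broker's log cleaner works; the `compact` function and the sample log are hypothetical.

```python
def compact(log):
    """Keep only the most recent record for each key, in offset order."""
    latest = {}
    for offset, key, value in log:
        # A later record with the same key overwrites the earlier one.
        latest[key] = (offset, key, value)
    return sorted(latest.values(), key=lambda record: record[0])


log = [
    (0, "k1", "a"),
    (1, "k2", "b"),
    (2, "k1", "c"),
    (3, "k3", "d"),
    (4, "k2", "e"),
]
compacted = compact(log)
# compacted == [(2, "k1", "c"), (3, "k3", "d"), (4, "k2", "e")]
```

Note that surviving records keep their original offsets; compaction removes older records rather than renumbering the log.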
Where is broker information stored in Zookeeper?
Under the /brokers/ids directory.
What is an ephemeral node?
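When a broker starts up, it registers itself by creating an ephemeral node in Zookeeper. If the broker stops or loses its connection to Zookeeper, the node is removed automatically. The broker ID still exists in other data structures (such as topic replica lists), so a broker that comes back with the same ID immediately rejoins the cluster and takes over the same partitions and topics.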
What is an ensemble?
A cluster of Zookeeper nodes.
What is the preferred number of Zookeeper nodes?
An odd number that adheres to 2N+1, since Zookeeper requires a quorum (a majority of nodes) to hold elections and respond to requests. An ensemble of 2N+1 nodes can tolerate N failures.
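The arithmetic behind the odd-number recommendation can be checked with a short sketch. It assumes a quorum is a strict majority of the ensemble; the helper names are hypothetical.

```python
def quorum_size(nodes):
    """Smallest strict majority of an ensemble with the given node count."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a quorum is still reachable."""
    return nodes - quorum_size(nodes)


for nodes in (3, 4, 5):
    print(nodes, quorum_size(nodes), tolerated_failures(nodes))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from 3 to 4 nodes raises the quorum size without tolerating any additional failures, which is why an even-sized ensemble buys nothing.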
What are the default ports for Zookeeper?
2181 is the client port, 2888 is the peer port (used by followers to connect to the leader), and 3888 is used for leader election.
What is the default broker port?
9092
What is the default KSQL port?
8088
What is the default schema registry port?
8081
What is the auto.create.topics.enable setting? What actions can cause a new topic to be created?
When enabled, auto.create.topics.enable allows the broker to dynamically create topics if they don’t already exist. Any attempt to produce to, consume from, or request metadata for a nonexistent topic will cause it to be created (using the default replication and partition settings from the broker).
What is the default number of partitions?
One partition is the default, although it is not preferred for scaling purposes.
How is a request handled in Kafka?
The process goes:
- Client Request
- Broker
- Partition Leader(s)
- Response
- Client
Are there any guarantees within Kafka with regard to ordering of messages?
Messages are guaranteed to be ordered within a single partition. There is no ordering guarantee across partitions of the same topic.
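In practice, per-partition ordering is usually exploited by giving related messages the same key, so they all land on the same partition. The sketch below illustrates this with a toy partitioner; it uses zlib.crc32 purely as a stand-in hash (Kafka's default partitioner uses a different hash function), and `partition_for` and the sample events are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. crc32 is a stand-in, not Kafka's actual hash."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]

# Append each event to the partition chosen by its key.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    partitions[partition_for(key.encode(), NUM_PARTITIONS)].append((key, value))

# All "user-1" events share one partition, so their relative order is preserved.
user1_partition = partition_for(b"user-1", NUM_PARTITIONS)
user1_log = [v for k, v in partitions[user1_partition] if k == "user-1"]
```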
What is a segment? What do they contain?
Partitions are divided into segments, which by default are closed once they reach either 1GB of data or a week of age, whichever comes first. Each segment contains the messages (keys, values) along with two indexes (one mapping offsets and another mapping timestamps).
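The "1GB or a week, whichever comes first" rule can be sketched as a simple check. This is a simplified model rather than the broker's logic; the constants mirror the defaults described above (which correspond to the broker settings log.segment.bytes and log.roll.hours), and `should_roll` is a hypothetical helper.

```python
SEGMENT_BYTES = 1024 ** 3                 # 1 GB size limit per segment
SEGMENT_MS = 7 * 24 * 60 * 60 * 1000      # one week, in milliseconds


def should_roll(segment_size_bytes, segment_age_ms, next_record_bytes):
    """Return True if the active segment should be closed and a new one opened."""
    too_big = segment_size_bytes + next_record_bytes > SEGMENT_BYTES
    too_old = segment_age_ms >= SEGMENT_MS
    return too_big or too_old


roll_on_size = should_roll(SEGMENT_BYTES - 10, 0, 100)   # next record would not fit
roll_on_age = should_roll(100, SEGMENT_MS, 10)           # segment is a week old
keep_writing = should_roll(100, 1000, 10)                # small and recent
```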
What is the unit of storage within Kafka?
A partition
What is stored within a message on disk?
The key, the value, the offset, the message size, a checksum to detect corruption, a magic byte indicating the message format version, the compression codec, and a timestamp.