Basics Flashcards
What is Kafka?
Kafka is a publish-subscribe messaging system rethought as a distributed commit log service.
What are the components that make up Kafka?
- Distributed
  - Composed of a cluster of servers
  - Messages are spread across the servers in the cluster
  - Has the notion of leaders and followers
- Partitioned
  - Logs are partitioned so that each partition fits on a server
  - Partitioning enables scaling to arbitrary size by adding nodes
  - Partitions also allow for parallelism
- Replicated
  - The amount of replication is configurable
  - Data exists on multiple servers
  - If data is replicated on N servers, N-1 can fail and the data is still accessible
- Kafka is a write-ahead log
Common Kafka use cases
- Collecting metrics
- Aggregating logs
- Stream processing
- Messaging
- Website activity tracking
- Event sourcing
What is a Kafka topic?
A name you give to a series of messages.
What is a Kafka broker?
An instance of Kafka running on a server. Each individual server in a cluster is a broker; in Kubernetes, a broker typically runs as a pod.
What is a Kafka cluster?
A cluster is a collection of brokers.
Describe a Kafka partition
- Topics are split into partitions
- Each partition is a strictly ordered, immutable sequence of messages
- Partitions can exist on different servers/brokers
- Partitions enable scalability
- Within a consumer group, a partition is always read by the same consumer instance (until the group is rebalanced)
What is a consumer group?
A consumer group is a collection of consumers/nodes. Each group receives every message from the topics it subscribes to, so if there are two groups, both receive the same message that went into Kafka. Within a group, each message goes to only one member.
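The group behavior described above can be sketched as a small in-memory simulation. This is not a Kafka client: the function names, the group names, and the round-robin assignment are assumptions chosen for illustration only.

```python
from collections import defaultdict

def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer (hypothetical round-robin)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)

def deliver(messages, groups):
    """messages: list of (partition, value); groups: {group: [consumer, ...]}.

    Returns {group: {consumer: [values]}}. Every group sees every message,
    but inside a group each message reaches only the partition's owner.
    """
    partitions = {p for p, _ in messages}
    received = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(partitions, consumers)
        owner = {p: c for c, ps in assignment.items() for p in ps}
        inbox = {c: [] for c in consumers}
        for partition, value in messages:
            inbox[owner[partition]].append(value)
        received[group] = inbox
    return received

messages = [(0, "a"), (1, "b"), (0, "c"), (2, "d")]
groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
result = deliver(messages, groups)
```

Here both "billing" and "audit" end up with all four messages, while inside "billing" the messages are split between b1 and b2 by partition.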
How do partitions receive messages?
Producers assign each message to a partition, either explicitly or through a partitioner (for example, by hashing the message key).
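A minimal sketch of how a producer might pick a partition. The function name and the use of CRC32 are assumptions for illustration; real Kafka clients use their own hash function (the Java client hashes keys with murmur2), so partition numbers here will not match a real cluster.

```python
import zlib

def choose_partition(key, num_partitions, explicit_partition=None):
    """Return the partition for a message (hypothetical helper).

    If the producer names a partition, use it. Otherwise hash the key so
    that messages with the same key always land in the same partition.
    """
    if explicit_partition is not None:
        if not 0 <= explicit_partition < num_partitions:
            raise ValueError("partition out of range")
        return explicit_partition
    # CRC32 is a stand-in hash, not what Kafka itself uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

first = choose_partition("user-42", 6)
second = choose_partition("user-42", 6)
pinned = choose_partition("user-42", 6, explicit_partition=3)
```

Because the hash is deterministic, `first` and `second` are the same partition, which is what keeps all messages for one key in order.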
Where does a partition send the data?
- Within a consumer group, a partition is always sent to the same consumer instance
- A consumer group subscribes to one or more topics
What are the characteristics of offsets?
- Each message is assigned a sequential offset within its partition
- Consumers track where they left off reading by storing the offset
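The offset mechanics can be illustrated with a toy in-memory log. The class and method names (`PartitionLog`, `Consumer`, `poll`) are hypothetical and only mimic the idea; they are not a Kafka API.

```python
class PartitionLog:
    """Append-only list standing in for one partition."""

    def __init__(self):
        self._messages = []

    def append(self, value):
        """Store a message and return the offset it was assigned."""
        self._messages.append(value)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return (offset, value) pairs starting at the given offset."""
        chunk = self._messages[offset:offset + max_messages]
        return list(enumerate(chunk, start=offset))

class Consumer:
    """Remembers the next offset to read so it can resume where it left off."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.position, max_messages)
        if batch:
            # Advance past the last message that was handed out.
            self.position = batch[-1][0] + 1
        return [value for _, value in batch]

log = PartitionLog()
offsets = [log.append(m) for m in ("m0", "m1", "m2")]
consumer = Consumer(log)
first_batch = consumer.poll(max_messages=2)
second_batch = consumer.poll()
```

After the two polls the consumer's position is 3, so a later `poll()` returns nothing until new messages are appended.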
What are topics and what are their characteristics?
- A topic is a feed name or category
- Messages are published to a topic
- It is the name you give to the place where you send messages
- It is a logical name for a group of messages
- Data for a topic is written across all of its relevant partitions
How do producers interact with topics?
- Producers publish data to a chosen topic
- Producers can also assign a partition within the topic
How do consumers interact with topics?
Consumers belong to a group, and each message in a topic is delivered to one member of each subscribing group.
How are topics created?
- Topics can be created from the CLI
- An existing topic can be modified
What are replicas?
- Replicas are backups of a partition
- Partitions are replicated across servers in a cluster
- Replication factor is configurable
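To make the replication idea concrete, here is a rough sketch that spreads a partition's replicas over brokers and checks whether the data survives failures. The placement rule (consecutive brokers starting at `partition % len(brokers)`) and the function names are assumptions for illustration, not Kafka's actual assignment algorithm.

```python
def place_replicas(partition, brokers, replication_factor):
    """Pick `replication_factor` distinct brokers for a partition (hypothetical rule)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    start = partition % len(brokers)
    return [brokers[(start + i) % len(brokers)] for i in range(replication_factor)]

def is_available(replicas, failed_brokers):
    """Data stays readable as long as at least one replica's broker is alive."""
    return any(broker not in failed_brokers for broker in replicas)

brokers = ["b0", "b1", "b2"]
replicas = place_replicas(1, brokers, replication_factor=3)
survives_two_failures = is_available(replicas, {"b1", "b2"})
survives_three_failures = is_available(replicas, {"b0", "b1", "b2"})
```

With three replicas, two broker failures still leave one copy, while losing all three makes the partition unavailable.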
Explain producers and their role
A producer's job is to publish data to the cluster.
- Can run in sync or async mode.
- Can decide where the data goes by assigning a message to a specific partition.
- Can decide on the partitioning semantics.
Explain the consumer's role
Consumers read data from the cluster.
- Consumers come in different types
- One is called the high-level consumer and the other the simple consumer
- The simple consumer requires more setup but gives the user more control over what the consumer does
Explain the role of the brokers
Brokers are clustered data repositories.
- A Kafka cluster contains a collection of brokers
- Brokers contain partitions
- The broker structure lets messages be produced independently of being consumed
Advantages:
- Decouples processing from data producers
- Buffers unprocessed messages
- The broker structure is a major reason for Kafka’s very high performance
Explain what a cluster is
A cluster is a collection of brokers.
A cluster must have at least one broker.
What are the characteristics and functionality of high-level consumers?
- Lets each consumer in a group consume a subset of a topic's partitions
- Abstracts away the details of offset management for the consumer