Distributed Transport and Streaming Flashcards

1
Q

What is Kafka?

A

Kafka is a distributed, event-based data streaming platform.

2
Q

What is Sqoop and what is it used for?

A

Sqoop is an ingestion tool for bulk importing/exporting structured data between relational databases and Hadoop (HDFS).

3
Q

What is Flume and what is its use case?

A

Flume is an event-based ingestion service used for streaming data. It was originally designed for importing logs into HDFS.

4
Q

What are the four main components of Flume?

A

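Source: Receives events from the data origin
Sink: Delivers events to the data destination
Channel: Buffers events, acting as a bridge between source and sink
Agent: An independent daemon process that hosts sources, channels, and sinks, receiving events and forwarding them to their destination and/or to other agents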
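
To make the source, channel, sink flow concrete, here is a toy Python model of a single agent. It is a sketch under simplifying assumptions, not Flume's API: `Channel`, `Agent`, and the capacity value are hypothetical, and real Flume agents are wired together in a configuration file rather than in code.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional

# Toy model of one Flume agent (hypothetical names, not the Flume API):
# a source pushes events into a bounded channel, and a sink drains it.


class Channel:
    """Bounded FIFO buffer that decouples the source from the sink."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._buffer: Deque[str] = deque()

    def __len__(self) -> int:
        return len(self._buffer)

    def put(self, event: str) -> bool:
        # Refuse the event when full; the agent retries it after the sink drains.
        if len(self._buffer) >= self.capacity:
            return False
        self._buffer.append(event)
        return True

    def take(self) -> Optional[str]:
        return self._buffer.popleft() if self._buffer else None


class Agent:
    """Hosts one source, one channel, and one sink in a single process."""

    def __init__(self, source: Iterable[str], channel: Channel,
                 sink: Callable[[str], None]) -> None:
        self.source = iter(source)
        self.channel = channel
        self.sink = sink

    def run(self) -> None:
        pending = next(self.source, None)
        while pending is not None or len(self.channel) > 0:
            # Source side: fill the channel until it is full or the source runs dry.
            while pending is not None and self.channel.put(pending):
                pending = next(self.source, None)
            # Sink side: deliver everything currently buffered.
            event = self.channel.take()
            while event is not None:
                self.sink(event)
                event = self.channel.take()


delivered: List[str] = []
log_lines = ["GET /index", "GET /about", "POST /login", "GET /logout", "GET /help"]
Agent(log_lines, Channel(capacity=2), delivered.append).run()
print(delivered)  # events reach the sink in their original order
```

The small capacity forces the source to pause whenever the channel is full, which is the buffering role the channel plays between a fast source and a slower sink.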

5
Q

Explain the 4 main components of Kafka

A

Producers: Publish data to a topic in Kafka
Topics: A named stream of data, configured with a replication factor and a number of partitions
Consumers: Subscribe to a topic in Kafka
Consumer group: A collection of consumers that share the work of reading from the same topic
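
A minimal in-memory sketch of how these four pieces relate, assuming a single process and no real Kafka client: `Topic`, `Producer`, and `ConsumerGroup` are hypothetical classes. Keys are hashed to pick a partition (CRC32 here as a stand-in; Kafka's default partitioner uses murmur2), and each partition is read by exactly one member of the group.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class Topic:
    """A named stream split into partitions; each partition is an append-only log."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]


class Producer:
    def send(self, topic: Topic, key: str, value: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        # CRC32 is a stand-in; Kafka's default partitioner hashes keys with murmur2.
        p = zlib.crc32(key.encode()) % len(topic.partitions)
        topic.partitions[p].append((key, value))
        return p


class ConsumerGroup:
    """Members of one group share a topic: each partition has exactly one owner."""

    def __init__(self, topic: Topic, members: List[str]) -> None:
        self.topic = topic
        self.assignment: Dict[str, List[int]] = defaultdict(list)
        for p in range(len(topic.partitions)):  # simple round-robin assignment
            self.assignment[members[p % len(members)]].append(p)
        self.offsets = [0] * len(topic.partitions)  # committed offset per partition

    def poll(self, member: str) -> List[Record]:
        records: List[Record] = []
        for p in self.assignment[member]:
            log = self.topic.partitions[p]
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)  # "commit" everything just returned
        return records


orders = Topic("orders", num_partitions=3)
producer = Producer()
for i in range(6):
    producer.send(orders, key=f"customer-{i % 2}", value=f"order-{i}")

group = ConsumerGroup(orders, members=["c1", "c2"])
seen = group.poll("c1") + group.poll("c2")
print(len(seen))         # 6: each record is read once across the whole group
print(group.poll("c1"))  # []: offsets were committed, so nothing is re-read
```

Because records with the same key land in the same partition, their relative order survives. Adding a third member to the group would give each member one of the three partitions.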

6
Q

How does Kafka ensure high availability?

A

Kafka runs as a cluster of multiple instances, called brokers, across which reads and writes are distributed. Data is also replicated across different brokers, so it remains accessible if one broker fails. Each partition's replicas are divided into one leader and a set of followers that copy the leader's data.
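
The leader/follower idea can be sketched with a toy single-partition model. This is an illustrative assumption, not Kafka's replication protocol: every write is copied synchronously to all live replicas (close in spirit to the producer setting `acks=all`), and when the leader's broker fails the lowest-numbered surviving broker is promoted. Real Kafka has its controller pick the new leader from the in-sync replica set (ISR). `ReplicatedPartition` and its methods are hypothetical.

```python
from typing import Dict, List, Set


class ReplicatedPartition:
    """One partition stored on several brokers: a leader plus followers (toy model)."""

    def __init__(self, replica_brokers: List[int]) -> None:
        self.logs: Dict[int, List[str]] = {b: [] for b in replica_brokers}
        self.alive: Set[int] = set(replica_brokers)
        self.leader = replica_brokers[0]

    def write(self, record: str) -> None:
        # The record is appended on the leader and copied to every live follower
        # before the write returns (simplified synchronous replication).
        for broker in self.alive:
            self.logs[broker].append(record)

    def read(self) -> List[str]:
        # Clients are served by whichever replica is currently the leader.
        return list(self.logs[self.leader])

    def fail_broker(self, broker: int) -> None:
        self.alive.discard(broker)
        if broker == self.leader:
            if not self.alive:
                raise RuntimeError("no live replica left: partition is offline")
            # Promote a surviving follower; its log is complete because every
            # write was copied to all live replicas.
            self.leader = min(self.alive)


p = ReplicatedPartition(replica_brokers=[1, 2, 3])
p.write("event-a")
p.write("event-b")
p.fail_broker(1)  # the leader's broker goes down
print(p.leader)   # 2: a follower was promoted
p.write("event-c")
print(p.read())   # ['event-a', 'event-b', 'event-c']
```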

7
Q

What is ZooKeeper?

A

ZooKeeper is a distributed coordination service for distributed applications. For example, Kafka has used it to track which brokers are alive, elect a controller, and store cluster metadata.

8
Q

What is ksqlDB?

A

A specialized database optimized for stream processing that exposes a SQL-like interface for working with data in Kafka.
ksqlDB runs as its own fault-tolerant cluster.
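
As a rough illustration of what a continuous ("push") query does, the Python generator below emits an updated count every time a new event arrives, keeping its running totals as state. It only mimics the behaviour of a ksqlDB query like the one quoted in its docstring; the function name and the event shape are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def count_by_page(pageviews: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit an updated (page, count) row for every incoming event.

    Mimics, roughly, a ksqlDB push query such as:
        SELECT page, COUNT(*) FROM pageviews GROUP BY page EMIT CHANGES;
    """
    counts: Counter = Counter()  # running totals: the query's state
    for event in pageviews:
        counts[event["page"]] += 1
        yield event["page"], counts[event["page"]]


stream = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
changes = list(count_by_page(stream))
print(changes)  # [('/home', 1), ('/cart', 1), ('/home', 2)]
```

Unlike a one-off SQL query over a finished table, the output here is a stream of changes that keeps growing for as long as events keep arriving.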

9
Q

What is Kafka Connect?

A

Kafka Connect is a framework for connecting external systems to Kafka. It can be used as a replacement for Sqoop and Flume.

10
Q

Explain the 6 main parts of Kafka Connect

A

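Connectors: A connector connects an external system to Kafka. There are both source connectors (into Kafka) and sink connectors (out of Kafka)
Tasks: The implementation of how data is actually copied to or from Kafka
Workers: A process that executes connectors and tasks
Converters: Code that translates data between Connect's internal format and the format stored in Kafka
Transforms: Simple logic that alters each message produced or sent by a connector
Dead Letter Queue: How Connect handles connector errors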
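
A toy sketch of how a task, a transform, and a dead letter queue fit together in a sink pipeline. Everything here is hypothetical Python, not the Connect API (which is Java): `ToySinkTask`, `mask_email`, and the in-memory `destination` list are made up, `json.loads` stands in for a converter, and connectors and workers are left out. In real Connect, comparable behaviour for a sink connector is enabled with the `errors.tolerance=all` and `errors.deadletterqueue.topic.name` settings.

```python
import json
from typing import Callable, Dict, List

Record = Dict[str, object]


def mask_email(record: Record) -> Record:
    """Single-message transform (hypothetical): alters one record at a time."""
    out = dict(record)
    if "email" in out:
        out["email"] = "***"
    return out


class ToySinkTask:
    """Copies messages out of 'Kafka' into a stand-in external system."""

    def __init__(self, transforms: List[Callable[[Record], Record]]) -> None:
        self.transforms = transforms
        self.destination: List[Record] = []     # stand-in for e.g. a database table
        self.dead_letter_queue: List[str] = []  # raw messages that could not be handled

    def put(self, raw_messages: List[str]) -> None:
        for raw in raw_messages:
            try:
                record = json.loads(raw)  # stands in for a converter
                for transform in self.transforms:
                    record = transform(record)
                self.destination.append(record)
            except (ValueError, TypeError):
                # Set the bad message aside instead of failing the whole task.
                self.dead_letter_queue.append(raw)


task = ToySinkTask(transforms=[mask_email])
task.put(['{"id": 1, "email": "a@example.com"}', "not json", '{"id": 2}'])
print(task.destination)        # [{'id': 1, 'email': '***'}, {'id': 2}]
print(task.dead_letter_queue)  # ['not json']
```

The malformed message is parked in the dead letter queue while the valid ones still reach the destination, so one bad record does not stop the pipeline.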
