Kafka Connect Flashcards

1
Q

What is the use of Kafka Connect?

A

Kafka Connect is used to pull data from an external store into Kafka, or to push data from Kafka into an external store.
It provides a scalable and reliable way to move data between Kafka and other data stores.

2
Q

How many types of Kafka connectors are there?

A

Two types: source connectors and sink connectors

3
Q

What is a Kafka Connect source connector?

A

A connector that reads data from an external store and writes it to Kafka is called a source connector.

4
Q

What is a Kafka Connect sink connector?

A

A connector that pulls data from Kafka and pushes it to an external store.

5
Q

Features of Kafka Connect

A

1) Distributed and standalone modes
2) Common framework for Kafka connectors
3) Distributed and scalable by default
4) REST interface
5) Streaming and batch integration
6) Automatic offset management
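
The REST interface listed above is how connectors are created and managed on a running worker. As an illustrative sketch (the connector name, file path, and topic below are made up; FileStreamSourceConnector does ship with Apache Kafka), this is the kind of JSON body you would POST to a worker's REST endpoint, which listens on port 8083 by default:

```python
import json

# Hypothetical example: the JSON body one would POST to
# http://localhost:8083/connectors to create a connector via the REST API.
# "demo-file-source", the file path, and the topic are illustrative values.
payload = {
    "name": "demo-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo-input.txt",  # file the source connector tails
        "topic": "demo-topic",          # Kafka topic the lines are written to
    },
}

body = json.dumps(payload, indent=2)
print(body)
```

The same REST interface also lets you list connectors (GET /connectors), check their status, and delete them, without restarting the worker.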

6
Q

What are some already available Kafka source connectors?

A

1) JDBC source
2) Syslog source
3) MongoDB source
4) Cassandra source
etc.
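
As a sketch of what configuring one of these looks like, a JDBC source connector (Confluent's io.confluent.connect.jdbc.JdbcSourceConnector) is typically set up with properties along these lines; the connection URL, column name, and topic prefix here are illustrative, not definitive:

```properties
name=demo-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
# Illustrative database connection; point this at your own database
connection.url=jdbc:postgresql://localhost:5432/demo
# Detect new rows via an auto-incrementing id column
mode=incrementing
incrementing.column.name=id
# Each table becomes a topic named <prefix><table>
topic.prefix=postgres-
```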

7
Q

What are some already available Kafka sink connectors?

A

1) HDFS Sink
2) HBase Sink
3) S3 Sink
4) Elasticsearch Sink
5) Cassandra Sink

8
Q

What is the command to start a Connect worker?

A

For distributed mode: sh connect-distributed.sh config/connect-distributed.properties

For standalone mode: sh connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties ...]
(standalone mode takes one or more connector configuration files in addition to the worker configuration)

9
Q

Which Connect worker properties must be provided?

A

1) bootstrap.servers
2) group.id (distributed mode only)
3) key.converter
4) value.converter
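
A minimal worker configuration sketch tying these together (the host names and topic names are illustrative; the connect-distributed.properties file shipped with Kafka contains sensible defaults):

```properties
# Kafka brokers the worker connects to
bootstrap.servers=localhost:9092
# Distributed mode: workers with the same group.id form one Connect cluster
group.id=connect-cluster
# Converters control how data is serialized into/out of Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Distributed mode also stores connector configs, offsets, and status
# in internal Kafka topics (names here are illustrative)
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
```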

10
Q

What are the two ways to build a data pipeline?

A

1) ETL - Extract Transform Load
The data pipeline is responsible for modifying the data as it flows through.
This saves time and storage because we don't need to store the data, modify it, and store it again.
But it sometimes shifts the burden of computation and storage onto the data pipeline itself.

2) ELT - Extract Load Transform
The data pipeline does only minimal transformation (also called high-fidelity pipelines or data-lake architecture).

The system gives users maximum flexibility, since they have access to all the raw data.

The drawback is that transformations consume CPU and storage resources on the target system.
