Apache Flume Flashcards

1
Q

What is Flume?

A

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data from various sources to a centralized data store.

2
Q

What are some Flume features?

A

1) Ingests data from various sources into data stores such as HDFS and HBase
2) Supports multi-hop flows, fan-in and fan-out flows, and contextual routing
3) Scales horizontally
4) Collects data from multiple servers, either in real time or in batch mode
5) Collects data from a large set of sources and moves it to multiple destinations
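
Fan-out and contextual routing from feature 2 can be sketched as an agent configuration. This is a minimal sketch, not a production setup: the agent name a1, the component names, the port, the HDFS path, and the "env" header are all assumptions made for illustration. The routing only has an effect if clients or an interceptor set that header on each event.

```properties
# Hypothetical agent "a1": one Avro source fanned out to two channels.
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1 c2

# Contextual routing: choose the channel from the "env" event header (assumed name).
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = env
a1.sources.r1.selector.mapping.prod = c1
a1.sources.r1.selector.mapping.test = c2
# Events with a missing or unmapped header go to c1.
a1.sources.r1.selector.default = c1

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# One destination per channel: HDFS for "prod", the agent log for "test".
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/prod
a1.sinks.k1.channel = c1

a1.sinks.k2.type = logger
a1.sinks.k2.channel = c2
```

Switching selector.type to replicating (the default) and dropping the mapping lines would instead copy every event to both channels.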

3
Q

Flume architecture

A

Flume's architecture is flexible and is based on a streaming data flow model.
It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.
Flume agents run on the source machines.

4
Q

Flume components

A

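1) Client - any application that generates logs and transmits events to a source running within a Flume agent
2) Source - the entity through which events enter Flume. Supported source types include Avro, Thrift, Netcat, exec, and syslog.
3) Channel - the buffer that holds events received from a source until a sink consumes them
4) Sink - the entity that removes events from a channel and delivers them to the next agent or to the final destination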
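
How the source, channel, and sink are wired together inside one agent can be sketched with a small configuration. The agent name a1, the component names r1/c1/k1, and the port are assumptions chosen for illustration; here the client would be anything that writes lines to the netcat port.

```properties
# Hypothetical single-hop agent: netcat source -> memory channel -> logger sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: turns each line received on the TCP port into an event.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# A source can write to several channels, hence the plural "channels".
a1.sources.r1.channels = c1

# Channel: buffers events in memory until the sink takes them.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: reads from exactly one channel, hence the singular "channel".
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```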

5
Q

What are Flume Interceptors?

A

Flume interceptors are pieces of custom business logic that can be injected between a Source and its Channel to modify or drop events in flight.
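
As a rough illustration, interceptors are attached to a source in the agent configuration and run in the order listed. The sketch below uses two of Flume's built-in interceptor types; the agent name a1, the source name r1, and the regex are assumptions.

```properties
# Hypothetical source r1 on agent a1 with a chain of two interceptors.
a1.sources.r1.interceptors = i1 i2

# i1: adds a "timestamp" header to every event.
a1.sources.r1.interceptors.i1.type = timestamp

# i2: drops events whose body matches the regex (here, DEBUG lines).
a1.sources.r1.interceptors.i2.type = regex_filter
a1.sources.r1.interceptors.i2.regex = .*DEBUG.*
a1.sources.r1.interceptors.i2.excludeEvents = true
```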

6
Q

What is a Flume agent? How do you configure and start it?

A

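An agent is the JVM process that hosts the sources, channels, and sinks. In this example, it consumes log data from a log file and sends it to a Kafka broker.

Configure the agent in the kafka-flume.conf file, then start it by passing that file and the agent name to flume-ng:

bin/flume-ng agent --conf conf --conf-file kafka-flume.conf --name a1 -Dflume.root.logger=INFO,console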
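
The contents of kafka-flume.conf are not given on this card. A minimal sketch of what such a file might contain is shown below, assuming an exec source that tails a log file and the Kafka sink property names used in Flume 1.7 and later. The log path, broker address, and topic name are placeholders, not values from the original setup.

```properties
# Hypothetical kafka-flume.conf for the agent named a1 (matches --name a1).
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow a log file (placeholder path).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: publish each event to a Kafka topic (placeholder broker and topic).
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = app-logs
a1.sinks.k1.channel = c1
```

The exec source with a memory channel can lose events if the agent stops; a file channel is the usual choice when durability matters.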
