Big Data Systems Flashcards

(18 cards)

1
Q

What are big data systems?

A

Big data systems are designed to handle and process large and complex data sets, encompassing structured, semi-structured, and unstructured data.

2
Q

What are the 3 Vs of big data?

A

Volume, Velocity, Variety

3
Q

What is Volume in terms of data?

A

Volume is the enormous amount of data that is produced by a wide range of sources and available for collection.

4
Q

What is velocity?

A

The speed at which data is generated, today often in real time or near real time.

5
Q

What is variety?

A

The mix of data types you are processing: how much of the data is structured, semi-structured, or unstructured.

6
Q

What is veracity?

A

The trustworthiness of your data

7
Q

What is variability?

A

How often the meaning of the collected data, or the method used to collect it, changes.

8
Q

What is value?

A

The business value that can be derived from the data you collect.

9
Q

What is the difference between structured, semi-structured and unstructured data?

A

Structured data fits a fixed schema, such as spreadsheets or relational database tables. Unstructured data has no predefined data model: text, images, audio, and video. Semi-structured data, such as JSON, XML, or some sensor data, carries organizing tags or keys but cannot be organized in a fixed data schema.
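
As a rough illustration of the three categories, here is a minimal Python sketch. The sample records and the helper `field_names` are made up for this card and are not taken from any particular system.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns.
structured = list(csv.DictReader(io.StringIO("id,temp_c\n1,21.5\n2,19.0\n")))

# Semi-structured: keys label the values, but records may differ in shape.
semi_structured = [
    json.loads('{"id": 1, "temp_c": 21.5}'),
    json.loads('{"id": 2, "readings": {"temp_c": 19.0, "humidity": 40}}'),
]

# Unstructured: raw text (or image/audio bytes) with no predefined fields.
unstructured = "Sensor 2 seemed a little cold this morning."


def field_names(records):
    """Return the set of distinct key layouts seen across the records."""
    return {tuple(sorted(r.keys())) for r in records}
```

The structured rows share one key layout, while the semi-structured records show two, which is why they do not fit a single fixed table without extra work.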

10
Q

What are the components of big data architecture?

A

Data sources
Data storage
Batch processing
Real-time message ingestion
Stream processing
Analytical data store
Analysis and reporting
Orchestration

11
Q

What are data sources?

A

One or more data sources, such as application data stores, static files produced by applications, and real-time data sources like IoT devices.

12
Q

What is data storage?

A

A distributed file store that can hold high volumes of files in various formats, often called a data lake.

13
Q

What is batch processing?

A

Because the data sets are so large, big data solutions often must process data files using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis.
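
A toy version of the filter-then-aggregate step might look like the sketch below. The function `run_batch`, the field names, and the sample records are hypothetical; a real batch job would read files from the data store and run on a distributed engine rather than over an in-memory list.

```python
from collections import defaultdict


def run_batch(records, min_amount=0.0):
    """Filter out small records, then total the amount per region."""
    totals = defaultdict(float)
    for rec in records:
        if rec["amount"] >= min_amount:  # filter step
            totals[rec["region"]] += rec["amount"]  # aggregate step
    return dict(totals)


# Made-up input standing in for a large data file.
sales = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 2.0},
    {"region": "east", "amount": 5.0},
]
batch_result = run_batch(sales, min_amount=5.0)
```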

14
Q

What is real-time message ingestion?

A

A way to capture and store real-time messages. The ingestion store often acts as a buffer to support scale-out processing and reliable delivery.
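
The buffering idea can be sketched with Python's standard `queue.Queue`. This is only an in-memory stand-in, not a real message broker, and the names `ingest` and `drain` and the sample messages are assumptions made for illustration.

```python
import queue

# Bounded buffer sitting between the producer and the consumers.
buffer = queue.Queue(maxsize=100)


def ingest(message):
    """Producer side: capture a message as it arrives."""
    buffer.put(message)


def drain(max_items):
    """Consumer side: pull up to max_items messages, in arrival order."""
    batch = []
    while len(batch) < max_items and not buffer.empty():
        batch.append(buffer.get())
        buffer.task_done()
    return batch


for i in range(5):
    ingest({"device": "sensor-1", "seq": i})

first = drain(3)
rest = drain(10)
```

Because messages wait in the buffer, consumers can read at their own pace, and more consumers can be added when the arrival rate grows.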

15
Q

What is stream processing?

A

After real-time messages are captured, stream processing filters, aggregates, and prepares the data for analysis.
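
One common way to aggregate a stream is over fixed, non-overlapping time windows (tumbling windows). The sketch below assumes events arrive ordered by timestamp; the function name and the sample events are hypothetical.

```python
def tumbling_window_avg(events, window_seconds):
    """Yield (window_start, average) for each fixed, non-overlapping window.

    Assumes events are (timestamp, value) pairs ordered by timestamp.
    """
    current_start, values = None, []
    for ts, value in events:
        start = ts - (ts % window_seconds)  # window this event belongs to
        if current_start is not None and start != current_start:
            # The previous window is complete, so emit its aggregate.
            yield current_start, sum(values) / len(values)
            values = []
        current_start = start
        values.append(value)
    if values:
        # Emit the final, still-open window when the input ends.
        yield current_start, sum(values) / len(values)


events = [(0, 10.0), (3, 20.0), (10, 30.0), (12, 50.0), (25, 5.0)]
windows = list(tumbling_window_avg(events, window_seconds=10))
```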

16
Q

What is the analytical data store?

A

Processed data must be served in a structured format that can be queried using analytical tools. The data store serves these queries either as a relational data warehouse or a lakehouse with medallion architecture.
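
To show what "structured and queryable" means in practice, here is a small sketch using Python's built-in `sqlite3` as a stand-in for a data warehouse. The table `daily_sales` and its rows are invented for illustration.

```python
import sqlite3

# In-memory database standing in for the analytical data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "east", 15.0),
        ("2024-01-01", "west", 7.0),
        ("2024-01-02", "east", 4.0),
    ],
)

# An analytical query of the kind a reporting tool would issue.
rows = conn.execute(
    "SELECT region, SUM(total) FROM daily_sales GROUP BY region ORDER BY region"
).fetchall()
```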

17
Q

What is analysis and reporting?

A

A way to empower users to analyze the data, for example through a data modeling layer or self-service BI.

18
Q

What is orchestration?

A

A way to automate the workflows that transform source data, move data between sources and sinks, load the processed data, or push the results to a dashboard.
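
At its core, an orchestrator runs tasks in dependency order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The task names are hypothetical, and the sketch leaves out scheduling, retries, and monitoring.

```python
from graphlib import TopologicalSorter

log = []  # records the order in which tasks actually ran

# Hypothetical tasks; each one just notes that it ran.
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
    "publish_dashboard": lambda: log.append("publish_dashboard"),
}

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "transform": {"extract"},
    "load": {"transform"},
    "publish_dashboard": {"load"},
}

order = list(TopologicalSorter(dependencies).static_order())
for name in order:
    tasks[name]()
```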