Big Data Flashcards

1
Q

What are the 3Vs of big data and what do they mean?

A

Volume: Large amounts of data
Variety: Data comes in many different forms, from diverse sources
Velocity: Data is generated quickly and must be processed quickly

2
Q

What is the ETL cycle?

A

Extract: Read raw/semi-structured data from its sources and parse it into structured data
Transform: Convert units, join data structures, clean up values, etc.
Load: Write the result into another system for further processing
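
The three steps can be sketched in a few lines of Python. This is only an illustration, assuming a tiny in-memory CSV source and a plain list standing in for the target system; the names `RAW`, `extract`, `transform`, `load`, and `warehouse` are hypothetical and not part of any real ETL tool.

```python
import csv
import io

# Hypothetical raw input: CSV text with temperatures in Fahrenheit.
RAW = """city,temp_f
Oslo,41
Cairo,95
"""

def extract(raw_text):
    # Extract: parse the raw text into structured records (a list of dicts).
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(records):
    # Transform: convert units (Fahrenheit to Celsius) and fix the field types.
    return [
        {"city": r["city"], "temp_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1)}
        for r in records
    ]

def load(records, target):
    # Load: append the cleaned records to the target (a list stands in for a database).
    target.extend(records)
    return target

warehouse = []
load(transform(extract(RAW)), warehouse)
print(warehouse)
```

In a real pipeline each step would usually talk to an external system (files, an API, a data warehouse), but the division of responsibilities stays the same.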

3
Q

What is the difference between stream and batch processing?

A

Batch processing assumes all data already exists in some store, and processes all of it at once
Stream processing does not assume all data exists in some store, and processes data as it arrives in the system
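
A small Python sketch of the contrast, using an average as the computation. The function names `batch_mean` and `stream_means` are made up for this example, and a plain iterator stands in for a live data source.

```python
def batch_mean(values):
    # Batch: the whole dataset is already stored, so compute over all of it at once.
    return sum(values) / len(values)

def stream_means(source):
    # Stream: consume items one at a time as they arrive, keeping only
    # a running count and total instead of the full dataset.
    count, total = 0, 0.0
    for value in source:
        count += 1
        total += value
        yield total / count  # an up-to-date result after every arrival

readings = [10, 20, 30, 40]
print(batch_mean(readings))                 # 25.0
print(list(stream_means(iter(readings))))   # [10.0, 15.0, 20.0, 25.0]
```

The batch version gives one answer once all data is available; the streaming version gives a current answer after each item and never needs the full dataset in memory.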

4
Q

What are the three basic data types?

A

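Unstructured: Data with no known or predefined format (e.g. free text, images)
Semi-structured: Data with a known, self-describing format but no fixed schema (e.g. JSON, XML)
Structured: Data with a known format and fixed schema, linked in graphs/tables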
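
As a rough illustration, the same made-up order can be held in all three forms. The sample sentence, the JSON fields, and the `orders` table are assumptions invented for this sketch; only the standard-library `json` and `sqlite3` modules are used.

```python
import json
import sqlite3

# Unstructured: free text; a program cannot rely on any fixed layout.
unstructured = "Alice ordered 2 widgets on Monday."

# Semi-structured: JSON has a known syntax and named fields, but no enforced schema.
semi_structured = json.loads(
    '{"customer": "Alice", "items": [{"name": "widget", "qty": 2}]}'
)

# Structured: rows in a table with a fixed set of typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT, qty INTEGER)")
for item in semi_structured["items"]:
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        (semi_structured["customer"], item["name"], item["qty"]),
    )
rows = conn.execute("SELECT customer, item, qty FROM orders").fetchall()
print(rows)
```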
