Big Data Flashcards

1
Q

What are the 5 V’s?

A
  • Volume - the sheer size of the data
  • Velocity - the speed at which data arrives and is processed
  • Variety - the number of different formats that data comes in
  • Variability - the variations in meaning, the context dependence
  • Veracity - the uncertainty of data quality
2
Q

What is empirical science?

A

Science paradigm used thousands of years ago, which involves describing natural phenomena

3
Q

What is the theoretical branch of science?

A

Science paradigm used in the last few hundred years, which involves using models and generalizations

4
Q

What is the computational branch of science?

A

Science paradigm used in the last few decades, which involves simulating complex phenomena

5
Q

What is data exploration science (eScience)?

A

Science paradigm used today, which involves unifying theory, experiment, and simulation

6
Q

What do big data frameworks deal with? (3)

A
  • Distribution of data
  • Mapping computation to distributed data
  • Handling resource failures
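
The three concerns above can be illustrated with a minimal single-process sketch. It is not how any particular framework implements them: `partition`, `run_with_retry`, and `flaky_sum` are hypothetical names made up for this card, and the "node failure" is simulated with an exception.

```python
def partition(data, n_parts):
    """Distribution of data: split the records into n_parts chunks."""
    return [data[i::n_parts] for i in range(n_parts)]


def run_with_retry(task, chunk, max_attempts=3):
    """Handling resource failures: re-run a failed task a few times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk, attempt)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt


def flaky_sum(chunk, attempt):
    """Hypothetical worker: sums a chunk, but fails on its first attempt
    whenever the chunk starts with an even number (simulated failure)."""
    if attempt == 1 and chunk and chunk[0] % 2 == 0:
        raise RuntimeError("simulated node failure")
    return sum(chunk)


data = list(range(10))
chunks = partition(data, 3)

# Mapping computation to distributed data: one task per chunk.
partial_sums = [run_with_retry(flaky_sum, c) for c in chunks]
total = sum(partial_sums)
```

In a real framework the chunks would live on different machines and the tasks would be scheduled close to the data; here everything runs in one loop to keep the idea visible.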
7
Q

What is Hadoop?

A

An open-source ecosystem of big data technologies, maintained as an Apache project. It was created by Doug Cutting and Mike Cafarella, and much of its early development took place at Yahoo.

8
Q

What is the MapReduce algorithm?

A

A programming model that lets users specify a map function for distributed processing and a reduce function for aggregating the results
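
A word count is the usual way to picture this. The sketch below runs in a single process and only mimics the map, shuffle, and reduce phases. The names `map_fn`, `reduce_fn`, and `map_reduce` are illustrative assumptions, not the API of Hadoop or any other framework.

```python
from collections import defaultdict
from itertools import chain


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce: aggregate all values emitted for one key."""
    return word, sum(counts)


def map_reduce(records, map_fn, reduce_fn):
    # Map phase: apply map_fn to each record independently.
    mapped = chain.from_iterable(map_fn(r) for r in records)

    # Shuffle phase: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: aggregate each key's values.
    return dict(reduce_fn(k, v) for k, v in groups.items())


lines = ["big data big ideas", "data arrives fast"]
word_counts = map_reduce(lines, map_fn, reduce_fn)
```

Because each call to `map_fn` depends on one record only, a framework can run the map phase on many machines at once; the grouping step is what brings matching keys together for the reducer.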
