4.11 - Big Data Flashcards

1
Q

What is Big Data?

A

A catch-all term for data that doesn’t fit the usual containers

2
Q

What are the three defining features of big data?

A

Volume, velocity and variety

3
Q

What is volume?

A

Too much data to fit on a conventional hard drive or a single server, so the data must be stored across multiple servers

4
Q

What is velocity?

A

Data is created and modified rapidly, so servers must respond to frequently changing data within a matter of milliseconds.

5
Q

What is variety?

A

Data held on servers consists of many different types, from binary files to multimedia like photos and videos.

6
Q

What is the biggest problem with big data and why?

A

Its unstructured nature makes it difficult to analyse the data. Conventional databases are not suited to it because they require data to conform to a row and column structure. Furthermore, conventional databases do not scale well over multiple servers.

7
Q

How is useful information extracted from big data?

A

Machine learning techniques are used to discern patterns in the data.

8
Q

What is the solution to processing data over multiple machines and why?

A

Functional programming, since functional programs are stateless and make use of immutable data structures. Functional programming also supports higher-order functions.

These attributes make it easier to write correct, efficient, distributed code.
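As a rough illustration of why these attributes help, the sketch below applies a pure function to several partitions of data and then combines the partial results. The names `PARTITIONS`, `square`, `add` and `process_partition` are made up for this example, and each inner tuple only stands in for the records held on one server; nothing here is actually distributed.

```python
from functools import reduce
from typing import Tuple

# Hypothetical data: each inner tuple stands in for the records on one server.
# Tuples are immutable, so no partition can be changed while it is processed.
PARTITIONS: Tuple[Tuple[int, ...], ...] = ((3, 1, 4), (1, 5), (9, 2, 6))


def square(x: int) -> int:
    """Pure function: the result depends only on the argument (stateless)."""
    return x * x


def add(a: int, b: int) -> int:
    """Pure, associative combiner, so partial results can be merged in any order."""
    return a + b


def process_partition(partition: Tuple[int, ...]) -> int:
    # map and reduce are higher-order functions: they take functions as arguments.
    return reduce(add, map(square, partition), 0)


# Each partition is processed independently, so in principle each call
# could run on a different machine with no shared state to coordinate.
partial_sums = tuple(map(process_partition, PARTITIONS))

# Combine the per-partition results into the final answer.
total = reduce(add, partial_sums, 0)
print(partial_sums, total)
```

Because `process_partition` neither reads nor writes shared state, the partitions can be handled in any order, or at the same time, without changing the result.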

9
Q

What is the fact-based model for representing data?

A

Each individual piece of information is stored as a fact. Facts are immutable and cannot be overwritten. Each fact is stored with a timestamp indicating the date and time when the information was recorded. Because facts are never deleted or overwritten, multiple values can be held for the same attribute.

This reduces the risk of accidentally losing data due to human error. It also does away with the need for an index: new data is simply appended to the dataset.
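A minimal sketch of the fact-based model, assuming a fact can be represented as an (entity, attribute, value, timestamp) record. The `Fact` class, the helper functions and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen=True makes each fact immutable
class Fact:
    entity: str
    attribute: str
    value: str
    recorded_at: datetime  # timestamp stored with every fact


def record(facts: Tuple[Fact, ...], fact: Fact) -> Tuple[Fact, ...]:
    """Append a fact; existing facts are never deleted or overwritten."""
    return facts + (fact,)


def history(facts: Tuple[Fact, ...], entity: str, attribute: str) -> Tuple[Fact, ...]:
    """Every value ever recorded for one attribute of one entity."""
    return tuple(f for f in facts if f.entity == entity and f.attribute == attribute)


def current_value(facts: Tuple[Fact, ...], entity: str, attribute: str) -> Optional[str]:
    """The current value is the one carried by the most recent fact."""
    matching = history(facts, entity, attribute)
    if not matching:
        return None
    return max(matching, key=lambda f: f.recorded_at).value


# Made-up sample data: the same attribute is recorded twice for one entity.
facts: Tuple[Fact, ...] = ()
facts = record(facts, Fact("user1", "city", "Leeds", datetime(2020, 1, 5, 9, 0)))
facts = record(facts, Fact("user1", "city", "York", datetime(2023, 6, 1, 14, 30)))

print(current_value(facts, "user1", "city"))
print(len(history(facts, "user1", "city")))
```

Both values for `city` remain in the dataset; the timestamp alone decides which one counts as current, so an update is just another append.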
