4.11.1 Big Data Flashcards

1
Q

Define big data.

A

A catch-all term for data that won’t fit in the usual containers.

2
Q

What do the three Vs do?

A

Describe big data.

3
Q

What are the three Vs?

A

Volume.
Velocity.
Variety.

4
Q

Define volume.

A

There is too much data for it all to fit on a conventional hard drive or server. The data has to be stored over multiple servers, each of which is composed of many hard drives.

5
Q

In terms of volume, why must data be stored over multiple servers?

A

Because there is too much data to fit on a single server. This causes problems because relational databases don’t scale well over multiple machines.

6
Q

Define velocity.

A

Data on the servers is created and modified rapidly. Servers must respond to frequently changing data within a matter of milliseconds.

7
Q

Define variety.

A

Data held on the servers consists of many different data types - from binary files to multimedia files.

8
Q

In terms of big data, what is the biggest problem?

A

Its unstructured nature makes the data difficult to analyse. Conventional databases are not suited to storing big data because they require the data to conform to a column and row structure, and they do not scale well over multiple servers.

9
Q

What needs to happen when storing big data over multiple servers?

A

The processing associated with using the data must be split amongst multiple machines.

10
Q

Why is storing big data over multiple machines incredibly difficult with conventional programming paradigms?

A

Because all the machines would have to be synchronised so that no data is overwritten or damaged.

11
Q

Why is functional programming used with big data?

A

Solves the problem of programming over multiple machines.
Stateless - no side effects.
Uses immutable data structures.
Supports higher order functions.
These attributes make it easier to write correct, efficient, distributed code than with procedural programming.

12
Q

How can we represent data that doesn’t conform to the typical column and row format?

A

With the fact-based model.

13
Q

In terms of the fact-based model (FBM), how is data stored?

A

As a fact.

14
Q

(FBM) What are the benefits of facts?

A

Facts are immutable and cannot be overwritten, reducing the risk of losing data due to human error.

15
Q

(FBM) What is stored with each fact?

A

A timestamp - indicating the date and time each piece of information was recorded.

16
Q

Why are timestamps used?

A

Multiple different values could be held for the same attribute - the timestamps let a computer discern the most recent value.
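
The idea in cards 13 to 16 can be sketched in Python. This is only an illustration under assumed names: the `Fact` type, the `record` and `latest` helpers, the sample entity `user42` and the integer timestamps are all hypothetical, not part of any standard fact-based model library.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """One immutable fact (hypothetical layout for illustration)."""
    entity: str
    attribute: str
    value: str
    timestamp: int  # assumed: a simple increasing number standing in for date and time


# Facts are kept in a tuple, so existing facts cannot be overwritten in place.
facts = (
    Fact("user42", "town", "Leeds", 100),
    Fact("user42", "town", "York", 250),
    Fact("user42", "name", "Sam", 100),
)


def record(existing: tuple, new_fact: Fact) -> tuple:
    """Return a new collection with the fact appended; old facts are untouched."""
    return existing + (new_fact,)


def latest(existing: tuple, entity: str, attribute: str) -> Optional[str]:
    """Pick the value with the most recent timestamp for one attribute."""
    matching = [f for f in existing if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value


current_town = latest(facts, "user42", "town")
```

Both town values remain stored; the timestamp alone decides which one counts as current.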

17
Q

Define a graph schema (Big Data and Graphs (BDG)).

A

Uses nodes and edges to graphically represent the structure of a dataset.

18
Q

Define an edge.

A

A relationship between entities, labelled with a brief description of that relationship.

19
Q

Where are the properties? (BDG)

A

Listed within the entities.

20
Q

(BDG) How often are timestamps used, and why?

A

Rarely, as it is assumed that most nodes contain the most recent information available.

21
Q

What are the alternative representations of properties? (BDG)

A

Inside rectangles joined to entities with a dashed line. The dashed line does not represent a relationship; it only shows which properties belong to that entity.
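
A graph schema like the one described in cards 17 to 21 could be held in code roughly as follows. This is a minimal sketch with made-up data: the node names, the edge labels and the `related` helper are assumptions for illustration only.

```python
# Nodes (entities), each listing its own properties (hypothetical sample data).
nodes = {
    "Ana": {"type": "Student", "properties": {"age": 17}},
    "Ben": {"type": "Student", "properties": {"age": 18}},
    "Maths": {"type": "Course", "properties": {"level": "A-level"}},
}

# Edges: (source node, brief description of the relationship, target node).
edges = (
    ("Ana", "friend of", "Ben"),
    ("Ana", "studies", "Maths"),
    ("Ben", "studies", "Maths"),
)


def related(all_edges, source, label):
    """Follow every edge with the given label that starts at the source node."""
    return [target for (src, lbl, target) in all_edges if src == source and lbl == label]


# Who studies Maths? Read the same edges in the opposite direction.
maths_students = [src for (src, lbl, target) in edges
                  if lbl == "studies" and target == "Maths"]
```

Properties live with their node, while relationships live only in the edges, which mirrors the split between solid and dashed lines in the diagram.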

22
Q

(Functional programming and Big Data (FPBD)) When do we use functional programming in big data?

A

When working with data which needs to be distributed over multiple servers (volume).

23
Q

(FPBD) Does functional programming have side effects?

A

No, it will not change any values or affect the program elsewhere.

24
Q

(FPBD) What is statelessness?

A

When the result of a function does not rely on the state of variables from other functions, regardless of the order in which functions are called.

25
Q

(FPBD) Why do we use statelessness?

A

It makes the behaviour of the program easier to predict, so it is easier to write correct code.
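
A small contrast may make the point concrete. The function names below are hypothetical, and Python is used only for illustration; it does not enforce statelessness the way a purely functional language would.

```python
# Stateful: the result depends on a hidden counter changed by earlier calls.
counter = 0


def stateful_next_id():
    global counter
    counter += 1
    return counter


# Stateless: the result depends only on the argument passed in.
def stateless_next_id(previous_id):
    return previous_id + 1


first = stateful_next_id()
second = stateful_next_id()   # same call, different answer

a = stateless_next_id(41)
b = stateless_next_id(41)     # same call, same answer, in any order
```

The stateless version gives the same answer wherever and whenever it runs, which is what makes it safe to spread across machines.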

26
Q

(FPBD) What is a benefit of functional programming?

A

It supports higher-order functions - functions that take one or more functions as input, return a function as output, or both.
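
As a rough illustration, the sketch below uses Python's built-in `map` and `filter` (which take a function as input) and a hypothetical `compose` helper (which also returns a function). The helper and the sample numbers are assumptions, not part of the flashcards.

```python
def compose(f, g):
    """Higher-order: takes two functions and returns a new function."""
    return lambda x: f(g(x))


def double(x):
    return x * 2


def increment(x):
    return x + 1


# Build a new function from two existing ones: double first, then increment.
double_then_increment = compose(increment, double)

# map and filter each take a function as their first argument.
transformed = list(map(double_then_increment, [1, 2, 3]))
evens = list(filter(lambda x: x % 2 == 0, [1, 2, 3, 4]))
```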

27
Q

(FPBD) What does functional programming not allow for?

A

Variable reassignment - once created, a variable cannot be modified; it is an immutable object.

28
Q

(FPBD) What is a benefit of not allowing variables to be reassigned?

A

Makes parallel processing across multiple servers easier.
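
A minimal sketch of why this helps, using a thread pool from Python's standard library as a stand-in for multiple servers. The sample words and the `count_chars` function are hypothetical. Because the chunks are immutable tuples and the function has no side effects, the workers cannot overwrite each other's data and need no synchronisation.

```python
from concurrent.futures import ThreadPoolExecutor

# Immutable input: each inner tuple stands in for the data held on one server.
chunks = (("big", "data"), ("volume", "velocity", "variety"))


def count_chars(chunk):
    """Pure function: reads its argument and changes nothing else."""
    return sum(len(word) for word in chunk)


# Each chunk is processed independently; pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = tuple(pool.map(count_chars, chunks))

total_chars = sum(partial_counts)
```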