Week 12 Flashcards

1
Q

Big Data - Name this member of the Five Vs:

the vast amount of data that is generated every
second/minute/hour/day in the digitized world

A

Volume

Examples: online transactions (banking), sensors such as GPS and accelerometers, Facebook and Twitter

2
Q

Big Data - Name this member of the Five Vs:

refers to the speed at which data is being
generated and the pace at which data moves from
one point to the next

A

Velocity

3
Q

Big Data - Name this member of the Five Vs:

refers to the ever-increasing number of different forms
that data can come in.

Brings challenges in terms of data integration,
transformation, processing and storage

A

Variety

4
Q

Big Data - Name this member of the Five Vs:

refers to the quality of the data, which can vary
greatly. A lack of it could mean there is noise that needs to be removed

A

Veracity

Noise means meaningless/corrupt/distorted data

5
Q

Big Data - Name this member of the Five Vs:

refers to the usefulness of data for
an enterprise

A

Value

6
Q

The longer it takes for data to be turned into meaningful information, the less value it has for the business. This means _____ and ____ are inversely related

A

value and time

7
Q

This is a tightly coupled collection of servers or nodes. These servers usually have the same hardware and are connected together on a network, and act as a single unit.

A

Cluster

8
Q

T/F - Each node in the cluster shares its resources

A

False. Each node has its own dedicated resources (memory, processor, HDD)

9
Q

T/F - A cluster can execute a task by splitting it into small pieces and distributing those pieces to different computers in the cluster

A

True
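
To illustrate the split-and-distribute idea, here is a minimal Python sketch. It is only a simulation under stated assumptions: worker threads stand in for cluster nodes, and the helper names `split_into_chunks` and `run_on_cluster` are hypothetical, not part of any real cluster framework.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_nodes):
    """Split items into num_nodes contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), num_nodes)
    chunks, start = [], 0
    for i in range(num_nodes):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_on_cluster(items, num_nodes=4):
    """Sum items by giving each simulated node one chunk, then combining."""
    chunks = split_into_chunks(items, num_nodes)
    # Assumption: each worker thread plays the role of one node.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Combine the per-node results into the final answer.
    return sum(partial_sums)


total = run_on_cluster(list(range(1, 101)))
print(total)
```

Each simulated node works only on its own piece, and the partial results are combined at the end.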

10
Q

T/F - A file system provides a logical view of data, sorting it into a tree structure

A

True

11
Q

T/F - Distributed file systems can appear local to the client

A

True (logically; physically, they’re not local)

12
Q

Distributed file systems store large _____ spread across nodes of a ______

A

files; cluster

13
Q

This is the process of horizontally partitioning a large dataset into a collection of smaller, more manageable datasets

A

Sharding

14
Q

Shards are distributed across multiple _____, which in this context are ______s or ________s

A

nodes; servers, machines

15
Q

T/F - In sharding, each shard is stored on the same node

A

False, they’re stored on separate nodes

16
Q

T/F - In sharding, each node is responsible for only the data stored on it

A

True

17
Q

T/F - All shards share the same schema, and together they represent the complete dataset

A

True
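
The sharding cards above can be sketched in a few lines of Python. This is a rough illustration, not how any particular database does it: the function `shard_records`, the modulo-on-`id` placement rule, and the sample records are all assumptions made for the example.

```python
def shard_records(records, num_nodes, key="id"):
    """Horizontally partition records into one shard per node."""
    shards = {node: [] for node in range(num_nodes)}
    for record in records:
        # Assumption: a simple modulo on an integer key picks the node.
        node = record[key] % num_nodes
        shards[node].append(record)
    return shards


# Every record has the same schema: an "id" and a "name".
records = [{"id": i, "name": f"user{i}"} for i in range(10)]
shards = shard_records(records, num_nodes=3)

# Each node holds only its own shard of the rows.
for node, rows in shards.items():
    print(node, [row["id"] for row in rows])

# Merging all shards gives back the complete dataset.
merged = sorted(
    (row for rows in shards.values() for row in rows),
    key=lambda row: row["id"],
)
```

Each shard keeps whole rows with the same fields, so no single node stores the full dataset, yet the shards together contain every record exactly once.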

18
Q

This allows the distribution of processing loads across multiple nodes, to achieve horizontal scalability

A

Sharding

19
Q

What is horizontal scaling in relation to system capacity?

A

Adding resources of similar or higher capacity alongside existing resources to increase system capacity
