Big Data Flashcards

1
Q

Big data

A

A term coined for handling the masses of data generated through the internet and deriving business insight from it.

2
Q

Where does all this data come from?

A

Web browsing patterns
RFID
Internet of things
Social media
Smartphones
Biomedical devices

3
Q

Scaling up v Scaling out

A

Scaling up: migrating to a single larger system with better CPUs and more storage space (increases costs)

Scaling out: spreading the workload across several servers
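
One common way to spread data across several servers is hash partitioning: each record key is hashed and the hash decides which server owns it. The Python below is a minimal sketch of that idea; the server names and the choice of MD5 are illustrative assumptions, not part of any particular big data product.

```python
import hashlib

# Hypothetical server names; a real deployment would use actual hostnames.
SERVERS = ["server-a", "server-b", "server-c"]


def pick_server(key: str, servers: list[str]) -> str:
    """Map a record key to one server using a stable hash (hash partitioning)."""
    # MD5 gives the same result in every process, unlike Python's salted hash().
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def partition(keys: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Group keys by the server responsible for them."""
    assignment: dict[str, list[str]] = {s: [] for s in servers}
    for key in keys:
        assignment[pick_server(key, servers)].append(key)
    return assignment


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10)]
    for server, owned in partition(users, SERVERS).items():
        print(server, owned)
```

Adding a server only requires extending the list, which is the appeal of scaling out; note that with plain modulo hashing most keys move when the server count changes.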

4
Q

Clustering

A

Using a cluster of low-cost servers that share the workload
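
A very simple way for cluster nodes to share a workload is round-robin assignment, where tasks are handed to each node in turn. This Python sketch assumes hypothetical node names and equal-cost tasks; real cluster schedulers also account for node load and data locality.

```python
from itertools import cycle


def round_robin(tasks, nodes):
    """Give tasks to nodes in turn, so each node receives a near-equal share."""
    if not nodes:
        raise ValueError("a cluster needs at least one node")
    assignment = {node: [] for node in nodes}
    node_iter = cycle(nodes)  # node1, node2, node3, node1, ...
    for task in tasks:
        assignment[next(node_iter)].append(task)
    return assignment


print(round_robin(range(7), ["node1", "node2", "node3"]))
# {'node1': [0, 3, 6], 'node2': [1, 4], 'node3': [2, 5]}
```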

5
Q

Velocity

A

Rate at which new data enters the system and the rate at which data must be processed
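
Velocity can be measured by counting how many records arrived in a recent time window. The Python below is a minimal sliding-window sketch; it assumes timestamps in seconds that are recorded in non-decreasing order, and the class name RateMeter is hypothetical.

```python
from collections import deque


class RateMeter:
    """Sliding-window counter: how many events arrived in the last `window` seconds."""

    def __init__(self, window: float = 1.0) -> None:
        self.window = window
        self.timestamps = deque()  # arrival times, assumed to be recorded in order

    def record(self, t: float) -> None:
        """Register one event (e.g. an incoming record) arriving at time t."""
        self.timestamps.append(t)

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        # Evict events that are older than the window before counting.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window


meter = RateMeter(window=1.0)
for t in (0.1, 0.5, 0.9, 1.2):
    meter.record(t)
print(meter.rate(now=1.5))  # 2.0: only the events at 0.9 and 1.2 are inside the window
```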

6
Q

Variety

A

Big data captures data in the form in which it naturally exists: structured, unstructured or semi-structured.

7
Q

Structured data

A

Organised to fit into a predefined data model
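
As an illustration, structured data can be loaded straight into a fixed record type because every row has the same columns. This Python sketch assumes a hypothetical Order schema and CSV input; each column is converted to the type the schema expects.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical predefined data model: every record has exactly these fields."""
    order_id: int
    customer: str
    amount: float


def load_orders(csv_text: str) -> list[Order]:
    """Parse CSV rows into Order records, converting each column to its declared type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [Order(int(r["order_id"]), r["customer"], float(r["amount"])) for r in reader]


data = "order_id,customer,amount\n1,Alice,9.99\n2,Bob,20\n"
for order in load_orders(data):
    print(order)
```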

8
Q

Unstructured data

A

Not organised to fit into a predefined data model, e.g. videos, images, text…

9
Q

Semi-structured data

A

Has elements of both structured and unstructured data, e.g. JSON or XML: fields are tagged, but records need not share a fixed schema
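
The JSON sketch below shows what "elements of both" means in practice: every field is labelled, yet different records carry different fields. The sample records and the helper name field_union are hypothetical.

```python
import json

# Hypothetical sample: keys label every value, but records do not share one schema.
RAW = """
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "tags": ["vip", "beta"]},
  {"id": 3, "name": "Carol", "address": {"city": "Leeds"}}
]
"""


def field_union(records):
    """Collect every top-level key that appears in any record."""
    keys = set()
    for rec in records:
        keys.update(rec.keys())
    return keys


records = json.loads(RAW)
print(sorted(field_union(records)))  # ['address', 'email', 'id', 'name', 'tags']
for rec in records:
    # .get() tolerates a missing field instead of failing as a fixed schema would.
    print(rec["id"], rec.get("email", "<no email>"))
```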

10
Q

HDFS and its assumptions

A

Hadoop Distributed File System
Assumes:
• files will be very large (so it divides them into blocks)
• write once, read many (simplifies concurrency issues and improves overall throughput)
• streaming data access
• fault tolerance (blocks are replicated, so processing can continue even if one replica fails)
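
The block and replication assumptions above can be modelled in a few lines of Python. This is a toy sketch, assuming Hadoop's usual defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). The DataNode names are hypothetical, and the simple rotation used for placement is a simplification, not HDFS's actual rack-aware placement policy.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed default dfs.blocksize
REPLICATION = 3        # assumed default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    n = (file_size + block_size - 1) // block_size  # integer ceiling division
    sizes = [block_size] * n
    if n and file_size % block_size:
        sizes[-1] = file_size % block_size
    return sizes


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Assign each block to `replication` distinct nodes by simple rotation."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return [
        [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    ]


blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])  # [128, 128, 44]
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
# Fault tolerance: if dn2 fails, every block still has a replica elsewhere.
print(all(any(node != "dn2" for node in replicas) for replicas in placement))  # True
```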
