Big Data Flashcards

1
Q

What are the 5Vs of big data?

A

Veracity, Velocity, Volume, Variety, Volatility

2
Q

What is veracity in big data?

A

The quality, trustworthiness, and origin (provenance) of the data.

3
Q

What is velocity in big data?

A

The speed at which data is generated and has to be processed.

4
Q

What is volume in big data?

A

Vast amounts of data.

5
Q

What is variety in big data?

A

Data comes from different sources, in different forms (structured and unstructured).

6
Q

What is volatility in big data?

A

Data points that fluctuate sharply, swinging from very high to very low values.

7
Q

Is RDBMS going away? Why or why not?

A

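No. Organizations store different types of data in different systems, choosing each one for its strengths, so relational databases coexist with newer stores (polyglot coexistence, also called polyglot persistence).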

8
Q

What is Hadoop?

A

A Java-based, open-source framework for storing and processing very large data sets in a distributed way across clusters of computers.

9
Q

What are some important components of Hadoop?

A

HDFS and MapReduce.

10
Q

What is HDFS?

A

Hadoop Distributed File System.
- Low-level distributed file system for storing data
- Designed to run on commodity (cheap) hardware.

11
Q

What is MapReduce?

A

Programming model for processing large data sets in parallel: a map step emits key/value pairs, and a reduce step combines the values for each key.
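
To make the model concrete, here is a minimal single-machine sketch of a MapReduce-style word count. It only imitates the map, shuffle, and reduce phases in plain Python; it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls would run on many machines at once; the sketch only shows how the work is divided.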

12
Q

Assumptions of HDFS?

A
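  • High volume: files are large, so data is stored in big blocks (64 MB by default in early Hadoop versions, 128 MB in later ones)
  • Write once, read many: files are not updated in place, which simplifies concurrency control
  • Streaming access: files are read sequentially in batch rather than accessed at random
  • Fault tolerant: data is replicated across different nodes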
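
The block-size and replication assumptions can be illustrated with a little arithmetic. The sketch below uses the 64 MB block size from this card; the replication factor of 3 and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware and more involved), and `block_count` and `place_replicas` are hypothetical helper names.

```python
import math

BLOCK_SIZE_MB = 64       # block size from the card
REPLICATION_FACTOR = 3   # assumed value, chosen for illustration


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, data_nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a deliberate simplification, not HDFS's actual placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
    return placement
```

For example, a 200 MB file needs `block_count(200)` = 4 blocks, and losing any single node still leaves two copies of every block under these assumptions.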
13
Q

Types of HDFS nodes?

A
  • Data node
  • Name node
  • Client node
14
Q

What is HDFS data node?

A
  • Stores the actual file data (blocks)
  • Handles block creation, deletion, and replication
  • Sends a periodic heartbeat to the name node
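
The heartbeat idea can be sketched as a small tracker on the name node side: each data node reports in, and a node that stays silent for too long is treated as dead. The class name `HeartbeatMonitor`, the 30-second default timeout, and the explicit `now` argument are all assumptions made for this illustration, not HDFS settings.

```python
class HeartbeatMonitor:
    """Toy name-node-side tracker of the last heartbeat from each data node."""

    def __init__(self, timeout_seconds=30.0):  # timeout value is an assumption
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # data node name -> time of last heartbeat

    def record_heartbeat(self, node_name, now):
        """Called whenever a data node reports in at time `now` (seconds)."""
        self.last_seen[node_name] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

Passing the time in explicitly keeps the sketch deterministic; a real system would read a clock and would re-replicate the blocks held by a dead node.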
15
Q

What is HDFS name node?

A
  • One per cluster
  • Contains file system metadata (filename, block numbers, replication factor)
16
Q

What is HDFS client node?

A
  • Makes requests to the file system
  • To read a file: contacts the name node to get the file's blocks and the data nodes that hold them, then reads the blocks from those data nodes
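
The read path can be sketched with toy classes: the name node holds only metadata, the data nodes hold the bytes, and the client talks to both. The classes `NameNode` and `DataNode`, the function `client_read`, and the choice to always read the first replica are hypothetical simplifications for illustration, not the HDFS client API.

```python
class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.files = {}  # filename -> list of (block_id, [data node names])

    def add_file(self, filename, block_locations):
        self.files[filename] = block_locations

    def get_block_locations(self, filename):
        return self.files[filename]


class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def client_read(filename, name_node, data_nodes):
    """Ask the name node where the blocks are, then fetch them in order."""
    content = b""
    for block_id, node_names in name_node.get_block_locations(filename):
        node = data_nodes[node_names[0]]  # simplification: use the first replica
        content += node.read_block(block_id)
    return content
```

Note that the file data never passes through the name node in this sketch; it only answers the lookup.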
17
Q

More about MapReduce

A