Big Data Architectures Flashcards

1
Q

What is big data?

A

A collection of large and complex data sets that are difficult to process using common database management tools or traditional data processing applications.

2
Q

What are the 4 Vs of Big data?

A

Volume -> Data at rest
Velocity -> Data in motion
Variety -> Data in many forms
Veracity -> Data in doubt

3
Q

What are the two types of scaling? (Scaling is the ability of a system to adapt to increased demand.)

A
  1. Horizontal scaling:
    - distribute the workload across many servers, adding machines to increase processing capacity
  2. Vertical scaling:
    - install more processors, more memory and faster hardware, typically within a single server (make one machine bigger instead of adding more machines)

4
Q

What are the advantages and disadvantages of horizontal scaling?

A

Advantages:
- increases performance in small steps as needed
- financial investment is relatively small
- can scale up as much as needed

Disadvantages:
- the software has to handle all the data distribution
- only a limited number of software packages can take advantage of horizontal scaling
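To make the first disadvantage concrete, here is a minimal Python sketch of one way software can spread data across servers: hash partitioning, where each record's key decides which machine stores it. The function names `node_for_key` and `partition` and the modulo placement rule are illustrative assumptions, not part of any particular platform.

```python
import hashlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a server for a record by hashing its key (simple modulo placement)."""
    # md5 is used only for a stable, evenly spread hash, not for security
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(records: dict[str, str], num_nodes: int) -> dict[int, dict[str, str]]:
    """Distribute key/value records across num_nodes hypothetical servers."""
    shards = defaultdict(dict)
    for key, value in records.items():
        shards[node_for_key(key, num_nodes)][key] = value
    return dict(shards)


if __name__ == "__main__":
    users = {f"user{i}": f"profile-{i}" for i in range(10)}
    for node, shard in sorted(partition(users, 3).items()):
        print(node, sorted(shard))
```

With plain modulo placement, changing the number of servers moves most keys to a different node; distributed stores often use consistent hashing to reduce that reshuffling.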
5
Q

What are the advantages and disadvantages of vertical scaling?

A

Advantages:
- most software can easily take advantage of vertical scaling
- hardware is easy to install within a single machine

Disadvantages:
- requires substantial financial investment
- the system has to be more powerful than current needs in order to handle future workloads
- it may not be possible to scale up beyond a certain hardware limit

6
Q

What are horizontal scaling platforms?

A

Peer-to-peer networks

Apache Hadoop

7
Q

What are vertical scaling platforms?

A

Multicore processors
High-performance computing (HPC) clusters
Graphics processing units (GPUs)

8
Q

What is a peer-to-peer network?

A
  • typically involves millions of machines connected in a network
  • decentralized and distributed network architecture
  • nodes communicate through a Message Passing Interface (MPI)
  • each node is capable of storing and processing data
  • scale is practically unlimited

Drawbacks:
- communication between nodes is a major bottleneck
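The communication bottleneck can be illustrated with a small single-process simulation. Each hypothetical `Peer` stores and processes its own data, then naively sends its partial result to every other peer, so the message count grows as n(n-1) for n peers. This is a toy model under those assumptions, not MPI or any real peer-to-peer protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A hypothetical peer that stores its own data and processes it locally."""
    peer_id: int
    data: list[int]
    inbox: list[int] = field(default_factory=list)

    def local_result(self) -> int:
        # Each node processes only the data it stores
        return sum(self.data)


def all_to_all_sum(peers: list[Peer]) -> tuple[list[int], int]:
    """Every peer sends its partial sum to every other peer (naive broadcast)."""
    messages = 0
    for sender in peers:
        partial = sender.local_result()
        for receiver in peers:
            if receiver is not sender:
                receiver.inbox.append(partial)  # one message on the network
                messages += 1
    # Each peer combines its own partial result with everything it received
    totals = [p.local_result() + sum(p.inbox) for p in peers]
    return totals, messages


if __name__ == "__main__":
    peers = [Peer(i, list(range(i * 10, (i + 1) * 10))) for i in range(5)]
    totals, messages = all_to_all_sum(peers)
    print(totals[0], messages)  # 1225 20
```

The local computation stays cheap as peers are added, while the number of messages grows quadratically, which is why communication dominates at scale.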

9
Q

What is Apache Hadoop?

A

An open-source software framework for storing and processing large datasets across clusters of computers.
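Hadoop's original processing model is MapReduce, with data stored in HDFS. The sketch below simulates the map, shuffle and reduce steps of a word count in plain Python inside one process. It illustrates the programming model only; it is not Hadoop's Java API, and the function names are hypothetical. On a real Hadoop cluster the map and reduce tasks would run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    # Emit (word, 1) for every word in one input line
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all counts for one word
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
```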

10
Q

What are high-performance computing (HPC) clusters?

A

Also known as supercomputers; they have thousands of processing cores.

Built from powerful hardware optimized for speed and throughput.

11
Q

What are multicore CPUs?

A

A single machine with dozens of processing cores.
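A minimal sketch of using the cores of one multicore machine from Python, assuming a CPU-bound task (a sum of squares) that splits cleanly into independent chunks. It uses `concurrent.futures.ProcessPoolExecutor` rather than threads because, in the standard CPython build, the global interpreter lock stops pure-Python threads from running CPU-bound code on several cores at once. The helper names are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def chunk_ranges(n: int, parts: int) -> list[tuple[int, int]]:
    """Split range(n) into `parts` contiguous (start, stop) slices of near-equal size."""
    size, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def sum_of_squares(bounds: tuple[int, int]) -> int:
    """CPU-bound work for one core: sum of i*i over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n: int, workers: Optional[int] = None) -> int:
    # Default to one worker process per core reported by the OS
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunk_ranges(n, workers)))


if __name__ == "__main__":
    n = 2_000_000
    print(parallel_sum_of_squares(n) == sum_of_squares((0, n)))
```

The `if __name__ == "__main__"` guard matters here: on platforms that start worker processes by re-importing the main module, creating the pool at import time would fail.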
