L8: BIG DATA SYSTEMS Flashcards

1
Q

Two types of big data scaling

A

To handle increased data volumes in a similar time frame, you need to scale computing power.
1. Vertical scaling: install more processors, more memory, and better/faster hardware in a single machine.
2. Horizontal scaling: spread the workload across many machines.
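A minimal sketch of the horizontal-scaling idea, assuming a simple partition-and-combine workload: the data is split into chunks, each "machine" processes one chunk independently, and the partial results are combined. Here threads in one process stand in for separate machines, and the function names (`partition`, `process_chunk`, `scale_out`) are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_workers):
    """Split data into n_workers contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Work done by one 'machine' on its own partition: a sum of squares."""
    return sum(x * x for x in chunk)


def scale_out(data, n_workers):
    """Each worker handles one partition; the partial results are then combined."""
    chunks = partition(data, n_workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


data = list(range(1, 1001))
print(scale_out(data, 4))  # 333833500
```

Vertical scaling would instead keep a single `process_chunk(data)` call and make that one machine faster; horizontal scaling adds workers and partitions.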
2
Q

Hadoop Horizontal scaling

A

A framework of open-source tools for examining data sets that are too large to fit into a traditional data warehouse or relational database, using reliable, scalable, distributed computing.
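Hadoop processes data with the MapReduce model (map, shuffle/group by key, reduce). The following is a pure-Python simulation of that model for a word count, assuming small in-memory input; it is not Hadoop's Java API, and the function names are hypothetical. On a real cluster, mappers and reducers run on different machines over blocks stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all counts for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


lines = ["big data big systems", "data systems scale"]
print(word_count(lines))  # {'big': 2, 'data': 2, 'systems': 2, 'scale': 1}
```

Because each mapper only sees its own lines and each reducer only sees one key's values, both phases can be spread across many machines, which is what makes the model scale horizontally.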

3
Q

Spark

A

A newer tool (2010) that can run directly on the Hadoop Distributed File System (HDFS), inside MapReduce, or alongside MapReduce on the same cluster.
Its unique feature is the ability to perform in-memory computations.
It allows data to be cached in memory, eliminating Hadoop's hard-disk overhead for iterative tasks.
Up to 100 times faster than MapReduce when the data fits in memory, and up to 10 times faster when the data resides on disk.
Spark supports streaming data and more complex analytics, such as SQL queries (Spark SQL), graph algorithms, and machine learning.
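To see why in-memory caching matters for iterative tasks, here is a plain-Python analogy (not Spark's API): an iterative algorithm that needs a full pass over the data on every iteration either re-reads the data from disk each time (MapReduce-style) or reads it once and keeps it in memory (Spark-style). The temporary file stands in for data on disk, and the function names are hypothetical. In PySpark, the analogous step would be calling `cache()` on an RDD or DataFrame before iterating over it.

```python
import os
import tempfile


def write_dataset(values):
    """Write one number per line to a temporary file that stands in for data on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(str(v) for v in values))
    return path


def load(path, stats):
    """Read the whole dataset from disk, counting each disk read."""
    stats["disk_reads"] += 1
    with open(path) as f:
        return [float(line) for line in f]


def fit_mean(path, iterations, cache, lr=0.5):
    """Gradient descent toward the mean; every iteration needs a full pass over the data."""
    stats = {"disk_reads": 0}
    cached = load(path, stats) if cache else None  # read once, keep in memory
    estimate = 0.0
    for _ in range(iterations):
        data = cached if cache else load(path, stats)  # without caching: re-read every pass
        grad = sum(estimate - x for x in data) / len(data)
        estimate -= lr * grad
    return estimate, stats["disk_reads"]


path = write_dataset([2, 4, 6, 8])
try:
    no_cache = fit_mean(path, 20, cache=False)
    with_cache = fit_mean(path, 20, cache=True)
finally:
    os.remove(path)
print(no_cache[1], with_cache[1])  # 20 1
```

Both runs compute the same estimate; only the number of disk reads differs, which is the overhead that in-memory caching removes for iterative workloads.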

4
Q

Cloud Computing

A

Cloud computing allows the use of remote, vertically or horizontally scaled servers for data storage and analysis (e.g., Microsoft Azure).
