00 intro: General info Flashcards

1
Q

Random access

A

Random access refers to the ability to access any particular piece of data from a storage device directly, without the need to sequentially read through the entire storage. It allows for immediate retrieval of data from any location within the storage, regardless of the order in which the data is stored.
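To make the contrast concrete, here is a minimal sketch in Python. It assumes a hypothetical store of fixed-width records held in an in-memory buffer; the names RECORD_SIZE, read_record_random and read_record_sequential are illustrative only.

```python
import io

RECORD_SIZE = 8  # hypothetical fixed record width in bytes

# In-memory stand-in for a storage device holding 1000 fixed-width records.
storage = io.BytesIO(b"".join(f"{i:08d}".encode("ascii") for i in range(1000)))


def read_record_random(f, index):
    """Random access: jump straight to the record's byte offset and read it."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE)


def read_record_sequential(f, index):
    """Sequential access: read every record from the start up to the target."""
    f.seek(0)
    record = b""
    for _ in range(index + 1):
        record = f.read(RECORD_SIZE)
    return record
```

Both functions return the same bytes, but the random-access version does one seek and one read regardless of the index, while the sequential version reads every record before the target.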

2
Q

What does “Big Data” look like?

A

CSV, TSV, and JSON files; web pages; graphs; tweets; server access logs

3
Q

The 4 Big V’s of “Big Data”

A

Volume: Lots of data
Velocity: Changing / growing data
Variety: Heterogeneity of data
Veracity: Is the data correct / trustworthy or not?

4
Q

Scale-Out vs Scale-Up

A

Scale-Out:
use hundreds or thousands of small machines

Scale-Up:
use a single, more powerful server

5
Q

If P is the probability that a single machine fails during a certain period of time, what is the probability that at least one of N machines fails during that period?

A

P_N = 1 - (1 - P)^N (assuming the machines fail independently)
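As a quick numerical check of the formula above, here is a minimal sketch in Python. It assumes that machine failures are independent, and the function name prob_any_failure and the example value 0.001 are illustrative.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n machines fails.

    Assumes each machine fails independently with probability p
    during the period of interest.
    """
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-machine failure probability of 0.1% per period.
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_failure(0.001, n), 4))
```

With a per-machine failure probability of 0.001, a cluster of 1000 machines sees at least one failure with probability of about 0.63, which is why scale-out systems must treat failure as the normal case.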

6
Q

Fallacies of Distributed Computing

A
  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology does not change
  6. There is only one administrator
  7. Transport cost is zero
  8. The network is homogeneous
7
Q

Name a few cloud computing platforms

A
  • Amazon Elastic Compute Cloud (EC2)
  • Microsoft Azure
  • Google Cloud Platform (GCP)
8
Q

What is MapReduce?

A

Map Phase:
the input data is divided into chunks that are processed in a parallel and distributed manner; a mapping function is applied to each chunk, creating
key-value pairs

Reduce Phase:
the key-value pairs are grouped by key, and each group is reduced to an aggregate, a summary, or the result of some other computation.
The output of the reduce tasks is typically written to a file or storage system.
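The two phases can be illustrated with a word count, sketched here in plain single-process Python. This is not Hadoop's API: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical, and a real framework would run the map and reduce calls on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) key-value pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key, as done between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(chunks):
    intermediate = []
    for chunk in chunks:  # each chunk would go to a separate worker in a cluster
        intermediate.extend(map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data is big", "data is everywhere"]))
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across many machines without changing the result.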

9
Q

Apache Hadoop

A

Apache Hadoop is a popular open-source implementation of the MapReduce model, providing a scalable and reliable framework for distributed data processing.

10
Q

What is Spark?

A

Apache Spark is an open-source engine for large-scale distributed data processing.
In addition to simple MapReduce operations, Spark supports SQL
queries, streaming data, and complex analytics such as machine
learning and graph algorithms out of the box.
