4.11 - Big Data Flashcards

(17 cards)

1
Q

What is Big Data in one sentence?

A

A catch-all term for data sets so large or complex that they no longer fit into the usual storage or processing “containers.”

2
Q

Why is volume a challenge in Big Data?

A

The data is too big to fit on a single server.

3
Q

What does velocity refer to in Big Data?

A

Data arrives as a stream that may need millisecond-to-second responses.

4
Q

What does variety mean in the context of Big Data?

A

The data comes in many forms—structured, unstructured, text, images, audio, video, etc.

5
Q

What is usually the hardest aspect of Big Data and why?

A

Its lack of structure; without rows-and-columns it is much harder to analyse and cannot be stored efficiently in relational databases.

6
Q

Why don’t traditional relational databases suit Big Data?

A

They require tidy row-and-column structure and don’t scale well over multiple machines.

7
Q

How do we extract patterns and useful information from unstructured Big Data?

A

By applying machine-learning techniques.

8
Q

When does size become the real issue in Big Data?

A

Once the data set no longer fits on a single server and relational databases won’t scale horizontally.

9
Q

What must happen when data no longer fits on one machine?

A

Processing has to be distributed (shared) across multiple machines.
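
Example (a minimal sketch, not a real cluster): the data set is split into chunks, each chunk is processed independently, and the partial results are combined. Threads stand in for separate machines here, and the names split_into_chunks and process_chunk are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Deal the records into n_chunks roughly equal partitions."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def process_chunk(chunk):
    """Work done independently on one partition: here, a partial sum."""
    return sum(chunk)


records = list(range(1, 101))            # stand-in for a large data set
chunks = split_into_chunks(records, 4)   # one chunk per hypothetical machine

# Assumption: worker threads play the role of separate servers.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(process_chunk, chunks))

total = sum(partial_sums)                # combine the partial results
print(total)                             # 5050
```

Because no chunk depends on another, the chunks could in principle be sent to different servers without changing the result.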

10
Q

Why is functional programming useful for distributed Big Data processing?

A

Its style (immutable data, stateless functions, higher-order functions) makes it easier to write correct and efficient code that can be spread across servers.

11
Q

How do immutable data structures help in distributed systems?

A

No in-place updates → no race conditions, so parallel code stays correct.
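
Example (a minimal sketch, assuming a hypothetical Reading record): a frozen dataclass cannot be updated in place, so a "change" produces a new value and any other worker holding the original still sees the same data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reading:
    """Hypothetical immutable record; fields cannot be reassigned."""
    sensor_id: str
    value: float


def calibrate(reading, offset):
    """Return a new Reading; the original is left untouched."""
    return replace(reading, value=reading.value + offset)


original = Reading("s1", 20.0)
adjusted = calibrate(original, 1.5)

print(original.value)   # 20.0
print(adjusted.value)   # 21.5
```

Attempting `original.value = 0.0` raises an error instead of silently changing data that other code may be reading.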

12
Q

What does statelessness mean, and why is it good for distribution?

A

Functions depend only on their inputs, not shared state → easy to reproduce work on any node.
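
Example (a minimal sketch with hypothetical function names): the first function relies on shared state, so its result depends on what has run before; the second depends only on its arguments, so any node can repeat the call and get the same answer.

```python
running_total = 0  # shared state that lives outside the function


def add_stateful(x):
    """Result depends on, and changes, state outside the function."""
    global running_total
    running_total += x
    return running_total


def add_stateless(total, x):
    """Result depends only on the arguments passed in."""
    return total + x


# The same stateless call always gives the same result.
first_call = add_stateless(10, 5)
second_call = add_stateless(10, 5)
print(first_call == second_call)   # True
```

Calling `add_stateful(5)` twice returns two different values, which is what makes such code awkward to rerun or move between servers.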

13
Q

What are higher-order functions, and why are they handy for Big Data?

A

Functions that take/return other functions (e.g., map, filter) let you express large-scale data transformations concisely.
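
Example (a minimal sketch with made-up data): map and filter take a function as an argument, and the hypothetical scale_by returns a function, so both kinds of higher-order function appear.

```python
def scale_by(factor):
    """Return a new function that multiplies its input by factor."""
    return lambda x: x * factor


readings = [3, 8, 12, 5, 20]
double = scale_by(2)                               # a function returned by a function

large = list(filter(lambda x: x >= 8, readings))   # keep values of 8 or more
doubled = list(map(double, large))                 # apply double to each value

print(large)     # [8, 12, 20]
print(doubled)   # [16, 24, 40]
```

The transformation is described per element, so the same map or filter step could be applied to each partition of a much larger data set.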

14
Q

In a fact-based model, what is a fact?

A

A single, atomic piece of information.
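
Example (a minimal sketch, not a definitive model): each Fact below records one piece of information about one entity. The field names, and the assumption that every fact carries a timestamp and is appended rather than overwritten, are illustrative choices, not something the card states.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """Hypothetical atomic fact: one attribute of one entity."""
    entity: str
    attribute: str
    value: str
    timestamp: int  # assumption: facts are timestamped and never overwritten


facts = [
    Fact("user:1", "name", "Ada", 1),
    Fact("user:1", "city", "London", 2),
    Fact("user:1", "city", "Paris", 3),   # a newer fact is added; the old one stays
]


def current_value(all_facts, entity, attribute):
    """Return the value from the most recent matching fact, or None."""
    matching = [f for f in all_facts
                if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value


print(current_value(facts, "user:1", "city"))   # Paris
```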

15
Q

What three parts make up a graph schema?

A

Nodes (entities), edges (relationships), and properties (attributes on nodes/edges).
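
Example (a minimal sketch using plain dictionaries and tuples, not a real graph database): nodes carry properties, and each edge links two nodes with a named relationship and its own properties. All identifiers and values are invented for illustration.

```python
# Nodes: entity id -> properties
nodes = {
    "alice": {"label": "Person", "age": 34},
    "bob": {"label": "Person", "age": 29},
    "acme": {"label": "Company", "sector": "retail"},
}

# Edges: (source, relationship, target, properties)
edges = [
    ("alice", "KNOWS", "bob", {"since": 2019}),
    ("alice", "WORKS_FOR", "acme", {"role": "analyst"}),
    ("bob", "WORKS_FOR", "acme", {"role": "engineer"}),
]


def neighbours(node_id, relationship):
    """Follow edges of one relationship type outward from node_id."""
    return [target for source, rel, target, _props in edges
            if source == node_id and rel == relationship]


def employees_of(company_id):
    """Follow WORKS_FOR edges backwards to find who works for a company."""
    return sorted(source for source, rel, target, _props in edges
                  if rel == "WORKS_FOR" and target == company_id)


print(neighbours("alice", "KNOWS"))   # ['bob']
print(employees_of("acme"))           # ['alice', 'bob']
```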

16
Q

Why use a graph schema for Big Data?

A

It captures the structure of a large, often irregular data set in a way that can be traversed and queried efficiently.

17
Q

State two features of functional programming languages that make it easier to write code that can be distributed to run across more than one server.

A

  • Immutable data structures // the state of a data structure cannot be changed
  • Statelessness // functions do not have side-effects // all functions are pure
  • Functions can be distributed to servers and executed on data sets, then the results can be combined // map-reduce
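
Example (a minimal single-process sketch of the map-reduce idea, not a real framework): the same counting function is applied to each partition, then the partial results are merged. The partitions stand in for data held on different servers, and map_count and merge are hypothetical names.

```python
from collections import Counter
from functools import reduce


def map_count(lines):
    """Map step: count the words in one partition of the data."""
    return Counter(word for line in lines for word in line.lower().split())


def merge(left, right):
    """Reduce step: combine two partial counts into a new Counter."""
    return left + right


# Assumption: each inner list is the data held by one server.
partitions = [["big data big"], ["data is big"]]

partial_counts = list(map(map_count, partitions))    # could run on separate servers
totals = reduce(merge, partial_counts, Counter())    # combine the results

print(totals["big"], totals["data"], totals["is"])   # 3 2 1
```

Neither function changes its inputs or relies on shared state, which is why the map step can run on each partition independently.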