4.11 - Big Data Flashcards

Question 1

Q

What is Big Data in one sentence?

Answer

A

A catch-all term for data sets so large or complex that they no longer fit into the usual storage or processing “containers.”

Question 2

Q

Why is volume a challenge in Big Data?

Answer

A

The data is too big to fit on a single server.

Question 3

Q

What does velocity refer to in Big Data?

Answer

A

Data arrives as a stream that may need millisecond-to-second responses.

Question 4

Q

What does variety mean in the context of Big Data?

Answer

A

The data comes in many forms—structured, unstructured, text, images, audio, video, etc.

Question 5

Q

What is usually the hardest aspect of Big Data and why?

Answer

A

Its lack of structure; without a row-and-column layout the data is much harder to analyse and cannot be stored efficiently in relational databases.

Question 6

Q

Why don’t traditional relational databases suit Big Data?

Answer

A

They require tidy row-and-column structure and don’t scale well over multiple machines.

Question 7

Q

How do we extract patterns and useful information from unstructured Big Data?

Answer

A

By applying machine-learning techniques.

Question 8

Q

When does size become the real issue in Big Data?

Answer

A

Once the data set no longer fits on a single server, because relational databases do not scale well across multiple machines.

Question 9

Q

What must happen when data no longer fits on one machine?

Answer

A

Processing has to be distributed (shared) across multiple machines.

Question 10

Q

Why is functional programming useful for distributed Big Data processing?

Answer

A

Its style makes it easier to write correct and efficient code that can be spread across servers.

Question 11

Q

How do immutable data structures help in distributed systems?

Answer

A

No in-place updates → no race conditions on shared data, so parallel code stays correct.
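
As a minimal sketch of the idea in Python (the `Reading` class and `calibrate` function are hypothetical names chosen for illustration), a frozen dataclass cannot be changed after creation, so any "update" has to produce a new value and leave the original alone:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Reading:
    """A hypothetical sensor reading that cannot be modified once created."""
    sensor_id: str
    value: float

def calibrate(reading: Reading, offset: float) -> Reading:
    # Build and return a new Reading; the one passed in is left untouched.
    return replace(reading, value=reading.value + offset)

original = Reading("s1", 20.0)
adjusted = calibrate(original, 1.5)

try:
    original.value = 0.0  # any attempt at an in-place update is rejected
except FrozenInstanceError:
    print("original cannot be modified")

print(original.value, adjusted.value)
```

Because no task can alter `original`, several workers could read it at the same time without needing locks.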

Question 12

Q

What does statelessness mean, and why is it good for distribution?

Answer

A

Functions depend only on their inputs, not shared state → easy to reproduce work on any node.
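
A small Python sketch of this, assuming a hypothetical `word_count` function: because the result depends only on the argument, the same call gives the same answer no matter which node runs it or in what order the work is done.

```python
def word_count(line: str) -> int:
    # Pure function: reads only its argument and changes nothing outside itself.
    return len(line.split())

lines = ["big data is big", "data arrives fast", "variety matters"]

# Process the lines in two different orders, as two different nodes might.
forward = {line: word_count(line) for line in lines}
backward = {line: word_count(line) for line in reversed(lines)}

# The per-line results agree, so failed work can safely be re-run elsewhere.
print(forward == backward)
```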

Question 13

Q

What are higher-order functions, and why are they handy for Big Data?

Answer

A

Functions that take/return other functions (e.g., map, filter) let you express large-scale data transformations concisely.
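
For illustration, here is a short Python sketch using the built-in `filter` and `map` together with `functools.reduce`; the order values and the discount rule are made up for the example.

```python
from functools import reduce

# Hypothetical order values in pence.
orders = [120, 45, 300, 80]

# filter: keep only the orders of at least 100 pence.
large_orders = filter(lambda price: price >= 100, orders)

# map: apply a flat 10 pence discount to each remaining order.
discounted = map(lambda price: price - 10, large_orders)

# reduce: fold the discounted prices into a single total.
total = reduce(lambda acc, price: acc + price, discounted, 0)

print(total)  # 110 + 290 = 400
```

Each step receives a function as an argument, so the same pipeline shape works whether the list holds four items or is split across many machines.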

Question 14

Q

In a fact-based model, what is a fact?

Answer

A

A single, atomic piece of information.
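
A rough Python sketch, assuming the common convention that each fact is stored with a timestamp and is never overwritten (the `Fact` type, its field names, and the sample values are hypothetical):

```python
from typing import NamedTuple, Optional

class Fact(NamedTuple):
    """One atomic piece of information about one entity."""
    entity: str
    attribute: str
    value: str
    timestamp: int

# New facts are appended; older facts are kept rather than updated.
facts = [
    Fact("user42", "city", "Leeds", 1),
    Fact("user42", "email", "a@example.com", 2),
    Fact("user42", "city", "York", 3),
]

def current_value(entity: str, attribute: str) -> Optional[str]:
    # The current value is taken from the most recent matching fact.
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value

print(current_value("user42", "city"))  # York
```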

Question 15

Q

What three parts make up a graph schema?

Answer

A

Nodes (entities), edges (relationships), and properties (attributes on nodes/edges).
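
The three parts can be sketched with plain Python dictionaries and tuples; the people, company, and relationship names below are invented for the example, and a real graph database would use its own storage and query language.

```python
# Nodes: each entity has an id and a dictionary of properties.
nodes = {
    "alice": {"type": "Person", "age": 34},
    "bob": {"type": "Person", "age": 29},
    "acme": {"type": "Company", "sector": "Retail"},
}

# Edges: (source, destination, properties), where the properties
# describe the relationship itself.
edges = [
    ("alice", "bob", {"relationship": "friend_of"}),
    ("alice", "acme", {"relationship": "works_for", "since": 2019}),
    ("bob", "acme", {"relationship": "customer_of"}),
]

def neighbours(node_id: str, relationship: str) -> list:
    # Follow every outgoing edge of the given kind from node_id.
    return [
        dst
        for src, dst, props in edges
        if src == node_id and props["relationship"] == relationship
    ]

print(neighbours("alice", "works_for"))  # ['acme']
```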

Question 16

Q

Why use a graph schema for Big Data?

Answer

A

It captures the structure of a large, often irregular data set in a way that can be traversed and queried efficiently.

Question 17

Q

State two features of functional programming languages that make it easier to write code that can be distributed to run across more than one server.

Answer

A

Immutable data structures: the state of a data structure cannot be changed.
Statelessness: functions do not have side effects (all functions are pure).
Functions can be distributed to servers and executed on separate data sets, then the results can be combined (map-reduce).
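
The map-reduce pattern mentioned above can be sketched on one machine in Python. This is only an illustration: the two lists in `partitions` stand in for data held on two servers, and `map_partition` and `combine` are hypothetical names.

```python
from collections import Counter
from functools import reduce

# Each inner list stands in for the data held on one server.
partitions = [
    ["big data is big", "data arrives fast"],
    ["big data has variety"],
]

def map_partition(lines: list) -> Counter:
    # Map step: count words using only this partition's own data.
    return Counter(word for line in lines for word in line.split())

def combine(left: Counter, right: Counter) -> Counter:
    # Reduce step: merge two partial counts into a new Counter.
    return left + right

partial_counts = list(map(map_partition, partitions))
total = reduce(combine, partial_counts, Counter())

print(total["big"], total["data"])  # 3 3
```

Because `map_partition` is pure and never modifies its input, each call could run on a different server, with only the small partial counts sent back to be combined.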

(17 cards)