Big Data Flashcards Preview

Flashcards in Big Data Deck (10)
1

What is Big Data?

Data sets that are too large to store on a single server and too varied and complex to analyse easily

2

What are the three qualities of Big Data?

Volume, velocity and variety

3

What does volume mean in Big Data?

The very large amount of data that has to be gathered and stored

4

What does velocity mean in Big Data?

Data arrives in continuous streams and is collected in near real time, which makes processing it challenging.

5

What does variety mean in Big Data?

Data comes in a wide variety of formats (e.g. text, video, audio and images) and may be structured or unstructured

6

What is structured data?

- Data that can be entered into a relational database in a row and column format
- Data that can be analysed and queried
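
As a rough illustration of the row and column idea, the sketch below stores a few records in a relational table and queries them. It uses Python's built-in `sqlite3` module; the `sales` table, its columns and its values are made up for this example.

```python
import sqlite3

# Hypothetical table and values, chosen only to illustrate structured data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 50.0)],
)

# Every row has the same columns, so the data can be queried directly.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

The query works because each record fits the same fixed set of columns, which is exactly what unstructured data lacks.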

7

What is unstructured data?

Data that:
- Is difficult to organise
- Is not suited to storage in a database in a row and column format
- Comes in a vast range of formats, so it is difficult to analyse

8

When is distributed programming used?

When data is too big to be processed on a single machine, the processing is distributed across several machines.
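
The idea of splitting the work can be sketched on one machine. In this minimal example, worker threads stand in for separate machines (an assumption made so the code runs anywhere), and the function names and sample lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide a list into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Sub-task: count the words in one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, n_workers=3):
    chunks = split_into_chunks(lines, n_workers)
    # Assumption: each thread stands in for one machine in a real system.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Combine the partial results into a single answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["big data is big", "data is varied", "big clusters process data"]
result = distributed_word_count(lines)
print(result["big"], result["data"])
```

Each chunk is processed independently, so no worker needs to see the whole data set; only the small partial results are brought back together.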

9

What is a computer cluster?

A group of networked computers used in distributed programming to share a Big Data processing task.

Big Data --> Computer Cluster (Master Computer and computers) --> Client machines

10

What does the master computer do?

It uses specialist software to control the networked computers in the cluster as they perform their sub-tasks
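
A minimal sketch of the master's role is shown below. It is not real cluster software: threads stand in for the networked computers, and the names `worker` and `run_master` are hypothetical. The master splits a job into sub-tasks, hands them out, tells the workers when to stop, and combines what they report back.

```python
import queue
import threading


def worker(name, tasks, results):
    """A networked computer: repeatedly take a sub-task and report back."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel from the master: no more work
            break
        task_id, numbers = task
        results.put((task_id, name, sum(numbers)))


def run_master(data, n_workers=3, chunk_size=4):
    tasks, results = queue.Queue(), queue.Queue()
    # Assumption: each thread stands in for one computer in the cluster.
    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}", tasks, results))
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    # The master splits the job into sub-tasks and hands them out.
    n_tasks = 0
    for start in range(0, len(data), chunk_size):
        tasks.put((n_tasks, data[start:start + chunk_size]))
        n_tasks += 1
    for _ in workers:
        tasks.put(None)  # tell each worker to stop
    for w in workers:
        w.join()
    # The master gathers the partial results and combines them.
    partials = [results.get() for _ in range(n_tasks)]
    return sum(value for _, _, value in partials), partials


total, partials = run_master(list(range(1, 11)))
print(total)
```

Which worker picks up which sub-task is not fixed in advance; the master only cares that every sub-task is completed and its result collected.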