Big Data Flashcards Preview

Flashcards in Big Data Deck (10)

What is Big Data?

Data sets that are too large to store on a single server and too varied and complex to analyse easily.


What are the three qualities of Big Data?

Volume, velocity and variety


What does volume mean in Big Data?

The very large amount of data that is gathered and stored.


What does velocity mean in Big Data?

Data is collected as streams in near real time, so it has to be processed quickly, which makes processing challenging.


What does variety mean in Big Data?

Data comes in a wide variety of formats, e.g. text, video, audio and images, and may be structured or unstructured.


What is structured data?

- Data that can be entered into a relational database in a row and column format
- Data that can be easily queried and analysed
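
As an illustration of the row and column format, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and values are hypothetical, invented only for this example.

```python
import sqlite3

# Hypothetical table and values, made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ada", 91), ("Alan", 78), ("Grace", 85)],
)

# Every row has the same columns, so the data can be queried directly.
high_scorers = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM students WHERE score >= 80 ORDER BY name"
    )
]
conn.close()
```

Because each record follows the same fixed layout, a single query can filter and sort the whole table.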


What is unstructured data?

Data that:
- Is difficult to organise
- Is not appropriate to store in a database in a row and column format
- Comes in a vast range of formats, so is difficult to perform data analysis on


When is distributed programming used?

When data is too big to be processed on a single machine, the processing is distributed across several machines.
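
The idea of splitting one job into parts can be sketched in Python as below. This is only an illustration under stated assumptions: threads on one machine stand in for the separate machines, and the function names (split_into_chunks, count_words, distributed_word_count) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split a list into at most n_chunks roughly equal contiguous parts."""
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_words(lines):
    """Sub-task: count the words in one chunk of lines."""
    return sum(len(line.split()) for line in lines)


def distributed_word_count(lines, n_workers=3):
    """Process each chunk separately, then combine the partial results."""
    chunks = split_into_chunks(lines, n_workers)
    # Assumption: each thread stands in for a separate machine.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)
```

Each chunk is processed independently, so no worker needs to hold the whole data set; only the small partial results are brought back together.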


What is a computer cluster?

A group of networked computers used in distributed programming to share the big data processing task.

Big Data --> Computer Cluster (master computer and other networked computers) --> Client machines


What does the master computer do?

It uses specialist software to control each networked computer as it performs its sub-task.
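
The master's coordinating role can be sketched as a small Python simulation. This is not real cluster software: the Worker and Master classes, the round-robin assignment and the node names are all assumptions made for illustration, and everything runs on one machine.

```python
class Worker:
    """Stands in for one networked computer in the cluster (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def run(self, numbers):
        # Sub-task: add up the numbers this worker was given.
        return sum(numbers)


class Master:
    """Stands in for the master computer (hypothetical)."""

    def __init__(self, workers):
        self.workers = workers

    def run_job(self, data):
        # Assign items to workers in turn (round-robin, an assumed policy).
        assignments = {worker.name: [] for worker in self.workers}
        for index, item in enumerate(data):
            worker = self.workers[index % len(self.workers)]
            assignments[worker.name].append(item)

        # Ask each worker to perform its sub-task and collect the results.
        partials = {
            worker.name: worker.run(assignments[worker.name])
            for worker in self.workers
        }
        return sum(partials.values()), partials


master = Master([Worker("node-1"), Worker("node-2")])
total, partials = master.run_job([1, 2, 3, 4, 5])
```

Here the master decides which worker gets which items and combines the partial results, while the workers only ever see their own share of the data.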