01. Introduction to Big Data Analytics Flashcards

1
Q

Name the three main attributes that characterize big data.

A

Huge volumes of data

Complexity of data types and structures

Speed of new data creation and growth

2
Q

What is the definition of big data?

A

Big data is data whose scale, distribution, diversity, and/or timeliness requires the use of new technical architectures and analytics to enable insights that unlock new sources of business value.

3
Q

What is driving the data deluge?

A
Mobile sensors
Social media
Video surveillance 
Video rendering
Smart grids
Geophysical exploration
Medical imaging
Gene sequencing
4
Q

What is structured data?

A

Data containing a defined data type, format, and structure (for example, transactional data, online analytical processing [OLAP] data cubes, traditional RDBMS tables, CSV files, and even simple spreadsheets).
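
As a minimal sketch of what a defined data type, format, and structure looks like in practice, the Python snippet below reads a small CSV sample and casts each column to a declared type. The column names, values, and the SCHEMA mapping are made up for illustration.

```python
import csv
import io

# Hypothetical CSV sample; the columns and values are illustrative only.
CSV_TEXT = """order_id,customer,amount
1001,alice,19.99
1002,bob,5.50
"""

# Declared type for each column: this fixed schema is what makes the data structured.
SCHEMA = {"order_id": int, "customer": str, "amount": float}


def load_rows(text):
    """Parse CSV text and cast every field to the type declared in SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: SCHEMA[name](value) for name, value in row.items()} for row in reader]


rows = load_rows(CSV_TEXT)
total = sum(row["amount"] for row in rows)
```

Because every row follows the same schema, aggregates such as the total can be computed without any per-row cleanup.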

5
Q

What is semi-structured data?

A

Textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language [XML] data files that are self-describing and defined by an XML schema).
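
A rough illustration of parsing self-describing XML, using Python's standard xml.etree.ElementTree module. The tag names, attribute names, and values are hypothetical; the point is that the tags make parsing possible even when records differ in which fields they carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML sample; the tag and attribute names are illustrative only.
XML_TEXT = """
<orders>
  <order id="1001"><customer>alice</customer><amount>19.99</amount></order>
  <order id="1002"><customer>bob</customer></order>
</orders>
"""


def parse_orders(text):
    """Return one dict per <order>, tolerating optional child elements."""
    root = ET.fromstring(text.strip())
    orders = []
    for node in root.findall("order"):
        amount = node.findtext("amount")  # None when the element is absent
        orders.append({
            "id": int(node.get("id")),
            "customer": node.findtext("customer"),
            "amount": float(amount) if amount is not None else None,
        })
    return orders


orders = parse_orders(XML_TEXT)
```

The second order has no amount element, which the parser records as None rather than failing.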

6
Q

What is quasi-structured data?

A

Textual data with erratic data formats that can be formatted with effort, tools, and time (for instance, web clickstream data that may contain inconsistencies in data values and formats).
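
To give a feel for the effort involved, here is a small sketch that normalizes inconsistent clickstream lines with Python's urllib.parse. The sample URLs and the normalize helper are assumptions made for illustration, not a real log format.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical clickstream lines; note the inconsistent casing, spacing, and fields.
CLICKS = [
    "http://Example.com/search?q=big+data&page=2",
    "  https://example.com/Search?Q=hadoop ",
    "example.com/search",
]


def normalize(line):
    """Coerce one raw click into a consistent (host, path, query) record."""
    url = line.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: treat a missing scheme as HTTP
    parts = urlparse(url)
    # Lower-case parameter names and keep the first value of each.
    params = {key.lower(): values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.netloc.lower(),
        "path": parts.path.lower(),
        "query": params.get("q"),
    }


records = [normalize(line) for line in CLICKS]
```

Each cleanup rule (trimming, lower-casing, defaulting the scheme) handles one kind of inconsistency; real clickstream data typically needs many more.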

7
Q

What is unstructured data?

A

Data that has no inherent structure, which may include text documents, PDFs, images, and video.

8
Q

What are the four Vs of big data?

A

Volume (amount)
Velocity (speed)
Variety (types of data)
Veracity (accuracy)

9
Q

What is a data repository?

A

"Data repository" is a general term for a destination designated for data storage.

10
Q

Name the five main skill sets of a Data Scientist

A

Quantitative Skill - maths or stats
Technical Aptitude - software, programming and machine learning
Sceptical/Critical thinking - examine their own work
Curious/Creative - passionate about solving problems
Communicative and collaborative - work with the business

11
Q

What is the difference between BI and data science?

A

BI presents insight into the past, typically by way of dashboards and reports.

Data science aims to predict the future and can handle a wider variety of data types.

12
Q

What are the characteristics of supervised machine learning?

A

It uses labelled historical data (a training data set) to build a model, which is then used to predict outcomes for new, unseen data.
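
A minimal sketch of that workflow in plain Python: fit a straight line to a training set by ordinary least squares, then use it to predict a new value. Simple linear regression is just one possible model, chosen here for brevity, and the training numbers are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: historical inputs paired with known outcomes.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """Apply the fitted model to an input that was not in the training set."""
    return slope * x + intercept


prediction = predict(5.0)
```

The model is built once from the training set and can then be applied to any new input.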

13
Q

What should you use supervised machine learning to do?

A

To predict a continuous variable (regression) or to assign items to classes (classification).
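
For the classification case, a rough sketch using a 1-nearest-neighbour rule, one of the simplest classifiers. The features, labels, and the classify helper are hypothetical and serve only to show a label being predicted from labelled examples.

```python
# Hypothetical labelled training set: (height_cm, weight_kg) -> label.
TRAINING = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]


def classify(point):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, point))

    _features, label = min(TRAINING, key=squared_distance)
    return label


labels = [classify((152.0, 52.0)), classify((182.0, 88.0))]
```

The output here is a category rather than a number, which is what distinguishes classification from predicting a continuous variable.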

14
Q

What is machine learning?

A

Machine learning is an application of artificial intelligence that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed

15
Q

What is unsupervised machine learning?

A

Unsupervised learning algorithms are used when the information used to train is neither classified nor labelled. The system doesn’t figure out the right output, but it explores the data and can draw inferences from data sets to describe hidden structures from unlabelled data.
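
One way to see hidden structure emerge from unlabelled data is k-means clustering. The sketch below is a bare-bones one-dimensional version in plain Python; the data points and starting centres are assumptions for illustration, and k-means is only one of many unsupervised methods.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign each value to its nearest centre, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Keep the old centre if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabelled data with two visible groups.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(data, centers=[0.0, 5.0])
```

No labels are supplied; the two groups are inferred purely from how the values sit relative to each other.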

16
Q

What are the two types of numerical data?

A

Discrete and continuous. Discrete data take countable values (typically integers); continuous data can take any value within a range, to whatever decimal precision is needed.

17
Q

What are the types of categorical data?

A

Ordinal - Data categories that have an inherent order to them
Nominal - Data labels, hence categories with no inherent order
Binomial - “Binary” data with exactly two categories, such as success/failure or right/wrong
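
The distinction matters when categories are turned into numbers for analysis. As a hedged sketch (the category values and helper names below are invented), ordinal data can be mapped to ranks, nominal data is usually one-hot encoded so that no false order is implied, and binomial data maps to 0 or 1.

```python
# Hypothetical category values, chosen only to illustrate the three types.
SIZE_ORDER = ["small", "medium", "large"]  # ordinal: the order carries meaning
COLOURS = ["red", "green", "blue"]         # nominal: no inherent order


def encode_ordinal(value):
    """Map an ordinal category to its rank so comparisons stay meaningful."""
    return SIZE_ORDER.index(value)


def encode_nominal(value):
    """One-hot encode a nominal category so no ordering is implied."""
    return [1 if value == colour else 0 for colour in COLOURS]


def encode_binary(value):
    """Map a two-valued category to 1 (success) or 0 (anything else)."""
    return 1 if value == "success" else 0


encoded = (encode_ordinal("medium"), encode_nominal("green"), encode_binary("failure"))
```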