Chapter 2 Flashcards

(49 cards)

1
Q

Data science is a concept used to tackle big data and includes data cleansing, preparation, processing, and ______.

A

analysis

2
Q

A data scientist gathers data from multiple sources and applies ______ ______ to extract critical information from the collected data sets.

A

machine learning

3
Q

The key idea is to convert a real-world problem into a well-defined ______ ______ problem, so that it can be solved using machine learning.

A

data science

4
Q

Data mining is the process of analyzing vast amounts of data from various sources to extract ______ ______.

A

useful information

5
Q

Data mining is done through the discovery of previously unknown patterns, correlations, and ______, which can then be used to predict future outcomes.

A

anomalies

6
Q

In binary classification, there are only ______ possible classes (or labels) for each instance.

A

two

7
Q

An example of binary classification is ______ ______ where an email is either spam or not spam.

A

Spam detection

8
Q

In multiclass classification, there are more than ______ possible classes for each instance.

A

two

9
Q

An example of multiclass classification is ______ ______ where handwritten digits (0-9) need to be classified into one of the ten classes.

A

Digit recognition

10
Q

The key difference between binary and multiclass classification regarding the number of classes is that binary has 2 classes, while multiclass has ______ or more.

A

3

11
Q

In multiclass classification, the output is a ______ of ______, with each probability corresponding to a different class.

A

vector, probabilities
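As a rough illustration of a probability vector output, the sketch below turns three raw class scores into probabilities. The softmax function and the example scores are assumptions chosen for illustration; the card itself does not say how the probabilities are produced.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes (illustrative values only).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the index with the largest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Each entry of `probs` corresponds to one class, and the entries sum to 1.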

12
Q

Supervised learning discovers patterns in the data that relate data attributes with a ______ (______) attribute.

A

target, class

13
Q

In supervised learning, each data point in the training set has an associated ______ or ______.

A

label, output

14
Q

Two examples of tasks in supervised learning are ______ and ______.

A

Classification, Regression

15
Q

Unsupervised learning involves training a model on a dataset without any ______ ______.

A

labeled responses

16
Q

The goal of unsupervised learning is to discover underlying ______, ______, or ______ in the data.

A

patterns, structures, relationships

17
Q

Two examples of tasks in unsupervised learning are ______ and ______ ______.

A

Clustering, Dimensionality Reduction

18
Q

Supervised learning has a ______ mechanism, while unsupervised learning does not.

A

feedback

19
Q

Supervised learning is used for ______, while unsupervised learning is used for ______.

A

prediction, analysis

20
Q

Algorithms for supervised learning include decision trees, logistic regression, and ______ ______ ______.

A

support vector machines

21
Q

Algorithms for unsupervised learning include k-means clustering, hierarchical clustering, and the ______ ______.

A

Apriori algorithm

22
Q

A ______ Problem in data science asks ‘How much? How many?’

A

Regression

23
Q

A ______ Problem in data science asks ‘Is it type A or type B or type C?’

A

Classification

24
Q

A ______ Problem in data science asks ‘How is the data organized?’

A

Clustering

25
Q

An ______ ______ Problem in data science asks 'Is it a weird behavior?'

A

Anomaly Detection

26
Q

A ______-______ and ______ Problem in data science asks 'What should be done next?'

A

Decision-making, Planning

27
Q

In a regression problem like predicting game sales, the model could be represented as: Sales = f(______).

A

Variables
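One way to picture Sales = f(Variables) is a straight-line fit with a single input variable. The sketch below uses ordinary least squares on made-up numbers; the variable `ad_spend`, the data values, and the choice of a linear f are all assumptions for illustration, not part of the card.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: advertising spend versus game sales.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)

def predict_sales(spend):
    """Sales = f(spend), using the fitted line."""
    return slope * spend + intercept

predicted = predict_sales(5.0)
```

A real model would normally take several variables as input; one variable keeps the arithmetic easy to follow.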
28
Q

In a classification problem like digit recognition, the model tries to find the Probability of being a digit in terms of other Variables, like ______ ______.

A

pixel values

29
Q

In a decision-making and planning problem, the aim is to model a ______/______ Function and try to Maximize/Minimize it.

A

Profit/Loss

30
Q

Clustering groups data instances that are ______ to (near) each other in one cluster and data instances that are very ______ (far away) from each other into different clusters.

A

similar, different

31
Q

Clustering is often called an ______ ______ task as no class values denoting an a priori grouping of the data instances are given.

A

unsupervised learning

32
Q

Association rule mining is also considered an ______ learning task.

A

unsupervised

33
Q

An example of clustering in real life is grouping people of similar sizes together to make 'small', 'medium' and 'large' ______.

A

T-Shirts

34
Q

In marketing, clustering is used to ______ customers according to their similarities to do targeted marketing.

A

segment

35
Q

Two main types of clustering algorithms are ______ clustering and ______ clustering.

A

Partitional, Hierarchical

36
Q

Clustering quality is assessed by maximizing ______-______ distance and minimizing ______-______ distance.

A

Inter-cluster, Intra-cluster
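The two distances can be computed directly on a toy example. The sketch below assumes one particular pair of definitions (mean distance of points to their own centroid for intra-cluster, smallest centroid-to-centroid distance for inter-cluster); other definitions are in common use, and the points are invented for illustration.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def mean_intra_distance(cluster):
    """Average distance from each point to its own cluster centroid."""
    center = centroid(cluster)
    return sum(math.dist(p, center) for p in cluster) / len(cluster)

def min_inter_distance(clusters):
    """Smallest distance between any two cluster centroids."""
    centers = [centroid(c) for c in clusters]
    return min(math.dist(a, b) for a, b in combinations(centers, 2))

# Two hypothetical, well-separated clusters of 2-D points.
clusters = [
    [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
]

intra = [mean_intra_distance(c) for c in clusters]
inter = min_inter_distance(clusters)
```

For a good clustering, `inter` should be large relative to every value in `intra`, which is the case for these points.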
37
Q

Anomaly detection problems try to find ______ of the Data compared to the Regular Pattern observed through the data model.

A

Deviations

38
Q

Outliers are the set of data points that are ______ ______ than the remainder of the data.

A

considerably different
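A simple way to make "considerably different" concrete is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. This rule, the threshold of 2.0, and the sample readings are assumptions for illustration; the cards do not prescribe a specific detection method.

```python
import statistics

def find_outliers(values, z_threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > z_threshold]

# Hypothetical readings with one value far from the regular pattern.
readings = [10, 11, 9, 10, 12, 10, 50]
outliers = find_outliers(readings)
```

Note that a single extreme value inflates both the mean and the standard deviation, so this rule can miss outliers in small samples.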
39
Q

Bad data quality due to anomalies can lead to bad statistical tests, dashboards, machine learning models, and ultimately, a compromised foundation for ______ ______-______.

A

informed decision-making

40
Q

The five main types of data science problems are well-posed using T (Type of data science problem), P (How well the problem is answered), and E (______ ______).

A

Available Data

41
Q

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, ______ with experience E.

A

improves

42
Q

To solve a real-world problem using machine learning, we need to identify: T (task), P (______ ______), and E (training experience).

A

performance measure

43
Q

For a handwriting recognition learning problem, the Task T is recognizing handwritten words, Performance measure P could be the ______ of words correctly classified, and Training experience E would be a ______ of handwritten words with known classifications.

A

percentage, database
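The performance measure P from the handwriting example can be written as a short function. This is a minimal sketch; the function name `percent_correct` and the sample words are hypothetical.

```python
def percent_correct(predictions, labels):
    """Performance measure P: percentage of words correctly classified."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled example is required")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return 100.0 * correct / len(labels)

# Hypothetical recognizer output compared with the known classifications.
known_words = ["cat", "dog", "bird", "tree"]
recognized = ["cat", "dog", "bard", "tree"]

score = percent_correct(recognized, known_words)
```

Here the labeled word list plays the role of E, and a learner counts as improving if `score` rises as more labeled examples are used for training.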
44
Q

For a robot driving learning problem, the Task T is driving on public highways using vision sensors, Performance measure P could be the average ______ traveled before an error, and Training experience E would be a sequence of ______ and ______ commands recorded while observing a human driver.

A

distance, images, steering

45
Q

To quote the 'optimal' sale price for a car using machine learning, this is a ______ Problem.

A

Regression

46
Q

To determine how many types of flowers there are in Malaysia using machine learning, this is a ______ Problem.

A

Clustering

47
Q

Scikit-Learn implements many popular machine learning methods, data processing ______ and visualizations.

A

pipeline

48
Q

TensorFlow is an open-source ______ ______ Library.

A

Deep Learning

49
Q

A well-defined learning problem requires a well-specified task, ______ ______, and source of ______ ______.

A

performance metric, training experience