Chapter 1: Machine Learning for Predictive Analysis Flashcards

1
Q

What is the job of data analytics?

A

Extracting insights from data

2
Q

What is predictive data analytics?

A

The art of building and using models that make predictions based on patterns extracted from historical data

3
Q

What are the applications of predictive data analysis?

A

price prediction (businesses), dosage prediction (doctors), risk assessment (organizations), propensity modeling (predicting the likelihood or propensity of individuals to take different actions), diagnosis (doctors, engineers and scientists), document classification

4
Q

What is a prediction in data analytics? How is it different from the everyday usage?

A

In data analytics, a prediction is the assignment of a value to any unknown variable. In everyday usage, prediction has a temporal aspect: we predict what will happen in the future.

5
Q

What two things are common in all the application examples?

A

In each case, (1) a model is used to make a prediction that helps make a decision, and (2) the model is trained to make predictions based on a set of historical examples (machine learning is used to train these models).

6
Q

What is machine learning?

A

Machine learning is an automated process that extracts patterns from data.

7
Q

What is supervised machine learning used for?

A

We use supervised machine learning to build the models used in predictive data analytics applications
- The training examples have labels/classes/events that provide feedback while learning

8
Q

How do supervised machine learning techniques work?

A

Supervised machine learning techniques automatically learn a model of the relationship between a set of descriptive features and a target feature, based on a set of historical examples (instances)

9
Q

What is each row of a dataset called?

A

training instance

10
Q

What is the overall dataset called?

A

training dataset

11
Q

When is a model consistent?

A

When it makes a correct prediction for every instance in the training dataset
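
A consistency check can be sketched in a few lines. This is only an illustration: the toy dataset, the feature names (`age`, `owns_home`) and the hand-written `model` are hypothetical and not taken from the chapter.

```python
# Hypothetical toy training dataset: each instance is (descriptive features, target).
training_data = [
    ({"age": 25, "owns_home": False}, "decline"),
    ({"age": 40, "owns_home": True}, "approve"),
    ({"age": 55, "owns_home": True}, "approve"),
]


def model(features):
    """Hand-written candidate model: approve home owners, decline everyone else."""
    return "approve" if features["owns_home"] else "decline"


def is_consistent(predict, dataset):
    """Return True if predict() matches the target of every training instance."""
    return all(predict(features) == target for features, target in dataset)


print(is_consistent(model, training_data))
```

A model that always answers "approve" would fail this check, because it gets the first instance wrong.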

12
Q

What do machine learning algorithms do?

A

They automate the process of learning a model that captures the relationship between the descriptive features and the target feature in a dataset.

13
Q

Why is searching for consistent models not enough to learn useful prediction models?

A
  • Large datasets are likely to contain noise, and a model that is consistent with noisy data also fits the noise
  • The training set represents only a small sample of the possible set of instances in the domain
14
Q

What is an ill-posed problem?

A

An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available

15
Q

Why is machine learning an ill-posed problem?

A

Many different models can be consistent with the training data, so a single best model cannot be determined from the sample training dataset alone

16
Q

What is generalization?

A
  • The ability to make predictions for queries that are not present in the data
  • A prediction model that makes the correct predictions for these
    queries captures the underlying relationship between the descriptive and target features
    and is said to generalize well
17
Q

What is the goal of machine learning?

A

Finding the predictive model that generalizes best

18
Q

What is inductive bias?

A

A set of assumptions that defines the model selection criteria of a machine learning algorithm

19
Q

What are the types of inductive bias?

A
  • Restriction bias
  • Preference bias
20
Q

What is restriction bias?

A
  • It constrains the set of models that the algorithm will
    consider during the learning process
  • Similar to choosing your go-to study method
  • Tells us what our model is able to represent
21
Q

What is preference bias?

A

- It guides the learning algorithm to prefer certain models over others
- Choosing our convergence/satisfaction mechanism
- Analogy: I like group study, but I would rather be the leader than the weakest link
- The algorithm’s belief about what makes a good hypothesis

22
Q

What are examples of restriction bias?

A
  • In multivariable linear regression with gradient descent, we only consider models that produce predictions based on a linear combination of the descriptive features
  • In Iterative Dichotomizer 3 (ID3), we only consider tree-like prediction models where each branch encodes a sequence of checks on individual descriptive features
23
Q

What are examples of preference bias?

A
  • In multivariable linear regression (MLR) with gradient descent (GD), we linearly combine the descriptive features using only the weights found through gradient descent
  • In ID3, we prefer shallower (less complex) trees over larger/deeper trees
24
Q

Why is inductive bias necessary for learning beyond the dataset?

A

Without it we could only perform memorization of our training dataset without generalization capacity

25
Q

What is model induction?

A

The creation of models from data

26
Q

What is the difference between a classification problem and a regression problem?

A
  • In a classification problem, the target is a category
  • In a regression problem, the target is a number
27
Q

What are other names for a dataset?

A

- Table or relation of a database (it has the same form)
- Worksheet of a spreadsheet
- Array in math

28
Q

What is an instance?

A
  • Row, tuple, record of a database table
  • Case in statistics
  • Object of a class in programming
  • Datapoint or vector in math
29
Q

What is an independent variable?

A
  • It is an attribute supplied as input
  • Also known as an explanatory variable, input, or predictor
  • Features are the table’s columns
30
Q

What is a dependent variable?

A

The target variable whose values are to be predicted.
Also known as the class, label, or output.

31
Q

What are some confusing facts about independent and dependent variables?

A
  • Independent variables may not be independent of each other or of anything else
  • The dependent variable does not always depend on all of the independent variables
32
Q

Facts about the target variable

A
  • Sometimes it is considered to be included in the set of features, sometimes it is not
  • The target variable is not used to predict itself
  • Prior values may be helpful to predict future values and may be included as input features
33
Q

What is the process of building a model (or training your classifier) from historical data?

A

Induction, learning, training or generalization

34
Q

When does the real value of machine learning become apparent?

A

When we want to build prediction models from large datasets with multiple features

35
Q

How do you know the number of possible prediction models?

A
  • There are three binary descriptive features, so there are 2^3 = 8 possible combinations of descriptive feature values
  • For each combination of descriptive feature values there are 3 possible target feature values
  • There are 3^8 = 6,561 possible prediction models
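
The count can be checked with a short sketch. It assumes the three descriptive features are binary and the target has three levels, which is what the 2^3 and 3^8 figures imply; the variable names are illustrative.

```python
from itertools import product

n_features = 3     # assumed: three binary descriptive features
target_levels = 3  # assumed: three possible target feature values

# Every possible combination of descriptive feature values.
feature_combinations = list(product([False, True], repeat=n_features))

# A prediction model assigns one target level to each combination,
# so the number of distinct models is target_levels ** (number of combinations).
n_models = target_levels ** len(feature_combinations)

print(len(feature_combinations))  # 8
print(n_models)                   # 6561
```

The number of models grows exponentially with the number of feature combinations, which is why searching this space by hand is impractical.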
36
Q

What is the ability to memorize a training dataset?

A

Consistency

37
Q

What does Occam’s Razor say about simplicity?

A

All other things being equal, the simplest explanation tends to be the right one (upper bound)

38
Q

What does Albert Einstein say about simplicity?

A

Everything should be made as simple as possible but not simpler (lower bound)

39
Q

What are the sources of information that guide machine learning algorithms?

A
  • Training data
  • Inductive bias of the algorithm
40
Q

What can go wrong with machine learning?

A
  • Inappropriate inductive bias which leads to mistakes
41
Q

What does no free lunch mean?

A

If an algorithm does well on a certain class of problems, then it necessarily pays for that with degraded performance on the set of all remaining problems

42
Q

What happens if we choose the wrong inductive bias?

A
  • Underfitting (the prediction model is too simplistic to represent the underlying relationship)
  • Overfitting (the prediction model is so complex that it becomes too sensitive to noise in the data; it memorizes)
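
The two failure modes can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are hypothetical (true relationship y = 2x, with noise added to the training targets), a constant model stands in for an underfitting model, and a nearest-neighbour memorizer stands in for an overfitting one.

```python
# Hypothetical data: the true relationship is y = 2x; the training targets carry noise.
train = [(1, 3.0), (2, 3.0), (3, 7.0), (4, 7.0), (5, 11.0), (6, 11.0)]
test = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2), (5.4, 10.8)]


def fit_constant(data):
    """Underfitting candidate: always predict the mean training target."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_line(data):
    """Least-squares straight line through the training data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(data):
    """Overfitting candidate: return the target of the nearest training instance."""
    return lambda x: min(data, key=lambda inst: abs(inst[0] - x))[1]


def mse(predict, data):
    """Mean squared error of predict() over a list of (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


models = {
    "constant": fit_constant(train),
    "line": fit_line(train),
    "memorizer": fit_memorizer(train),
}
for name, predict in models.items():
    print(f"{name:10s} train MSE = {mse(predict, train):.3f}  "
          f"test MSE = {mse(predict, test):.3f}")
```

On this toy data the memorizer reproduces every training target exactly, yet does worse than the straight line on the unseen queries, while the constant model is poor on both sets.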
43
Q

What is a Goldilocks model?

A
  • A model that is just right and strikes a good balance between overfitting and underfitting
  • It is found by using algorithms with appropriate inductive biases
44
Q

What is CRISP-DM?

A

Cross Industry Standard Process for Data Mining is a data mining process model that describes commonly used approaches that data mining experts use to tackle problems

45
Q

What are the phases of CRISP-DM?

A

- Business Understanding: defining customers’ needs, understanding project objectives
- Data Understanding: data collection and familiarization
- Data Preparation: constructing the final dataset from raw data
- Modeling: selecting machine learning techniques relevant to the problem and calibrating their parameters to optimal values
- Evaluation: collecting outcomes and comparing the obtained model with the business objectives
- Deployment: putting the model into production, organizing and presenting the knowledge gained in a way that the customer can use

46
Q

List other data life cycle models

A

- SEMMA: Sample, Explore, Modify, Model, Assess
- Data Mining and Knowledge Discovery from Data (KDD): mostly used in the real world

47
Q

What is supervised machine learning based on?

A

The assumption that the data does not change over time. Supervised learning algorithms create models that distinguish between the classes present in the dataset they are induced from