L4: Foundation of ML and Linear Regression Flashcards

1
Q

Which of the following statements hold true for supervised ML? (Select all that apply)
A) The algorithm is trained on a labeled dataset, where the input data is associated with corresponding output labels
B) The goal is for the algorithm to learn a mapping or relationship between the input features and the output labels
C) Logistic regression is a form of supervised learning
D) Decision trees are a form of supervised learning
E) Linear regression is a form of supervised learning

A

In supervised learning, all of the statements hold true:

A) The algorithm is trained on a labeled dataset, where the input data is associated with corresponding output labels
B) The goal is for the algorithm to learn a mapping or relationship between the input features and the output labels
C) Logistic regression is a form of supervised learning
D) Decision trees are a form of supervised learning
E) Linear regression is a form of supervised learning

2
Q

In the case of logistic regression, supervised learning is used for binary classification problems, where the output variable is categorical (e.g., 0 or 1). The logistic regression model estimates the probability that a given input belongs to a particular class and makes predictions based on these probabilities.

TRUE/FALSE

A

TRUE
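
A minimal sketch of this idea, assuming scikit-learn and a made-up toy dataset:

```python
# Logistic regression estimating class probabilities (hypothetical toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # one input feature
y = np.array([0, 0, 0, 1, 1, 1])                          # binary labels

model = LogisticRegression().fit(X, y)

# Estimated P(class 0) and P(class 1) for a new input, plus the hard
# prediction derived from those probabilities (0.5 threshold by default).
print(model.predict_proba([[3.5]]))
print(model.predict([[3.5]]))
```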

3
Q

In the context of decision trees, the algorithm learns (with supervised learning) a set of hierarchical decision rules based on the features of the data to make predictions about the output label.

TRUE/FALSE

A

TRUE

4
Q

Which of the following statements about unsupervised learning are WRONG? (Select all that apply)

A) It involves training a model on a labelled dataset
B) The algorithm explores the inherent structure or patterns within the data without explicit guidance
C) Common tasks include clustering, dimensionality reduction, and density estimation
D) The goal is to discover relationships or groupings in the data without predefined output labels

A

Wrong:
A) Unsupervised learning involves training a model on a labelled dataset

Instead, the model is trained on an unlabelled dataset, in contrast to supervised learning.

5
Q

Which of the following statements are true for reinforcement learning?

A) a learning agent evolves behaviors to solve tasks using evaluative feedback
B) the agent is punished or rewarded as a consequence of its actions
C) it requires interactions between agent and environment
D) the agent learns through experience (trial/error)

A

All statements are true:

A) a learning agent evolves behaviors to solve tasks using evaluative feedback
B) the agent is punished or rewarded as a consequence of its actions
C) it requires interactions between agent and environment
D) the agent learns through experience (trial/error)

6
Q

In linear regression, which measure tells us how good the prediction is?

A

Mean squared error (MSE) is used as the loss function in linear regression, telling us how good the prediction is
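
A minimal sketch of the MSE calculation, using made-up observed and predicted values:

```python
# Mean squared error: the average of the squared residuals.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # observed values (made up)
y_pred = np.array([2.5, 5.5, 7.5, 8.0])   # model predictions (made up)

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.4375
```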

7
Q

What are the limitations of ML?

Example/hint: if the temperature sensor breaks down for one day, you must find a way to take this into account

A

There are many points in an ML pipeline where things can go wrong (e.g., missing or faulty measurements), leading the machine to learn something wrong.

8
Q

When using ML to distinguish between blueberry muffins and chihuahuas, the “chihuahua” label is statistically associated with big eyes, small bodies, pointy ears, etc.

What are the potential limitations on ML in this context?

A

The ML algorithm sees the world as a set of pixel values. But muffins and chihuahuas have similar pixel values…

The same issue might also be present when distinguishing between chihuahuas and other dog breeds with similar attributes.

Limitation: it can be wrong

9
Q

When using ML to distinguish between blueberry muffins and chihuahuas, limitations arise because the attributes of muffins and chihuahuas are similar. What is a solution to this?

A

You want to define features that specifically discriminate between, and thus separate, easily confused objects:
- distance between the eyes
- at most three dots
- at most x kilos
- smell and taste data

10
Q

Once a rule/prediction is established, re-testing and re-training with new data is necessary to account for:

A) monotonicity
B) changes in rules
C) velocity

A

B) changes in rules

I.e., when the future becomes different from the past, retraining and retesting are necessary to maintain good predictive performance on future data

11
Q

What is the cross-validation trade-off?

A

Cross-validation trade-off: if you use part of the data to test the rule built from the remainder of the data, this allows the model to be tested out. However, it also means that less data is used for building the model

12
Q

What is cross-validation?

A

Cross-validation entails splitting your data into one part for training and a held-out set for testing.
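
A minimal sketch of such a split, assuming scikit-learn and synthetic data:

```python
# Hold out 20% of the rows for testing; train on the remaining 80%.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                             # 100 rows, 3 features
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=100)  # synthetic outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```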

13
Q

What is the goal of cross-validation?

A

To avoid or mitigate overfitting

14
Q

Which of the following statements are true about bias in the bias-variance tradeoff in ML?
A) Bias refers to the error introduced by approximating a real-world problem with a simplified model
B) High bias models may oversimplify the underlying patterns in the data and lead to systematic errors.
C) Low bias is often associated with underfitting
D) Bias is the error introduced by using a complex model that is highly responsive to the training data

A

A) Bias refers to the error introduced by approximating a real-world problem with a simplified model
B) High bias models may oversimplify the underlying patterns in the data and lead to systematic errors.

WRONG:
C) Low bias is often associated with underfitting -> HIGH bias has this problem

D) Bias is the error introduced by using a complex model that is highly responsive to the training data -> this describes high variance

15
Q

In the bias-variance tradeoff in ML, which of the following statements are FALSE about variance? (Select all wrong answers)

A) variance is the error introduced by using a complex model highly responsive to the training data
B) high variance models may capture noise or random fluctuations in the training data
C) high variance leads to poor generalisation on new, unseen data
D) High variance is often associated with overfitting

A

All options are true

A) variance is the error introduced by using a complex model highly responsive to the training data
B) high variance models may capture noise or random fluctuations in the training data
C) high variance leads to poor generalisation on new, unseen data
D) High variance is often associated with overfitting

16
Q

Simpler models tend to have higher variance but lower bias – and vice-versa for complex models. But very complex models can result in overfitting.

TRUE/FALSE?

A

FALSE. The true statement is:

Simpler models tend to have higher BIAS but lower VARIANCE – and vice-versa for complex models. But very complex models can result in overfitting
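
A minimal sketch of the tradeoff, assuming scikit-learn and synthetic data: polynomials of increasing degree move from underfitting (high bias) toward overfitting (high variance), visible in the gap between training and held-out error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):  # simple -> moderate -> very flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # held-out error
```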

17
Q

Predictive modeling works by leveraging correlations between feature values (input) and output. Ideally, our predictors would each contribute independent sources of information about the outcome, and this is often the case

TRUE/FALSE

A

FALSE

Instead:
Predictive modeling works by leveraging correlations between feature values (input) and output. Ideally, our predictors would each contribute independent sources of information about the outcome, but this is often not the case

18
Q

Which of the following statements are NOT true about co-linearity?
A) it occurs when two or more features are highly correlated with one another.
B) it can cause problems when doing statistical inference, because if predictors increase or decrease together it can be hard to determine their separate effects on the output
C) Generally the solution is to remove one of the redundant predictors
D) Since we are more interested in prediction and not in explanatory modeling, it is even more important to take into account

A

WRONG:
D) Since we are more interested in prediction and not in explanatory modeling, it is even more important to take into account

Instead: Since we are more interested in prediction and not in explanatory modeling, we won’t worry too much about the issue of (multi)co-linearity.

19
Q

You can make a correlation matrix to explore correlations between variables that are both numeric and categorical

TRUE/FALSE

A

FALSE
A correlation matrix (or correlation in general) can only be computed for numeric variables, since this relation cannot be measured for variables with a finite number of outcomes (e.g. factors)
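
A minimal sketch, assuming pandas and a made-up data frame: the categorical column is simply excluded from the correlation matrix.

```python
import pandas as pd

df = pd.DataFrame({
    "size":  [50, 70, 90, 120],
    "price": [100, 150, 170, 250],
    "color": ["red", "blue", "red", "green"],  # categorical (factor-like)
})

# Correlations are computed over the numeric columns only.
print(df.select_dtypes(include="number").corr())
```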

20
Q

In linear regression, the intercept term alone tells you the expected value of the response variable if all explanatory variables had a value of 0

TRUE/FALSE

A

TRUE

21
Q

In linear regression, what does the F-statistic (in the model output) measure?

A

The F-statistic roughly compares the fit of the given model, including its predictors, with an intercept-only (naive) model.

A significant F-statistic tells us that, overall, at least one of the predictors has a coefficient not equal to 0 and is thus related to the output
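
A minimal sketch of reading the F-statistic from a fitted model, assuming statsmodels and synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)  # only the first predictor matters

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.fvalue, model.f_pvalue)  # overall F-statistic and its p-value
```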

22
Q

Which of the following statements are TRUE about t-statistics?
A) The t-statistic reports a test for the association between each predictor individually with the response variable
B) it can result in false discoveries of association due to multiple testing (some predictors will be associated with the outcome purely by chance).
C) the t-statistic was not developed for today’s big data contexts where we might have dozens or hundreds of predictors

A

All options are true

23
Q

Neither the F-statistic nor the t-statistic was developed for today’s big data contexts, where we might have dozens or hundreds of predictors.

TRUE/FALSE

A

TRUE

24
Q

Linear regression makes a list of simplifying assumptions. Which of the following are among these assumptions?
A) normally distributed errors and the expected value of the error is 0
B) homoscedasticity (variance of errors is constant for all values of independent variables)
C) we are using the correct functional form (the true data generating model is a linear function)
D) each row’s values are independent draws from the same underlying distribution
E) there are no omitted relevant variables and no added irrelevant variables
F) no correlation between the error terms and independent variables
G) no autocorrelation among error terms produced by different values of the independent variables

A

All of the listed statements are assumptions of linear regression:

A) normally distributed errors and the expected value of the error is 0
B) homoscedasticity (variance of errors is constant for all values of independent variables)
C) we are using the correct functional form (the true data generating model is a linear function)
D) each row’s values are independent draws from the same underlying distribution
E) there are no omitted relevant variables and no added irrelevant variables
F) no correlation between the error terms and independent variables
G) no autocorrelation among error terms produced by different values of the independent variables

25
Q

Explain what the “error” in linear regression is

A

The error (residual) for each data point is the vertical distance between the observed value and the value predicted by the linear regression model. Mathematically, the error for an individual data point i is often denoted ε (epsilon) and appears as the final term in the linear regression equation.
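
In symbols (standard notation):

```latex
% Simple linear regression with the error term added last:
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
% After fitting, the residual is the observed minus the predicted value:
e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)
```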

26
Q

Due to the number of simplifying assumptions that have to be satisfied in linear regression, the model typically exhibits poor predictive performance

TRUE/FALSE

A

FALSE
It turns out that a linear model can still predict quite well even when some or all of these assumptions are violated!
So it’s always worth trying first, at least as a benchmark to compare against more complicated models.

27
Q

In linear regression, how is the best fitting line estimated?

A

The ordinary least squares (OLS) estimation procedure

28
Q

Ordinary least squares (OLS) seeks to find the line that minimizes the squared residuals between the line and the observations

TRUE/FALSE

A

TRUE
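
A minimal sketch, using numpy's least-squares solver on made-up numbers; the solver returns exactly the coefficients that minimize the sum of squared residuals:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x), x])   # intercept column + feature
beta, residual_ss, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)         # [intercept, slope], roughly [0.05, 1.99]
print(residual_ss)  # the minimized sum of squared residuals
```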

29
Q

In the context of the bias-variance tradeoff in linear regression, which of the following statements are NOT true?
A) simple linear models may exhibit bias by systematically missing out on subtleties in the relationship between features and output.
B) on the other hand, more complex models run the risk of overfitting
C) overfitting results in the model mistaking noise (random sampling variation) for relational structure
D) Usually there is a balance we seek to find between these two extremes
E) neural networks are one type of model prone to overfitting due to their complexity

A

All options are correct

30
Q

Why not simply use all the features available to predict the response variable?

A

To avoid overfitting!

31
Q

In general, in many business-related big data contexts, a linear model perhaps oversimplifies or underfits the underlying relationship.

TRUE/FALSE

A

TRUE

32
Q

In linear regression, ideally, residuals would be normally distributed. In many business contexts, however, there are different costs associated with over- and underprediction, so it can be useful to analyze prediction errors. Sometimes we might even prefer a model that on average performs worse than another, but tends to make fewer errors in the more costly direction.

TRUE/FALSE

A

TRUE

33
Q

In simple terms, multiple R-squared provides an indication of how well the independent variables in the model explain the variation in the dependent variable

TRUE/FALSE

A

TRUE

34
Q

How is the multiple R-squared calculated?
A) as the squared correlation between observed values of the dependent variable and the values predicted by the linear regression model
B) as the squared coefficient between observed values of the dependent variable and the values predicted by the linear regression model
C) as the squared error between observed values of the dependent variable and the values predicted by the linear regression model

A

A)
The multiple R-squared is calculated as the squared correlation between the observed values of the dependent variable and the values predicted by the linear regression model
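
A minimal sketch verifying the equivalence on made-up numbers, assuming numpy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta                                   # fitted values

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1 - ss_res / ss_tot)               # R-squared from sums of squares
print(np.corrcoef(y, y_hat)[0, 1] ** 2)  # squared correlation: same value
```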

35
Q

When a linear model includes only a single predictor, R-squared is equal to the correlation coefficient r between the predictor and outcome
TRUE/FALSE

A

TRUE

36
Q

A multiple R-squared statistic of 0.72 reveals what?
A) that the variation in the response variable can account for roughly 72% of the variation in the error
B) that the variation in the predictor variable can account for roughly 72% of the variation in the response variable.

A

B) that the variation in the predictor variable can account for roughly 72% of the variation in the response variable.

37
Q

The presence of an interaction indicates that the effect of one predictor variable on the response variable is different at different values of the other predictor variable.

TRUE/FALSE

A

TRUE
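
A minimal sketch of fitting an interaction, assuming statsmodels' formula API and synthetic data; in the formula, x1 * x2 expands to x1 + x2 + x1:x2:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
# The effect of x1 on y depends on x2 (true interaction coefficient 1.5).
df["y"] = 1 + 2 * df.x1 + 0.5 * df.x2 + 1.5 * df.x1 * df.x2 \
          + rng.normal(size=200)

model = smf.ols("y ~ x1 * x2", data=df).fit()
print(model.params)  # includes an x1:x2 coefficient near 1.5
```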

38
Q

Since statistical significance is not a function of sample size, in “big data” contexts it is particularly useful.

TRUE/FALSE

A

FALSE
Instead:
Since statistical significance is a function of sample size, in “big data” contexts it is not so useful. Given enough data nearly any association, no matter how practically insignificant, can be statistically significant.
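
A minimal simulation of this point, assuming scipy, with a deliberately tiny (practically negligible) true effect:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
for n in (100, 1_000_000):
    x = rng.normal(size=n)
    y = 0.01 * x + rng.normal(size=n)   # tiny true association
    r, p = pearsonr(x, y)
    print(n, round(r, 4), p)  # p is typically large for n=100, tiny for n=1e6
```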

39
Q

Which of the following metrics in linear regression measures the proportion of variance in the dependent variable explained by the entire set of independent variables, adjusted for the number of predictors (a more accurate measure of model fit)?
A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

B) Adj. R squared

40
Q

Which of the following metrics in linear regression measures the proportion of variance in the dependent variable explained by the entire set of independent variables?
A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

A) Multiple R squared

41
Q

Which of the following metrics in linear regression represents the standard deviation of the residuals, indicating the average amount that predicted values deviate from actual values?
A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

H) Residual st. error

(Not C: the “Std. Error” column gives the standard error of each estimated coefficient, not the standard deviation of the residuals; see card 48.)

42
Q

Which of the following metrics in linear regression provides the estimated coefficients for each independent variable, indicating the strength and direction of their association with the dependent variable?
A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

D) Estimate

43
Q

Which of the following metrics in linear regression indicates the significance level of each variable based on the p-value?

A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

G) Signif. codes

44
Q

Which of the following metrics in linear regression measures the number of standard errors that the estimated coefficient is away from zero, helping to assess the significance of each variable?

A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

E) T-value

45
Q

Which of the following metrics in linear regression indicates the number of values that are free to vary when estimating statistical parameters?

A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

I) Degrees of freedom

46
Q

Which of the following metrics in linear regression measures the overall significance of the regression model; assesses whether the model explains a significant amount of variance.
A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

J) F-statistics

47
Q

Which of the following metrics in linear regression represents the probability of obtaining an F-statistic at least as large as the observed one if the model explained nothing? Lower values suggest greater significance of the overall model
A) Multiple R squared
B) Adj. R squared
C) Std. Error
D) Estimate
E) T-value
F) Pr(>|t|)
G) Signif. codes
H) Residual st. error
I) Degrees of freedom
J) F-statistics
K) P-value

A

K) P-value
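
A minimal sketch tying the metrics from the preceding cards together, assuming statsmodels and synthetic data: one fitted model whose summary reports close analogues of all of them (coefficient estimates, standard errors, t-values, p-values, R-squared, adjusted R-squared, degrees of freedom, and the F-statistic with its p-value); the card names mirror R's summary(lm()), which additionally prints significance codes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())   # all of the above metrics in one table
```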

48
Q

In summary, the “residual standard error” is specific to regression models and quantifies the average prediction error, while “standard error” is a more general term used to express the precision of a statistical estimate.

TRUE/FALSE

A

TRUE

49
Q

Residual standard error is used to assess the overall goodness of fit of a linear regression model

TRUE/FALSE

A

True

50
Q

Imagine you have 10 data points. If you are estimating a simple linear regression model (one independent variable), you would have ___ degrees of freedom for the residuals because you’ve used ____ degree of freedom to estimate the slope of the line.

Fill in the blanks:
A) 9, 1
B) 10, 0
C) 0, 10

A

Imagine you have 10 data points. If you are estimating a simple linear regression model (one independent variable), you would have 9 degrees of freedom for the residuals because you’ve used 1 degree of freedom to estimate the slope of the line.

51
Q

Imagine you have 10 data points. If you are estimating a linear regression model with two independent variables, you would have ___ degrees of freedom for the residuals because you’ve used ____ degrees of freedom to estimate the slopes.
The remaining degrees of freedom represent the data points that are free to vary after estimating the parameters

Fill in the blanks

A

Imagine you have 10 data points. If you are estimating a linear regression model with two independent variables, you would have 8 degrees of freedom for the residuals because you’ve used 2 degrees of freedom to estimate the slopes.
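
As a worked note on the counting convention these two cards use (slopes only; many texts, and R's output, also subtract one for the intercept):

```latex
% Deck's convention: n observations, k estimated slopes
\mathrm{df}_{\text{residual}} = n - k \qquad 10 - 1 = 9, \quad 10 - 2 = 8
% Convention that also counts the intercept (as in R's summary(lm())):
\mathrm{df}_{\text{residual}} = n - k - 1
```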