1.3.2 Introduction to Data Science - Statistical Learning Flashcards

1
Q

A. Explain why we estimate a function with data, including the role of input and output variables and their synonyms.

A

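We estimate the function f in Y = f(X) + ε because f is unknown. A good estimate f̂ lets us do two things: prediction (predicting Y for new values of X) and inference (understanding how Y changes as the inputs change). Because f is unknown, we estimate it from observed data, that is, from pairs of input and output values.

Input variables (X) are also known as predictors, independent variables, or features. The output variable (Y) is also known as the response or dependent variable.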
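
To make the roles of X, Y and f̂ concrete, here is a minimal NumPy sketch on simulated data. The "true" f, the noise level and the linear fit are assumptions chosen for illustration; in practice f is unknown and only the data are observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" function f (unknown in practice): Y = f(X) + eps
def f(x):
    return 2.0 + 3.0 * x

n = 200
x = rng.uniform(0, 10, size=n)      # input / predictor / independent variable
eps = rng.normal(0, 1.0, size=n)    # random error term with mean zero
y = f(x) + eps                      # output / response / dependent variable

# Estimate f from the observed (x, y) pairs with a least-squares line
slope, intercept = np.polyfit(x, y, deg=1)

def f_hat(x_new):
    return intercept + slope * x_new

# Prediction: use f_hat to predict Y for a new input value
y_pred = f_hat(5.0)

# Inference: inspect the estimated relationship between X and Y
print(f"estimated intercept={intercept:.2f}, slope={slope:.2f}")
print(f"predicted Y at X=5: {y_pred:.2f}")
```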

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

B. Explain various error terms (reducible and irreducible), the expected value of error squared, and the variance of error terms.

A

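Reducible error: the estimate f̂ is not a perfect estimate of f. This error can be reduced by using a more appropriate statistical learning technique.
Irreducible error: even with a perfect estimate of f, Y still depends on the error term ε (Y = f(X) + ε, where ε has mean zero and is independent of X). Because ε cannot be predicted from X, some error remains no matter how well f is estimated.

For a fixed f̂ and X, the expected value of the squared error splits into these two parts: E(Y − Ŷ)² = [f(X) − f̂(X)]² + Var(ε). The first term is reducible. The second, the variance of the error term, is irreducible and sets a floor on the achievable prediction error.

The focus of statistical learning is on techniques for estimating f with the aim of minimizing the reducible error.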
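
The decomposition E(Y − Ŷ)² = [f(X) − f̂(X)]² + Var(ε) can be checked numerically at a single input x0. The NumPy sketch below is a Monte Carlo illustration. The true f, the deliberately imperfect f̂ and σ are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.5                 # std. dev. of eps, so Var(eps) = 0.25
x0 = 2.0                    # a fixed input value

def f(x):                   # hypothetical true function
    return np.sin(x)

def f_hat(x):               # hypothetical, deliberately imperfect estimate of f
    return 0.25 * x

# Many draws of Y = f(x0) + eps at the same input
y = f(x0) + rng.normal(0, sigma, size=200_000)

reducible = (f(x0) - f_hat(x0)) ** 2    # [f(x0) - f_hat(x0)]^2
irreducible = sigma ** 2                # Var(eps)

# Monte Carlo estimates of E(Y - Y_hat)^2
mse_imperfect = np.mean((y - f_hat(x0)) ** 2)
mse_perfect = np.mean((y - f(x0)) ** 2)  # even f_hat = f leaves Var(eps)

print(f"reducible={reducible:.4f}, irreducible={irreducible:.4f}")
print(f"E(Y - Y_hat)^2 ~ {mse_imperfect:.4f}, sum = {reducible + irreducible:.4f}")
print(f"with a perfect f_hat: ~ {mse_perfect:.4f}")
```

The perfect estimate still shows an error of about Var(ε). Only the gap between the two MSEs can be removed by a better f̂.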

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

C. Compare and contrast parametric and non-parametric learning methods.

A

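In the parametric approach you first assume a functional form for f (e.g., that f is linear in X). You then use the training data to fit, or train, the model by estimating its parameters (e.g., a linear model fitted by ordinary least squares, OLS). This reduces estimating f to estimating a small set of parameters. However, if the assumed form is far from the true f, the estimate will be poor.

The non-parametric approach does not assume a particular functional form for f, so it can fit a wider range of possible shapes. The drawback is that it needs a very large number of observations to estimate f accurately.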
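
The following NumPy sketch contrasts the two approaches on simulated data with a non-linear true f. The sine-shaped f, the noise level and K = 15 are illustrative assumptions. K-nearest-neighbours regression stands in here as one example of a non-parametric method.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):                              # hypothetical non-linear true function
    return np.sin(2 * x)

x_train = rng.uniform(0, 3, size=300)
y_train = f(x_train) + rng.normal(0, 0.2, size=300)
x_test = np.linspace(0.1, 2.9, 100)

# Parametric: assume f is linear, so only two parameters are estimated (OLS)
b1, b0 = np.polyfit(x_train, y_train, deg=1)
pred_linear = b0 + b1 * x_test

# Non-parametric: KNN regression assumes no form for f; each prediction is
# the average response of the k closest training points
def knn_predict(x_new, x, y, k=15):
    dists = np.abs(x[None, :] - x_new[:, None])   # (n_test, n_train) distances
    nearest = np.argsort(dists, axis=1)[:, :k]    # indices of k closest points
    return y[nearest].mean(axis=1)

pred_knn = knn_predict(x_test, x_train, y_train)

mse_linear = np.mean((pred_linear - f(x_test)) ** 2)
mse_knn = np.mean((pred_knn - f(x_test)) ** 2)
print(f"linear (parametric) error vs true f: {mse_linear:.3f}")
print(f"KNN (non-parametric) error vs true f: {mse_knn:.3f}")
```

With 300 observations, KNN tracks the curved f much more closely than the straight line. With only a handful of observations, the non-parametric fit would become far noisier.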

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

D. Describe the trade-offs between prediction accuracy, flexibility, and model interpretability, including the role of overfitting.

A

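Flexibility: more flexible models can generate a wider range of shapes to estimate f, but they are more complex and harder to interpret. Very flexible models may also overfit the data, meaning they follow the noise (errors) in the training data too closely and predict new observations poorly. As a result, a less flexible model can sometimes give more accurate predictions than a highly flexible one.

Restrictive (less flexible) models such as linear regression are easier to interpret but can generate only a smaller range of shapes for f.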
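
One way to see the flexibility trade-off is to fit polynomials of increasing degree and compare training and test error. The NumPy sketch below uses polynomial degree as an assumed stand-in for flexibility. The true f, the noise level and the small training set of 30 points are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):                                  # hypothetical true function
    return np.sin(2 * x)

def make_data(n):
    x = rng.uniform(0, 3, size=n)
    return x, f(x) + rng.normal(0, 0.3, size=n)

x_train, y_train = make_data(30)           # small training set makes overfitting easier to see
x_test, y_test = make_data(1000)

results = {}
for degree in [1, 3, 10]:                  # higher degree = more flexible model
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

Training error can only fall as the degree rises, because each model contains the simpler ones. Test error is what reveals overfitting: it typically stops improving, or gets worse, once the model is more flexible than the true f requires.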

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Explain when a supervised learning model is preferable to unsupervised or semi-supervised learning models.

A

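Supervised learning is preferable when each observation of the predictors has an associated measurement of the response. The goal is then to predict the response or to understand the relationship between the response (dependent variable) and the predictors (independent variables).
Unsupervised learning is preferred when we observe the predictors but there is no associated response. The goal is instead to find structure among the observations, e.g., clusters.
Semi-supervised learning falls in between: the response is observed for only some of the observations (for example, because responses are expensive to collect), and the method uses both the observations with a response and those without one.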
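
In supervised learning, the observed response guides the fit. In unsupervised learning only X is available. The NumPy sketch below illustrates the unsupervised case with a hand-written two-cluster k-means. The simulated groups, the deterministic initialisation and the helper name `kmeans_2` are all assumptions for illustration, not a library API. The group labels are generated only to check the result and are never passed to the algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two groups of observations; an unsupervised method sees only x, not the labels
x = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),
               rng.normal([5, 5], 0.5, size=(50, 2))])
hidden_labels = np.array([0] * 50 + [1] * 50)   # used only for checking

def kmeans_2(data, n_iter=20):
    """Hypothetical minimal 2-cluster k-means; no response variable is used."""
    c0 = data[0]
    c1 = data[np.argmax(np.linalg.norm(data - c0, axis=1))]  # farthest point from c0
    centers = np.array([c0, c1])
    for _ in range(n_iter):
        # Assign each observation to its nearest centre
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(d, axis=1)
        # Move each centre to the mean of its assigned observations
        centers = np.array([data[assign == j].mean(axis=0) for j in range(2)])
    return assign, centers

assign, centers = kmeans_2(x)
# Cluster ids are arbitrary, so compare up to relabelling
agreement = max(np.mean(assign == hidden_labels), np.mean(assign != hidden_labels))
print(f"cluster/group agreement: {agreement:.2f}")
```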

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Explain how the appropriateness of regression problems relative to classification problems may be related to whether responses are quantitative or qualitative.

A

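Regression problems have a quantitative (numerical) response, such as income or price. Classification problems have a qualitative (categorical) response, such as yes/no or the brand of product purchased. The choice between the two is usually driven by the type of response; whether the predictors are quantitative or qualitative matters less.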
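
The same method can often serve both problem types, and only the treatment of the response changes. The NumPy sketch below uses k-nearest neighbours for both cases. For a quantitative response it averages the neighbours' values (regression). For a qualitative response it takes a majority vote (classification). The data, the "high"/"low" labels and the helper names are hypothetical.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)

x = rng.uniform(0, 10, size=100)                 # one predictor
y_quant = 1.5 * x + rng.normal(0, 1, size=100)   # quantitative response (e.g. a price)
y_qual = np.where(x > 5, "high", "low")          # qualitative response (a class label)

def nearest(x_train, x0, k=5):
    # Indices of the k training points closest to x0
    return np.argsort(np.abs(x_train - x0))[:k]

def knn_regress(x_train, y_train, x0, k=5):
    # Regression: predict a number by averaging the neighbours' responses
    return float(np.mean(y_train[nearest(x_train, x0, k)]))

def knn_classify(x_train, y_train, x0, k=5):
    # Classification: predict a category by majority vote among the neighbours
    votes = Counter(y_train[nearest(x_train, x0, k)])
    return votes.most_common(1)[0][0]

print(knn_regress(x, y_quant, 8.0))    # a number, roughly 1.5 * 8 = 12
print(knn_classify(x, y_qual, 8.0))    # a label
```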

How well did you know this?
1
Not at all
2
3
4
5
Perfectly