1.3.2 Introduction to Data Science - Statistical Learning Flashcards
A. Explain why we estimate a function with data, including the role of input and output variables and their synonyms
Input variables are also known as independent variables or predictors; output variables are also known as the response or dependent variable. We estimate a function because we assume the response relates to the inputs through Y = f(X) + eps, so an estimate of f lets us predict Y for new inputs and understand how each predictor affects the response.
B. Explain various error terms (reducible and irreducible), the expected value of error squared, and the variance of error terms.
Reducible error: the estimated function is not a perfect estimate of f; this error can be reduced by choosing a better learning method.
Irreducible error: even with a perfect estimate of f, Y still depends on the error term eps, whose variance Var(eps) puts a floor on accuracy.
The expected squared error decomposes as E[(Y - Yhat)^2] = [f(X) - fhat(X)]^2 + Var(eps); the focus of statistical learning is estimating f to reduce the reducible part.
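A small numpy simulation can make the decomposition concrete. The true relationship below (f(x) = 2 + 3x with noise standard deviation 0.5) is an assumption for illustration: when the fitted model matches f almost perfectly, the mean squared error approaches Var(eps), the irreducible floor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sigma = 0.5                       # std dev of the irreducible error term (assumed)

# True relationship Y = f(X) + eps, with f(x) = 2 + 3x chosen for illustration
x = rng.uniform(0, 1, n)
eps = rng.normal(0, sigma, n)
y = 2 + 3 * x + eps

# Fit a linear model by ordinary least squares
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# With a near-perfect estimate of f, MSE approaches Var(eps) = sigma**2 = 0.25
mse = np.mean((y - y_hat) ** 2)
print(round(mse, 3))
```

Because the fitted form matches the true f, essentially all remaining error is irreducible; a misspecified model would add a reducible term on top of the 0.25 floor.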
C. Compare and contrast parametric and non-parametric learning methods.
In the parametric approach you first assume a functional form for f and then use the data to train/fit the model (e.g. assuming a linear model and fitting it by OLS).
The non-parametric approach does not assume a functional form for f; it can fit a much wider range of shapes but typically needs many more observations.
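The contrast can be sketched in numpy on toy data (the sine-shaped truth and sample sizes here are assumptions): the parametric model commits to a linear form up front, while a k-nearest-neighbours estimate makes no assumption about f's shape and just averages nearby responses.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(-2, 2, 200)
y_train = np.sin(x_train) + rng.normal(0, 0.1, 200)   # nonlinear truth (assumed)

# Parametric: assume f(x) = b0 + b1*x, then estimate the two parameters by OLS
A = np.column_stack([np.ones_like(x_train), x_train])
b0, b1 = np.linalg.lstsq(A, y_train, rcond=None)[0]

def predict_linear(x):
    return b0 + b1 * x

def predict_knn(x, k=10):
    # Non-parametric: no assumed form; average the responses of the k nearest
    # training points to x
    idx = np.argsort(np.abs(x_train - x))[:k]
    return y_train[idx].mean()

print(predict_linear(1.0), predict_knn(1.0))
```

The linear fit underestimates the curve near x = 1 (the best straight line through a sine flattens it), while the k-NN estimate tracks the local shape of f, at the cost of needing many training points in every neighbourhood.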
D. Describe the trade-offs between prediction accuracy, flexibility, and model interpretability, including the role of overfitting.
Flexibility: flexible models are harder to interpret, and highly flexible models may overfit the data, following the noise rather than the true f.
Restrictive models are easier to interpret but can generate only a smaller range of shapes for f.
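Overfitting is easy to see by varying polynomial degree on a held-out split (the sine-plus-noise data and the degrees 1, 3, 20 below are illustrative assumptions): training error keeps falling as flexibility grows, while test error eventually rises.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 60)    # assumed toy data
x_tr, y_tr, x_te, y_te = x[:30], y[:30], x[30:], y[30:]

results = {}
for degree in (1, 3, 20):
    # Higher degree = more flexible model; degree 20 can chase the noise
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    results[degree] = (mse_tr, mse_te)
    print(degree, round(mse_tr, 3), round(mse_te, 3))
```

The degree-20 fit nearly interpolates the training points (tiny training MSE) yet does much worse on the test points than a moderately flexible fit, which is the overfitting trade-off the card describes.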
E. Explain when a supervised learning model is preferable to unsupervised or semi-supervised learning models.
Supervised learning is preferable when a response is observed for each observation and the goal is to predict or model the relationship between the dependent and independent variables.
Unsupervised learning is preferred when no response variable is observed, so the goal is instead to discover structure (e.g. clusters) among the input variables.
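A minimal unsupervised sketch in numpy (the two-blob data and the bare-bones k-means loop are assumptions for illustration): with no response variable observed, the algorithm can still recover the grouping structure in the inputs alone.

```python
import numpy as np

rng = np.random.default_rng(4)
# Unlabelled inputs: two blobs, but no response variable is observed
X = np.concatenate([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])

# A minimal k-means loop: alternate assigning points to the nearest center
# and moving each center to the mean of its assigned points
centers = X[rng.choice(len(X), 2, replace=False)]
for _ in range(10):
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                        else centers[k] for k in range(2)])

print(np.round(centers[np.argsort(centers[:, 0])], 2))
```

The recovered centers sit near the two blob means even though no labels were ever supplied; a supervised method could not even be trained here, since there is no Y to fit.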
F. Explain how the appropriateness of regression problems relative to classification problems may be related to whether responses are quantitative or qualitative.
Regression problems have a quantitative (numerical) response; classification problems have a qualitative (categorical) response.
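The distinction can be sketched side by side (the house-price relationship, the "ham"/"spam" labels, and the k value are all illustrative assumptions): regression predicts a number, classification predicts a class label.

```python
import numpy as np

rng = np.random.default_rng(3)

# Regression: the response is quantitative (a number, e.g. a price)
sqft = rng.uniform(500, 2500, 100)
price = 50 + 0.1 * sqft + rng.normal(0, 10, 100)       # assumed toy relationship
A = np.column_stack([np.ones_like(sqft), sqft])
intercept, slope = np.linalg.lstsq(A, price, rcond=None)[0]
predicted_price = intercept + slope * 1800              # output is a number

# Classification: the response is qualitative (a label, e.g. "ham" or "spam")
scores = np.concatenate([rng.normal(-1, 1, 50), rng.normal(1, 1, 50)])
labels = np.array(["ham"] * 50 + ["spam"] * 50)

def classify(x, k=7):
    # k-nearest-neighbour majority vote over the training labels
    nearest = labels[np.argsort(np.abs(scores - x))[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                    # output is a class label

print(round(predicted_price, 1), classify(2.0))
```

The regression output lives on a numeric scale (small errors are "close"), while the classification output is one of a finite set of categories, which is why different methods and error measures suit each problem type.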