Data Mining - Lecture Regression Flashcards

1
Q

What are the steps in CRISP-DM?

A
  1. Understand the business problem
  2. Understand the data
  3. Prepare the data
  4. Build the model
  5. Test and evaluate the model
  6. Deploy the model
2
Q

Student Number?

A

2064381

3
Q

What is the difference between single and multiple linear regression?

A

Single (simple) linear regression has only one predictor (independent) variable.

Multiple linear regression has two or more predictor (independent) variables.
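
A minimal sketch of the difference, assuming NumPy is available; the data and the helper name `fit_linear_regression` are made up for illustration. The same least-squares fit is used in both cases; only the number of predictor columns changes.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk (hypothetical helper).

    X holds one column per predictor (a 1-D array means a single predictor).
    Returns the coefficients [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # single predictor -> one column
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


# Made-up data with an exact relationship y = 1 + 2*x1 + 3*x2
x1 = np.array([0, 1, 2, 3, 4])
x2 = np.array([1, 0, 2, 1, 3])
y = 1 + 2 * x1 + 3 * x2

simple = fit_linear_regression(x1, y)                           # one predictor: [b0, b1]
multiple = fit_linear_regression(np.column_stack([x1, x2]), y)  # two predictors: [b0, b1, b2]
print(simple, multiple)
```

Because the single regression only sees x1, it has to absorb the effect of x2 into its intercept and slope (here it gives 2.2 and 3.5), while the multiple regression recovers the coefficients 1, 2 and 3.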

4
Q

What is the Ordinary Least Squares (OLS) method?

A

A method for estimating the unknown parameters (intercept and slopes) of a linear regression model.

-> In practice, you use it to find the best-fitting line for your regression model: the line that minimizes the sum of the squared errors.
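
For a single predictor, the OLS estimates have a well-known closed form: slope b1 = SUM((Xi - Xbar)(Yi - Ybar)) / SUM((Xi - Xbar)^2) and intercept b0 = Ybar - b1 * Xbar. A minimal pure-Python sketch on made-up data; the function name `ols_simple` is hypothetical.

```python
def ols_simple(x, y):
    """Return (b0, b1) minimising SUM (y_i - (b0 + b1*x_i))^2 for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Covariance-style numerator and variance-style denominator (the 1/n factors cancel)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean  # the OLS line passes through (x_mean, y_mean)
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_simple(x, y)
print(f"Yhat = {b0:.1f} + {b1:.1f} * X")  # Yhat = 2.2 + 0.6 * X
```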

5
Q

How do you calculate the OLS?

A

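You calculate the error (residual) for every observation Yi that you have.

Yi is the actual observed value of Y for observation i (with predictor value Xi).

The error for Yi is calculated as: Yi - Yhat_i

Yhat_i is the predicted (fitted) value, i.e. the model's estimate of Y for Xi.

Sum of squared errors (SSE) = SUM (Yi - Yhat_i)^2. OLS picks the model (intercept and slope) with the smallest SSE.

Remember to compute and square each error before you add them up; otherwise positive and negative errors cancel each other out.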
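
The calculation above, sketched in Python on a small made-up data set. The three candidate lines are arbitrary choices for illustration (the first happens to be the OLS line for this data); real OLS finds the minimising intercept and slope directly rather than comparing a few guesses. The helper name `sse` is hypothetical.

```python
def sse(x, y, b0, b1):
    """Sum of squared errors for the line Yhat = b0 + b1 * X."""
    total = 0.0
    for xi, yi in zip(x, y):
        y_hat = b0 + b1 * xi  # model's estimate for this X
        error = yi - y_hat    # residual: actual minus predicted
        total += error ** 2   # square first, then add
    return total


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

candidates = {"OLS line": (2.2, 0.6), "Y = 1 + X": (1.0, 1.0), "flat Y = 4": (4.0, 0.0)}
scores = {name: sse(x, y, b0, b1) for name, (b0, b1) in candidates.items()}
best = min(scores, key=scores.get)
print({name: round(s, 2) for name, s in scores.items()})
# {'OLS line': 2.4, 'Y = 1 + X': 4.0, 'flat Y = 4': 6.0}
print(best)  # OLS line

# Why square first: the raw errors of the OLS line and of the flat line both sum to 0,
# so summing unsquared errors cannot tell these two lines apart.
raw_ols = sum(yi - (2.2 + 0.6 * xi) for xi, yi in zip(x, y))
raw_flat = sum(yi - 4.0 for yi in y)
```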

6
Q

What are the two uses of a regression model?

A
  1. Predictive
    Predict the outcome value for new records
  2. Explanatory/Descriptive
    Explain the average effect of inputs on an outcome
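
Both uses can come from the same fitted model. A minimal sketch assuming NumPy, on a small made-up data set: the slope is read for explanation, and the fitted line is evaluated at new (hypothetical) X values for prediction.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# A degree-1 polyfit is a least-squares straight-line fit; it returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Explanatory use: interpret the coefficient as an average effect
print(f"On average, Y increases by {slope:.1f} per one-unit increase in X")

# Predictive use: estimate the outcome for new records
new_x = np.array([6.0, 7.0])
predictions = intercept + slope * new_x
print(predictions)  # approximately [5.8, 6.4]
```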
7
Q

What is overfitting?

A

The goal of a model is to make good predictions on any new data you run it on, not just on the data it was trained on.

If your function represents the sample too closely, it captures the quirks (noise) of that particular sample instead of the general relationship between the variables. As a result, it will not predict future values well. This is overfitting.

-> Visible in a graph when the fitted curve follows the individual data points almost exactly instead of the overall trend.
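
A small illustration, assuming NumPy and made-up data: a straight line and a degree-9 polynomial are fitted to the same 10 noisy points. The polynomial passes (almost) exactly through every training point, so its training error is near zero, but its error on new points lying between the training points is larger than the straight line's. The data, the alternating noise pattern and the choice of degree 9 are assumptions for illustration only.

```python
import numpy as np

# Made-up sample: true relationship Y = 2X + 1, plus alternating noise of +1 / -1
x_train = np.arange(10, dtype=float)
y_train = 2 * x_train + 1 + np.where(x_train % 2 == 0, 1.0, -1.0)

# New data points between the training points, taken from the true relationship
x_new = x_train[:-1] + 0.5
y_new = 2 * x_new + 1


def sse(y, y_hat):
    """Sum of squared errors between actual and predicted values."""
    return float(np.sum((y - y_hat) ** 2))


results = {}
for degree in (1, 9):
    coef = np.polyfit(x_train, y_train, degree)  # degree 9 can pass through all 10 points
    results[degree] = (
        sse(y_train, np.polyval(coef, x_train)),  # error on the sample
        sse(y_new, np.polyval(coef, x_new)),      # error on new data
    )

for degree, (train_err, new_err) in results.items():
    print(f"degree {degree}: training SSE = {train_err:.3f}, new-data SSE = {new_err:.3f}")
```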

8
Q

What is underfitting of a model?

A

The model performs poorly on the training data (and therefore also on new data). This is because the model is too simple to capture the relationship between the input examples (often called X) and the target values (often called Y), e.g. a straight line fitted to a curved relationship.
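
A minimal sketch, assuming NumPy and made-up data that follow Y = X^2 exactly: a straight line (degree 1) and a quadratic (degree 2) are both fitted by least squares and their training errors compared.

```python
import numpy as np

# Made-up data with a curved (quadratic) relationship: Y = X^2
x = np.arange(-3, 4, dtype=float)  # -3, -2, ..., 3
y = x ** 2

results = {}
for degree in (1, 2):
    coef = np.polyfit(x, y, degree)
    results[degree] = float(np.sum((y - np.polyval(coef, x)) ** 2))  # training SSE
    print(f"degree {degree}: training SSE = {results[degree]:.2f}")
```

The straight line is too simple for this data: by symmetry its best fit is the flat line Y = 4, leaving a training SSE of 84, whereas the quadratic fits the training data (almost) perfectly.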
