Regression Flashcards

1
Q

When would you use regression?

A

When considering relationships between a continuous predictor variable and a continuous response variable.

2
Q

How do you fit a least squares regression line?

A

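1) Plot a point at coordinate (mean of x, mean of y); the least squares line always passes through this point.

2) The best fit line is the line through that point that minimises the sum of the squared vertical deviations (residuals) of the data points from the line.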
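A minimal Python sketch of the standard closed-form least squares solution. The function name least_squares_fit and the data are illustrative only. The intercept is found by forcing the line through (mean of x, mean of y).

```python
from statistics import mean

def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line y = A0 + A1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = sum of cross-deviations / sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The line passes through (mean of x, mean of y), which fixes the intercept
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a0, a1 = least_squares_fit(xs, ys)
print(a0, a1)
```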

3
Q

What is the equation of a line and an alternative notation?

A

1) y = mx + c
2) y = A0 + A1x

(Where A0 is the intercept c, A1 is the gradient m, and x is the predictor variable)

4
Q

What is another term for the gradient?

A

Coefficient (the regression coefficient of the predictor variable, also called the slope)

5
Q

1) What does Pearson’s correlation coefficient range from?

2) What does a Pearson’s coefficient of -1 or +1 mean?

A

1) Ranges from -1 to 1

2) -1 is a perfect linear negative correlation
+1 is a perfect linear positive correlation
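A minimal Python sketch of Pearson's r (the covariance of x and y scaled by both spreads). The function name pearson_r and the data are illustrative; the three calls show the two extremes and an in-between case.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [3, 5, 7, 9, 11]))  # 1.0: points on a rising straight line
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # -1.0: points on a falling straight line
print(pearson_r(xs, [2, 1, 4, 3, 5]))   # 0.8: positive but not perfect
```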

6
Q

What does Pearson’s correlation coefficient assume?

A

It assumes the relationship between x and y is linear, because it only measures linear association.

7
Q

What does a Pearson’s correlation coefficient of 0 indicate?

A

There is no linear relationship between x and y. There may still be a strong non-linear relationship.
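A small illustrative check in Python: here y is completely determined by x (y = x^2), yet Pearson's r is exactly 0 because the relationship is symmetric rather than linear. The helper pearson_r and the data are for illustration only.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # a perfect (but curved) relationship
print(pearson_r(xs, ys))   # 0.0: no *linear* relationship
```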

8
Q

What is Spearman’s rank correlation coefficient?

What is it used for?

A

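1) This is a non-parametric correlation coefficient which doesn’t assume the relationship is linear.
2) It is used to measure monotonic relationships (where y consistently increases, or consistently decreases, as x increases, though not necessarily in a straight line).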

9
Q

How is Spearman’s rank correlation coefficient calculated?

A

The raw x and y data are converted into ranks. If the relationship is perfectly monotonic, the ranks form a perfect linear relationship.

10
Q

How does Spearman’s rank relate to Pearson’s correlation coefficient?

A

Spearman’s rank is simply Pearson’s correlation coefficient calculated on the ranked data instead of the raw data.
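A minimal Python sketch of the calculation: rank each variable (tied values get the average of their ranks, a common convention), then take Pearson's r of the ranks. The names ranks, pearson_r and spearman_rho, and the data, are illustrative. On monotonic but curved data, Pearson's r on the raw values is below 1 while Spearman's rank is exactly 1.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    # Spearman's rank = Pearson's r applied to the ranks
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [2 ** x for x in xs]     # monotonic but strongly curved
print(pearson_r(xs, ys))      # below 1: the raw relationship is not linear
print(spearman_rho(xs, ys))   # 1.0: the ranks lie on a straight line
```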

11
Q

Can p-values be associated with correlation coefficients?

A

Yes. The p-value tests the null hypothesis that the true correlation is 0, so it indicates whether the observed correlation is significantly different from 0.
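For Pearson's r, the usual test converts r into a t statistic with n - 2 degrees of freedom. A minimal Python sketch under that assumption; the function name correlation_t_statistic and the values r = 0.6, n = 12 are illustrative. Turning t into a p-value needs the t distribution, which the standard library doesn't provide, so that step is only noted in a comment.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for testing H0: true correlation = 0 (valid for -1 < r < 1).
    Under H0 and the usual assumptions it follows a t distribution with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

n = 12
t = correlation_t_statistic(0.6, n)
df = n - 2
print(t, df)
# Two-sided p-value = 2 * P(T > |t|) with T ~ t(df);
# with SciPy installed: 2 * scipy.stats.t.sf(abs(t), df)
```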

12
Q

What are 3 types of general linear models?

A

1) ANOVA
2) ANCOVA
3) Linear regression

13
Q

What is the form of a general linear model?

A

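Y = A0 + A1x1 + A2x2 + (B1 or B2 or -B1-B2) + E

Where: A0 is a constant

A1 is the gradient of predictor variable 1 (x1)

A2 is the gradient of predictor variable 2 (x2)

(B1 or B2 or -B1-B2) is the effect of a categorical predictor variable with three levels: level 1 adds B1, level 2 adds B2 and level 3 adds -B1-B2, so the three effects sum to zero

E is the error term, which is assumed to be normally distributed with a mean of 0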
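A small worked version of the formula in Python, assuming a single categorical predictor with three levels coded as above. The function name glm_predict and all coefficient values are hypothetical. It returns the expected value of Y, so the error term E is left out.

```python
def glm_predict(a0, a1, a2, b1, b2, x1, x2, level):
    """Expected Y for Y = A0 + A1*x1 + A2*x2 + (B1 or B2 or -B1-B2) + E.
    `level` (1, 2 or 3) picks the categorical effect; E is omitted."""
    category_effect = {1: b1, 2: b2, 3: -b1 - b2}[level]
    return a0 + a1 * x1 + a2 * x2 + category_effect

# Hypothetical coefficients: A0=10, A1=2, A2=-1, B1=3, B2=-1, at x1=4, x2=5
predictions = [glm_predict(10, 2, -1, 3, -1, 4, 5, level) for level in (1, 2, 3)]
print(predictions)  # [16, 12, 11]
```

Because the three effects sum to zero, the average of the three predictions equals the value of the continuous part alone (here 10 + 2*4 - 1*5 = 13).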

14
Q

What determines significance in general linear models?

A

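F ratios: the variance explained by a term (its mean square) divided by the residual variance (the residual mean square), compared against the F distribution to give a p-value.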
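A minimal Python sketch for the simplest case, a linear regression with one continuous predictor (1 model degree of freedom, n - 2 residual degrees of freedom). The function name regression_f_ratio and the data are illustrative.

```python
from statistics import mean

def regression_f_ratio(xs, ys):
    """F ratio for a simple linear regression: model mean square / residual mean square."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    ss_model = sum((f - y_bar) ** 2 for f in fitted)          # variation explained by the line
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # variation left unexplained
    df_model, df_resid = 1, n - 2
    return (ss_model / df_model) / (ss_resid / df_resid)

f_ratio = regression_f_ratio([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f_ratio)  # about 5.33
```

A larger F ratio means the term explains more variation relative to the noise; the p-value comes from the F distribution with (1, n - 2) degrees of freedom.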

15
Q

What is R^2?

A

This is the proportion of the variation in the response variable that the model explains.

R^2 = 1 - (residual sum of squares / total sum of squares)
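The formula translates directly into Python. A minimal sketch; the function name r_squared is illustrative, and the fitted values are those of the least squares line y = 0.6 + 0.8x for this small made-up dataset.

```python
from statistics import mean

def r_squared(ys, fitted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(ys)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_resid / ss_total

ys = [2, 1, 4, 3, 5]
fitted = [1.4, 2.2, 3.0, 3.8, 4.6]  # y = 0.6 + 0.8x at x = 1..5
print(r_squared(ys, fitted))        # about 0.64: 64% of the variation explained
```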

16
Q

What is the residual sum of squares / total sum of squares?

A

The proportion of variation that hasn’t been explained by the model (equal to 1 - R^2)

17
Q

How can you use the Pearson’s coefficient to find R^2?

A

For a simple linear regression (one continuous predictor), R^2 = r^2, the square of Pearson’s correlation coefficient.
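A quick numerical check in Python, using made-up data: fit a least squares line, compute R^2 from the sums of squares, and compare it with the square of Pearson's r.

```python
from math import sqrt
from statistics import mean

xs = [1, 2, 3, 4, 5, 6]
ys = [1.5, 3.1, 2.9, 4.8, 5.2, 6.9]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares

# Pearson's correlation coefficient
r = sxy / sqrt(sxx * syy)

# R^2 of the least squares line
slope = sxy / sxx
intercept = y_bar - slope * x_bar
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_resid / syy

print(r ** 2, r_squared)  # the two values agree, up to rounding
```

With more than one predictor this identity no longer holds for any single r; R^2 then equals the squared correlation between the observed and fitted values.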

18
Q

What is Simpson’s paradox?

A

This is when a trend that appears within groups of data disappears or reverses when the groups are combined, so you come to the wrong conclusion because potential lurking variables haven’t been taken into account.
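A tiny made-up illustration in Python: within each group (defined by a hypothetical lurking variable) the least squares slope is +1, but pooling the groups and ignoring that variable gives a negative slope. The function name slope and the data are illustrative.

```python
from statistics import mean

def slope(xs, ys):
    """Least squares gradient of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))

# Hypothetical data: two groups defined by a lurking variable
group_a = ([1, 2, 3], [6, 7, 8])
group_b = ([6, 7, 8], [1, 2, 3])

print(slope(*group_a))  # 1.0 within group A
print(slope(*group_b))  # 1.0 within group B

pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]
print(slope(pooled_x, pooled_y))  # negative once the groups are ignored
```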

19
Q

What is interpolation?

A

Predicting values of the response variable within the range of measured values of the predictor variable.

20
Q

What is extrapolation?

A

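Predicting values of the response variable outside the range of measured values of the predictor variable. This is riskier, because the relationship may not continue beyond the measured range.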
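A minimal Python sketch of the distinction, assuming a single predictor and treating the smallest to largest observed x value as the range of measured values. The function name prediction_kind and the data are illustrative.

```python
def prediction_kind(x_new, observed_xs):
    """Label a prediction at x_new as interpolation or extrapolation (single predictor)."""
    if min(observed_xs) <= x_new <= max(observed_xs):
        return "interpolation"
    return "extrapolation"

observed_xs = [2.0, 3.5, 5.0, 7.5, 10.0]
for x_new in (4.0, 10.0, 12.0, 0.5):
    print(x_new, prediction_kind(x_new, observed_xs))
```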

21
Q

What can be used if a relationship isn’t linear?

A

1) Linear regression using polynomial explanatory variables

2) Non-linear regression
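Option 1 still counts as linear regression because the model is linear in its coefficients (y = A0 + A1x + A2x^2 + ...), even though the curve is not a straight line. A minimal pure-Python sketch that solves the least squares normal equations; the function name fit_polynomial and the data are illustrative, and in practice a library routine (for example NumPy's polynomial fitting) would be the more numerically robust choice.

```python
def fit_polynomial(xs, ys, degree):
    """Least squares fit of y = A0 + A1*x + ... + Ad*x^d.
    Returns [A0, A1, ..., Ad] by solving the normal equations (X^T X) a = X^T y."""
    k = degree + 1
    # Design matrix: one column per power of x (x^0 gives the intercept)
    X = [[x ** p for p in range(k)] for x in xs]
    # Augmented normal equations [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # an exactly quadratic relationship
coefs = fit_polynomial(xs, ys, 2)
print(coefs)  # approximately [1, 2, 3]
```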

22
Q

What is an example of non-linear regression?

A

Random forest regression

23
Q

What is random forest regression?

A

This is a forest of decision trees. Each tree is built on the training data you provide to the algorithm.

Randomness comes from building lots of trees, each based only on a subset of the data that is randomly sampled each time.

Each decision tree makes a prediction, and the average prediction of the forest of decision trees is used as the fitted regression value.
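A minimal pure-Python sketch of the idea, assuming one continuous predictor: each tree is a small regression tree grown on a bootstrap sample (random sampling with replacement), and the forest's prediction is the average over trees. All names and data are illustrative. Library implementations such as scikit-learn's RandomForestRegressor also randomise which predictors each split may consider; with a single predictor that step is omitted here, so this is strictly bagged regression trees.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors when every value is predicted by the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(xs, ys, depth, min_leaf=2):
    """Grow a regression tree on one predictor.
    A leaf is a number (mean response); an internal node is (threshold, left, right)."""
    if depth == 0 or len(set(xs)) < 2:
        return mean(ys)
    best = None
    uniq = sorted(set(xs))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = sse(left) + sse(right)  # choose the split with the smallest error
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return mean(ys)
    t = best[1]
    left_pts = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right_pts = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build_tree([x for x, _ in left_pts], [y for _, y in left_pts], depth - 1, min_leaf),
            build_tree([x for x, _ in right_pts], [y for _, y in right_pts], depth - 1, min_leaf))

def predict_tree(node, x):
    # Walk down the tree until a leaf (a plain number) is reached
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x <= t else right
    return node

def fit_forest(xs, ys, n_trees=50, depth=3, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: n points drawn at random with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([xs[i] for i in idx], [ys[i] for i in idx], depth))
    return trees

def predict_forest(trees, x):
    # The forest's prediction is the average of the individual trees' predictions
    return mean(predict_tree(tree, x) for tree in trees)

xs = list(range(20))
ys = [x ** 2 for x in xs]
trees = fit_forest(xs, ys, n_trees=30, depth=3, seed=1)
print(predict_forest(trees, 2), predict_forest(trees, 18))
```

Because every leaf stores a mean of training responses, the forest's predictions always stay within the range of the training y values.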

24
Q

What are the advantages/disadvantages of random forest regression?

A

Advantages: It is based entirely on the data it has, so we don't impose any of our own ideas about the nature of the relationship.

Disadvantages: 1) It can be slow, especially with many trees
2) It can sometimes overfit the data