Lecture 6 - Statistical Tests II: Linear Regression Flashcards
(34 cards)
in what instance would we select a linear regression?
interested in association → interested in trend [where x is continuous] → “experiment” → y-continuous → y-normal → linear regression
linear regression is a statistical model which shows:
the relationship between 2 continuous variables
what questions should we be asking when we are choosing a statistical test?
(1) what type of response variable? [continuous, discrete/count, proportion, binary]
(2) what type of explanatory variable? [continuous, discrete/count, proportion, binary, categorical]
(3) interested in differences or trends/relationships?
(4) paired or independent samples?
(5) normal or non-normal distribution?
what type of variables are present when we select a chi-squared statistical test?
when we are dealing with two categorical variables (y-counts, x categorical)
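a minimal R sketch of how such a test might be run, using hypothetical factor vectors colour and site:
chisq.test(table(colour, site)) # table() builds the contingency table of counts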
what variables must be present in order for us to carry out a linear regression statistical analysis?
for a linear regression we must have both a continuous X and a continuous Y variable
gradient =
change in y / change in x
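for example, a line through the points (1, 2) and (3, 6) has gradient (6 - 2) / (3 - 1) = 4 / 2 = 2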
what are the three stages when trying to calculate a linear regression?
(1) choose your model: linear / non-linear
(2) estimate the parameters of the model
(3) model fit: how well does the model describe our data
what is the ȳ (Y-bar) line found horizontally across the span of a graph?
the ȳ line, indicated by a dotted line labelled with a Y with a bar on top of it, shows the mean value of Y in your data
how do you calculate the total sum of squares?
total sum of squares (SST) is the sum of all the squared distances between your data points and the ȳ (mean) line: SST = Σ(yᵢ − ȳ)²
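a minimal R sketch of this calculation, using a hypothetical response vector y:
y <- c(3, 5, 4, 7, 6) # hypothetical response values
sst <- sum((y - mean(y))^2) # squared distances from the mean line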
what is the equation of the fitted line and what are its terms?
ŷ = a + bx
where:
a = intercept
b = slope
what is the error sum of squares (residuals)?
error sum of squares (the residual sum of squares) = the sum of all the squared distances between each individual data point and the line of best fit (ŷ = a + bx): SSE = Σ(yᵢ − ŷᵢ)²
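a minimal R sketch of this calculation, using hypothetical vectors x and y and a model named m:
x <- c(1, 2, 3, 4, 5) # hypothetical explanatory values
y <- c(3, 5, 4, 7, 6) # hypothetical response values
m <- lm(y ~ x) # fit the line of best fit
sse <- sum(residuals(m)^2) # squared distances from the fitted line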
what must all lines of best fit pass through and what allows us to choose what line of best fit is the most appropriate one?
all lines of best fit need to go through the point (x̄, ȳ), where the mean lines of X and Y cross
we select the best line as the one where the unexplained variation in our response is the smallest - when our residuals (the error sum of squares) are the smallest
with regression, if the slope is positive or negative, what does this show about the relationship between the two variables?
if the slope is positive: the relationship between the variables is positive
if the slope is negative: the relationship between the variables is negative
what happens to the total sum of squares, SST, if we add additional data points?
the value gets larger
how to calculate mean sum of squares:
mean variability = mean sum of squares (MS) = a sum of squares divided by its degrees of freedom
mean sum of squares = sum of squared deviations from the mean / degrees of freedom
how do you construct and fill out an ANOVA table?
source     | SS  | d.o.f. | MS
regression | SSR | 1      | SSR/1 = SSR
error      | SSE | n-2    | s² = SSE/(n-2)
total      | SST | n-1    |
[for regression you need an additional column called 'F' for the test statistic, where F = SSR / s²]
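a minimal R sketch (continuing the hypothetical x, y and m from above) showing how the table's quantities relate:
sst <- sum((y - mean(y))^2) # total SS, n-1 d.o.f.
sse <- sum(residuals(m)^2) # error SS, n-2 d.o.f.
ssr <- sst - sse # regression SS, 1 d.o.f.
f <- ssr / (sse / (length(y) - 2)) # F = SSR / s²
summary.aov(m) # R's ANOVA table should match these values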
degrees of freedom regarding SST & SSE:
- SST requires estimation of 1 parameter (mean of Y) => n-1 degrees of freedom
- SSE requires estimation of 2 parameters (mean of Y, slope) => n-2 degrees of freedom
SSR + SSE =
SST
F-distribution percentile (5%) command in R:
qf(0.95,1,n-2)
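for example, with n = 20 data points:
qf(0.95, 1, 18) # returns approximately 4.41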
what if F is larger than the critical value?
this means that we reject the null hypothesis and accept the alternative hypothesis - we infer that the probability that the relationship is due to chance alone is <0.05
we are only allowed to add a trend line if:
we are only allowed to finally add our trend line if we reject the null hypothesis that the slope = 0; if the slope is not significantly different from 0, we must not add a trend line
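a minimal R sketch (continuing the hypothetical x, y and m) of adding the trend line once the slope is significant:
plot(x, y) # scatterplot of the raw data
abline(m) # draw the fitted line ŷ = a + bx over the points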
we are only allowed to carry out a linear regression if certain assumptions are fulfilled:
- residuals must be normally distributed
- the variance associated with the distribution of the residuals is constant (i.e. variation in y does not increase with increasing x)
- individual measurements are independent
- data comes from a random sample
how can we test whether our assumptions are met or violated when deciding whether we can fit a linear regression?
we can see if our assumptions are violated by using diagnostic plots in R
residuals vs fitted: in the first plot we ask whether the variance is constant [we want it to look scattered, like the sky at night]
normal Q-Q: in the second plot we check whether the residuals are normally distributed, and we want them to fall (just about) along the straight reference line
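a minimal R sketch (continuing the hypothetical model m) that calls just these two diagnostic plots:
plot(m, which = 1) # residuals vs fitted: check for constant variance
plot(m, which = 2) # normal Q-Q: check residuals for normality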
full and complete command sequence needed to run a linear regression in R:
(1) data<-read.csv("excel_sheet1.csv", header = T, stringsAsFactors = T)
(2) attach(data)
(3) names(data)
(4) m1<-lm(y_variable~x_variable)
# "m1" is simply your model name; replace y_variable and x_variable with your own column names
(5) summary.lm(m1)
(6) summary.aov(m1)
(7) plot(m1)
once you see the sky at night in the residuals vs fitted plot and a straight line in the normal Q-Q plot, you can assume the assumptions are met and therefore plot your linear regression
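a note on reading the output, assuming the model is named m1 as above: summary.lm(m1) prints a coefficients table whose Pr(>|t|) entry on the x-variable row is the p-value for testing slope = 0, and summary.aov(m1) prints the ANOVA table with the matching F statistic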