Statistics Flashcards by Tom FORSTER

State three advantages of random sampling

Avoids suspected sources of bias
Only a random sample enables proper statistical inference about the population to be undertaken
because the probability basis on which the sample has been selected is known

If the variance of X is v, what is the variance of the total when X is observed twice independently and the two results are added together?

2v
(Var(X1 + X2) = 2Var(X) for independent X1 and X2. Not to be confused with Var(2X) = 4Var(X))
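
The difference between adding two independent observations and doubling one observation can be checked exactly on a small example. The sketch below assumes X is the score on a fair six-sided die (an illustrative choice, not part of the card), and `variance` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# Assumed example distribution: the score on a fair six-sided die.
outcomes = range(1, 7)
p = Fraction(1, 6)

def variance(pairs):
    """Exact variance of a discrete distribution given (value, probability) pairs."""
    mean = sum(prob * value for value, prob in pairs)
    return sum(prob * (value - mean) ** 2 for value, prob in pairs)

var_x = variance([(x, p) for x in outcomes])

# Two independent rolls added together: each ordered pair has probability 1/36.
var_sum = variance([(a + b, p * p) for a, b in product(outcomes, repeat=2)])

# A single roll doubled.
var_double = variance([(2 * x, p) for x in outcomes])

print(var_x, var_sum, var_double)  # 35/12 35/6 35/3
```

Here Var(X1 + X2) is twice Var(X), while Var(2X) is four times Var(X).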

Conditions for a Poisson distribution to be appropriate

Events occur randomly at a uniform average rate, and independently of each other

What is an independent variable?

A variable that is not subject to random variation

Why is random sampling needed for proper statistical inference?

Because then the probability basis on which the sample has been selected is known

State the distribution of the score from a fair six-sided die

Uniformly distributed over the values { 1, 2, … , 6 }
Include the brackets: it's a set!

How to comment on the goodness of fit of a regression line

Comment on r^2 (square r if needed)
Comment on how close the points lie to the straight line
…so the fit is not very good / fairly good / very good indeed!

Conditions for a reliable estimate to be made from a regression line

Interpolation (the estimate lies within the range of the data)
Strong linear correlation (seen by points lying close to the regression line)

Advantages of larger sample sizes for tests using correlation coefficients

As the sample size increases, random variation in the sample tends to decrease
So the (PMCC or Spearman’s rank) coefficient tends to get closer to the population correlation coefficient
So one can be more confident that the correlation is genuine, rather than simply the result of random variation

What is the difference between association and correlation?

Association refers to any relationship between two variables
Correlation refers to a linear relationship between two variables

When is it appropriate to use the PMCC?

Data is random-on-random
Parent population follows a bivariate normal distribution (seen by grouping of points on scatter graph having a roughly elliptical shape)

Sum of residuals ε1 + ε2 + … =

0 (for a least squares regression line)
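
For a least squares regression line, the residuals add up to zero. The sketch below checks this numerically. The data values are invented purely for illustration, and the line is fitted with the standard formulas b = Sxy / Sxx and a = ȳ - b x̄.

```python
# Hypothetical bivariate data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares y-on-x line y = a + b x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed y minus the value predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

# Zero up to floating-point rounding error.
print(abs(residual_sum) < 1e-9)
```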

r^2

The coefficient of determination
The proportion of the variation in one variable that can be explained by the variation of another
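
One way to see the "proportion explained" reading is to compare r^2 with 1 minus the ratio of residual variation to total variation in y about a least squares line. The sketch below does this on invented data. All values and variable names are assumptions for illustration.

```python
from math import sqrt

# Hypothetical bivariate data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Product moment correlation coefficient and coefficient of determination.
r = s_xy / sqrt(s_xx * s_yy)
r_squared = r ** 2

# Least squares y-on-x line, then the share of variation in y it accounts for.
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_proportion = 1 - ss_residual / s_yy

print(round(r_squared, 4), round(explained_proportion, 4))
```

The two printed values agree, which is the sense in which r^2 measures the proportion of variation explained.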

Why might effect sizes be used instead of conducting a hypothesis test?

For a large set of random-on-random bivariate data, even a small non-zero value of the PMCC is likely to lead to rejection of the null hypothesis of no correlation in the population, so the test is uninformative
So the size of the correlation is considered, rather than whether the population correlation is non-zero

What does a significance level of 5% mean?

If the null hypothesis is in fact true, there is a 5% probability that we (wrongly) reject it

Situations when you might use Spearman’s rank instead of PMCC

If either variable is non-random
If the relationship is non-linear
Subjective data
Grouping of points on scatter diagram not roughly elliptical
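
To illustrate the non-linear case, the sketch below compares the PMCC with Spearman's rank coefficient on data that are perfectly monotonic but not linear. It uses the fact that Spearman's coefficient is the PMCC of the ranks. The data (y = x^3) and the helper names `pmcc` and `ranks` are assumptions for illustration, and `ranks` assumes there are no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def ranks(values):
    """Rank each value from 1 (smallest); assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical data: monotonic increasing, but clearly non-linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r = pmcc(xs, ys)                    # below 1, since the points are not on a line
r_s = pmcc(ranks(xs), ranks(ys))    # Spearman's rank: 1 for a perfect monotonic trend

print(round(r, 3), round(r_s, 3))
```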

Null hypothesis for Spearman’s rank hypothesis test

H0: There is no association between x and y (in context) in the population

4 conditions for a situation to be modelled by a binomial distribution

Events are independent of each other
Events occur randomly at a constant probability
Only success/failure possible
Fixed number of trials
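
When these conditions hold, probabilities follow P(X = r) = C(n, r) p^r (1 - p)^(n - r). The sketch below evaluates this exactly. The values n = 10 and p = 1/6 (for instance, counting sixes in ten rolls of a fair die) are assumptions chosen for the example, and `binomial_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n = 10               # fixed number of trials (assumed for illustration)
p = Fraction(1, 6)   # constant probability of success on each trial (assumed)

probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]
total = sum(probabilities)   # exact arithmetic, because Fractions are used

print(total)                          # the probabilities sum to 1
print(float(binomial_pmf(2, n, p)))   # P(exactly two successes) as a decimal
```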

Indicator of whether a Poisson distribution may be able to model a data set is if…

sample variance is reasonably close to sample mean
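
A quick way to apply this check is to compute both statistics from the data. The sketch below uses Python's standard `statistics` module on invented counts of events per interval. The data are hypothetical, and how close counts as "reasonably close" remains a judgement call.

```python
from statistics import mean, variance

# Hypothetical counts of events per interval, invented for illustration.
counts = [2, 3, 1, 4, 2, 0, 3, 2, 6, 1, 2, 2]

sample_mean = mean(counts)
sample_variance = variance(counts)   # sample variance (divisor n - 1)

print(round(sample_mean, 3), round(sample_variance, 3))
```

If the two values are of similar size, a Poisson model is at least plausible. A large gap suggests another model may be more suitable.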

If the test statistic falls in the left-hand (lower-tail) critical region (e.g. chi-squared), the fit is suspiciously good…

Perhaps the model was constructed to fit the data
Or some data has been removed in order to produce a better fit
Or some of the data is not genuine

Four desirable features of a sample

Random
Unbiased
Representative of whole population
Items are chosen independently

Why would it not be sensible to predict the distance for 5-year-olds?

This would be extrapolation
As the lowest age in the data is 50 years old (put into context)
And the relationship may be different for 5-year-olds

Why is it sometimes not useful to find the x-on-y regression equation?

If the values of x are non-random, then it makes no sense to try and predict them

Assumption for a chi-squared test

Sample must be random

Cohen's guideline for effect sizes

Ignored: 0.0-0.1
Small: 0.1-0.3
Medium: 0.3-0.5
Large: 0.5-1.0
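
The guideline can be written as a small lookup on the size of the correlation coefficient. In the sketch below, `effect_size_label` is a hypothetical helper, and assigning each boundary value (0.1, 0.3, 0.5) to the higher band is an assumption, since the guideline as quoted does not say which band a boundary belongs to.

```python
def effect_size_label(r):
    """Classify a correlation coefficient r using Cohen's guideline bands."""
    size = abs(r)   # the sign of r does not affect the effect size
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.1:
        return "ignored"
    if size < 0.3:      # assumption: boundary values fall in the higher band
        return "small"
    if size < 0.5:
        return "medium"
    return "large"

print(effect_size_label(0.165))   # small
print(effect_size_label(-0.62))   # large
```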

Comment on outcome of hypothesis test considering the effect size of 0.165

* The **test** shows that there is almost certainly some real correlation in the **population**
* However, the test is uninformative since the effect size is so small

Suggest a reason for not using an outlier in any analysis

Because it is not representative

Explain why a census would not be used

Because it would be very expensive / impracticable to carry out

Explain why they have decided to carry out a test based on PMCC

* Grouping of points in the scatter diagram is roughly elliptical
* So there is evidence to suggest **bivariate Normality** in the **population**, which is required for a test using the PMCC to be valid

Disadvantage of using 10% significance level over 5% significance level

Null hypothesis is more likely to be wrongly rejected

When is Spearman's rank test not appropriate?

If the scatter diagram shows no evidence of a monotonic relationship

Disadvantage of Spearman's rank

Ranking data loses information, which might affect the outcome of a test

3 conditions for a situation to be modelled by a geometric distribution

* **Events are independent of each other**
* **Events occur randomly at a constant probability**
* Only success/failure possible
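
Under these conditions, the number of trials up to and including the first success has P(X = r) = (1 - p)^(r - 1) p for r = 1, 2, 3, … The sketch below evaluates this exactly. The value p = 1/6 (waiting for the first six on a fair die) is an assumed example, and `geometric_pmf` is a hypothetical helper name.

```python
from fractions import Fraction

def geometric_pmf(r, p):
    """P(X = r): the first success occurs on trial r (r = 1, 2, 3, ...)."""
    return (1 - p) ** (r - 1) * p

p = Fraction(1, 6)   # constant probability of success on each trial (assumed)

first_six_on_third_roll = geometric_pmf(3, p)
within_three = sum(geometric_pmf(r, p) for r in (1, 2, 3))

print(first_six_on_third_roll)             # 25/216
print(within_three == 1 - (1 - p) ** 3)    # True: P(X <= 3) = 1 - (5/6)^3
```

Unlike the binomial case, there is no fixed number of trials: the trials simply continue until the first success.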