Statistics Flashcards

1
Q

State three advantages of random sampling

A
  • Avoids suspected sources of bias
  • Only a random sample enables proper statistical inference about the population
  • The probability basis on which the sample has been selected is known
2
Q

If the variance of X is v, what is the variance when X is observed twice (independently) and the two results are added together?

A

2v
(Var(X1 + X2) = 2Var(X) for independent X1 and X2. Not to be confused with Var(2X) = 4Var(X).)
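A minimal Python sketch to check the rule exactly, assuming a fair die as the example distribution for X. Enumerating all 36 equally likely pairs of rolls models two independent observations, and fractions keep the arithmetic exact:

```python
from fractions import Fraction
from itertools import product

# Example distribution for X (an assumption): the score on a fair six-sided die.
FACES = range(1, 7)

def variance(values):
    """Exact population variance of a list of equally likely values."""
    values = list(values)
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n

var_x = variance(FACES)                                          # Var(X)
var_sum = variance(a + b for a, b in product(FACES, repeat=2))   # Var(X1 + X2), independent rolls
var_double = variance(2 * x for x in FACES)                      # Var(2X), one roll doubled

print(var_x, var_sum, var_double)  # 35/12 35/6 35/3
```

Adding two independent copies doubles the variance, while doubling a single copy multiplies it by 4.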

3
Q

Conditions for a Poisson distribution to be appropriate

A

Events occur randomly at a uniform average rate, and independently of each other
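When those conditions hold, the number of events X in a fixed interval can be modelled as X ~ Po(λ), with P(X = k) = e^(−λ) λ^k / k!. A small Python sketch of that formula; the rate λ = 2 is an arbitrary example value:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam): e^(-lam) * lam^k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Example with an assumed average rate of 2 events per interval.
probs = [poisson_pmf(k, 2.0) for k in range(5)]
print([round(p, 4) for p in probs])  # [0.1353, 0.2707, 0.2707, 0.1804, 0.0902]
```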

4
Q

What is an independent variable?

A

A variable that is not subject to random variation

5
Q

Why is random sampling needed for proper statistical inference?

A

Because then the probability basis on which the sample has been selected is known

6
Q

State the distribution of the score from a fair six-sided die

A

Uniformly distributed over the values {1, 2, …, 6}
Include the curly brackets: it's a set!
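For a discrete uniform distribution on {1, …, n}, E(X) = (n + 1)/2 and Var(X) = (n² − 1)/12. A short Python check of those values for the die (n = 6), using exact fractions:

```python
from fractions import Fraction

# Fair six-sided die: each score in {1, 2, ..., 6} has probability 1/6.
scores = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * x for x in scores)                      # E(X)
variance = sum(p * x * x for x in scores) - mean ** 2  # E(X^2) - [E(X)]^2

print(mean, variance)  # 7/2 35/12
```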

7
Q

How to comment on the goodness of fit of a regression line

A
  • Comment on the value of r^2 (square r if needed)
  • Comment on how close the points lie to the straight line
    * …so the fit is not very good / fairly good / very good indeed!
8
Q

Conditions for a reliable estimate to be made from a regression line

A
  • The estimate is an interpolation (within the range of the data)
  • There is strong linear correlation (seen by the points lying close to the regression line)
9
Q

Advantages of larger sample sizes for tests using correlation coefficients

A
  1. As sample size increases, random variation in the sample tends to decrease
  2. So the sample (PMCC or Spearman's rank) correlation coefficient tends to get closer to the population correlation coefficient
  3. So one can be more confident that the correlation is genuine, rather than simply the result of random variation
10
Q

What is the difference between association and correlation?

A
  • Association refers to any relationship between two variables
  • Correlation refers to a linear relationship between two variables
11
Q

When is it appropriate to use the PMCC?

A
  • Both variables are random (random-on-random data)
  • The parent population follows a bivariate Normal distribution (seen by the grouping of points on the scatter diagram having a roughly elliptical shape)
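A minimal Python sketch of the PMCC, r = Sxy / √(Sxx Syy), and of r² for judging goodness of fit. The height and weight values are made up purely for illustration:

```python
import math

def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical random-on-random data, for illustration only.
heights = [150, 160, 165, 170, 180]
weights = [50, 58, 61, 66, 75]

r = pmcc(heights, weights)
print(round(r, 3), round(r ** 2, 3))  # r and the coefficient of determination r^2
```

The function only computes r; it does not check the bivariate Normal condition, which still has to be judged from the scatter diagram.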
12
Q

Sum of residuals ε1 + ε2 + … =

A

0 (for the least squares regression line)
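The residuals sum to zero because the least squares line y = a + bx passes through (x̄, ȳ): Σ(yᵢ − a − bxᵢ) = nȳ − na − nbx̄ = n(ȳ − a − bx̄) = 0. A small Python sketch confirming this on made-up data:

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares regression line of y on x: y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # forces the line through (mean_x, mean_y)
    return a, b

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(sum(residuals))  # zero, up to floating-point rounding
```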

13
Q

What does r^2 represent?

A
  • The coefficient of determination
  • The proportion of the variation in one variable that can be explained by the variation in the other
14
Q

Why might effect sizes be used instead of conducting a hypothesis test?

A
  • For a large set of random-on-random bivariate data, even a small non-zero value of the PMCC is likely to lead to rejection of the null hypothesis of no correlation in the population, so the test is uninformative
  • So the size of the correlation is considered, rather than whether the population correlation is non-zero
15
Q

What does a significance level of 5% mean?

A

5% of the time we reject the null hypothesis when it is in fact true

16
Q

Situations when you might use Spearman's rank instead of the PMCC

A
  • If either variable is non-random
  • If the relationship is non-linear
  • If the data are subjective
  • If the grouping of points on the scatter diagram is not roughly elliptical
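A minimal Python sketch of Spearman's rank correlation coefficient using the shortcut formula r_s = 1 − 6Σd² / (n(n² − 1)). It assumes there are no tied values, since the shortcut formula is only exact without ties; the judges' scores are hypothetical subjective data:

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient: r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)).

    This shortcut formula is only valid when there are no ties.
    """
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical scores from two judges, for illustration only.
judge_a = [7.5, 6.0, 9.1, 4.2, 8.0]
judge_b = [7.0, 5.5, 8.8, 6.1, 9.0]
print(spearman_rs(judge_a, judge_b))  # 0.8
```

Because only ranks are used, a relationship that is monotonic but non-linear (for example y = x³) still gives r_s = 1.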
16
Q

Null hypothesis for Spearman’s rank hypothesis test

A

H0: There is no association between x and y (in context) in the population

17
Q

Four conditions for a situation to be modelled by a binomial distribution

A
  • Trials are independent of each other
  • Each trial has the same (constant) probability of success
  • Only two outcomes (success or failure) are possible on each trial
  • There is a fixed number of trials
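Under those four conditions, the number of successes X ~ B(n, p) has P(X = k) = ⁿCₖ pᵏ(1 − p)ⁿ⁻ᵏ. A small Python sketch of that formula (math.comb needs Python 3.8 or later); counting sixes in 10 rolls of a fair die is just an example:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): nCk * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: number of sixes in 10 rolls of a fair die (n = 10, p = 1/6).
print(round(binomial_pmf(2, 10, 1 / 6), 4))  # 0.2907
```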
18
Q

A Poisson distribution may be a suitable model for a data set if…

A

the sample variance is reasonably close to the sample mean
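The check works because a Poisson distribution Po(λ) has mean and variance both equal to λ. A short Python sketch comparing the two sample statistics; the counts are hypothetical, and deciding what counts as "reasonably close" is still a judgement:

```python
from statistics import mean, variance

# Hypothetical counts (e.g. number of calls per minute), for illustration only.
counts = [0, 1, 2, 2, 2, 3, 4, 1, 4, 1]

m = mean(counts)
v = variance(counts)  # sample variance, divisor n - 1
print(f"mean = {m}, variance = {v:.2f}")
```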

19
Q

If the test statistic is suspiciously small, falling in the left-hand tail of the distribution (e.g. chi-squared)…

A
  • Perhaps the model was constructed to fit the data
  • Or some data has been removed in order to produce a better fit
  • Or some of the data is not genuine
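A Python sketch of the chi-squared statistic X² = Σ(O − E)²/E, using made-up die-rolling results that fit "too well". With 6 − 1 = 5 degrees of freedom, the lower 5% point of χ²₅ from tables is about 1.145, so a value like 0.2 would fall in the left-hand tail:

```python
def chi_squared_statistic(observed, expected):
    """X^2 = sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical results of 60 rolls of a die (so E = 10 per face), for illustration only.
expected = [10] * 6
observed = [10, 10, 11, 9, 10, 10]
print(chi_squared_statistic(observed, expected))  # 0.2: a suspiciously small value
```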
20
Q

Four desirable features of a sample

A
  • Random
  • Unbiased
  • Representative of the whole population
  • Items are chosen independently
21
Q

Why would it not be sensible to predict the distance for 5-year-olds?

A
  • This would be extrapolation
  • As the lowest age in the data is 50 years (put this into context)
  • And the relationship may be different for 5-year-olds
22
Q

Why is it sometimes not useful to find the equation of the x on y regression line?

A

If the values of x are non-random, then it makes no sense to try to predict them

23
Q

Assumption for a chi-squared test

A

Sample must be random

24
Q

Cohen's guidelines for effect sizes (size of the correlation coefficient, ignoring sign)

A
  • 0.0 to 0.1: ignored
  • 0.1 to 0.3: small
  • 0.3 to 0.5: medium
  • 0.5 to 1.0: large
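A small Python sketch that applies these bands to a correlation coefficient. The bands share their end points, so assigning a boundary value such as 0.3 to the larger category is an assumption made here, not part of the guideline:

```python
def cohen_label(r: float) -> str:
    """Classify the size of a correlation coefficient using Cohen's guidelines.

    Only the magnitude of r matters, so the sign is ignored. Boundary values
    (0.1, 0.3, 0.5) go to the larger category: an assumption, since the
    bands share their end points.
    """
    size = abs(r)
    if size < 0.1:
        return "ignored"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"

print(cohen_label(0.165))  # small
```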
25
Q

Comment on the outcome of a hypothesis test, given an effect size of 0.165

A
  • The test shows that there is almost certainly some real correlation in the population
  • However, the test is uninformative since the effect size is so small
26
Q

Suggest a reason for not using an outlier in any analysis

A

Because it is not representative

27
Q

Explain why a census would not be used

A

Because it would be very expensive or impractical to carry out

28
Q

Explain why they have decided to carry out a test based on the PMCC

A
  • The grouping of points in the scatter diagram is roughly elliptical
  • So there is evidence to suggest bivariate Normality in the population, which is required for a test based on the PMCC to be valid
29
Q

Disadvantage of using a 10% significance level rather than a 5% significance level

A

The null hypothesis is more likely to be wrongly rejected

30
Q

When is Spearman's rank test not appropriate?

A

If the scatter diagram shows no evidence of a monotonic relationship

31
Q

Disadvantage of Spearman's rank

A

Ranking the data loses information, which might affect the outcome of a test

33
3 conditions for a situation be modelled by a geometric distribution
* **Events are independent of each other** * **Events occur randomly at a constant probability** * Only success/failure possible
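Under those conditions, if X is the number of trials up to and including the first success (the convention assumed here), then X ~ Geo(p) with P(X = k) = (1 − p)^(k − 1) p. A minimal Python sketch, using a fair die as the example:

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) for X ~ Geo(p): first success on trial k, i.e. (1 - p)^(k - 1) * p."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1 - p) ** (k - 1) * p

# Example: probability that the first six appears on the 3rd roll of a fair die.
print(geometric_pmf(3, 1 / 6))  # (5/6)^2 * (1/6) = 25/216, about 0.1157
```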