distributions Flashcards
1
Q
the binomial distribution
A
- one of the simplest distributions
- an idealised representation of any process that generates sequences of binary data (e.g. coin flips)
- it’s an idealisation, but natural processes do give rise to binomial distributions
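As a rough illustration of the card above, here is a minimal sketch that assumes a fair coin (p = 0.5) and 10 flips per sequence. The function name `count_heads` and all the numbers are illustrative choices, not part of the original notes. Each call generates one binary sequence and counts the heads; repeating it many times gives an approximately binomial distribution of counts.

```python
import random
from collections import Counter

def count_heads(n_flips, p_heads, rng):
    """Flip a coin n_flips times and count the heads (one binomial draw)."""
    return sum(rng.random() < p_heads for _ in range(n_flips))

rng = random.Random(1)  # fixed seed so the run is repeatable
n_flips = 10            # assumed number of coins per sequence
heads_counts = [count_heads(n_flips, 0.5, rng) for _ in range(10_000)]
tally = Counter(heads_counts)

# Crude text histogram: one '#' per 50 sequences with that many heads.
for k in range(n_flips + 1):
    print(f"{k:2d} heads: {'#' * (tally[k] // 50)}")
```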
2
Q
normal distribution
A
- The binomial distribution has a shape that is similar to the normal distribution.
- But there are a few key differences:
- The binomial distribution is bounded at 0 and n (the number of coins); the normal distribution can range from -infinity to +infinity.
- The binomial distribution is discrete (0, 1, 2, 3, etc., but no 2.5); the normal distribution is continuous.
- The normal distribution is a mathematical abstraction, but we can use it as a model of real-life populations that are produced by certain kinds of natural processes.
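The comparison above can be sketched numerically. This example assumes 10 fair coin flips and a normal distribution with the same mean and standard deviation as that binomial; the helper `binomial_pmf` and the chosen values are illustrative. It shows that the two shapes are close at the whole numbers, while only the normal distribution has density below 0 and at non-integer values such as 2.5.

```python
import math
from statistics import NormalDist

n, p = 10, 0.5  # assumed: 10 fair coin flips

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal distribution matched to the binomial's mean and standard deviation.
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

for k in range(n + 1):
    print(f"k={k:2d}  binomial={binomial_pmf(k, n, p):.4f}  normal density={approx.pdf(k):.4f}")

# The normal distribution is unbounded and continuous, so these are non-zero.
print("density at -1: ", approx.pdf(-1))
print("density at 2.5:", approx.pdf(2.5))
```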
3
Q
processes that produce normal distributions
A
- To see how a natural process can give rise to a normal distribution, we can play a dice game.
- There’s only 1 rule: you roll the dice n times (the number of rounds), add up all the values, and move that many spaces. That is your score.
- We can play any number of rounds
- And we’ll play with friends, because you can’t get a distribution of scores if you play by yourself.
- If we have enough players who play enough rounds then the distribution of scores across all the players will take on a characteristic shape.
- A player’s score in the dice game is determined by adding up the values of each roll.
- So after each roll their score can increase by some amount.
- The dice game might look artificial, but it may not be that different from some natural processes.
- For example, developmental processes might look pretty similar to the dice game.
- Think about height:
- At each point in time some value (growth) can be added to a person’s current height.
- So if we looked at the distribution of heights in the population then we might find something that looks similar to a normal distribution.
- A key factor that results in the normal distribution shape is this adding up of values.
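The dice game can be simulated in a few lines. This is a sketch under assumed settings (one six-sided die, 20 rounds, 5,000 players); the function name `play_additive_game` and those numbers are not from the original notes. With additive scores, the histogram comes out roughly bell-shaped and the mean and median sit close together.

```python
import random
from collections import Counter
from statistics import median

def play_additive_game(n_rounds, rng):
    """Roll one die n_rounds times and add up the values."""
    return sum(rng.randint(1, 6) for _ in range(n_rounds))

rng = random.Random(2)             # fixed seed so the run is repeatable
n_players, n_rounds = 5_000, 20    # assumed game size
add_scores = [play_additive_game(n_rounds, rng) for _ in range(n_players)]

mean_score = sum(add_scores) / n_players
print(f"mean={mean_score:.2f}  median={median(add_scores)}")

# Crude text histogram in bins of width 5.
bins = Counter(score // 5 * 5 for score in add_scores)
for start in sorted(bins):
    print(f"{start:3d}-{start + 4:3d}: {'#' * (bins[start] // 25)}")
```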
4
Q
processes that don’t produce normal distributions
A
- Let’s change the rules of the game:
- Instead of adding up the value of each roll, we’ll multiply them (e.g., roll a 1, 2, and 4 and your score is 8).
- The distribution is skewed, with most players having low scores and a few players having very high scores.
- Can you think of a process that operates like this in the real world?
- How about interest or returns on investments?
- Maybe this explains the shape of real world wealth distributions.
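The changed rule can be sketched the same way. The settings here (10 rounds, 5,000 players) and the name `play_multiplicative_game` are assumptions for illustration. Because a few very large products pull the mean upwards, the mean ends up well above the median, which is one simple sign of skew.

```python
import math
import random
from statistics import median

def play_multiplicative_game(n_rounds, rng):
    """Roll one die n_rounds times and multiply the values together."""
    return math.prod(rng.randint(1, 6) for _ in range(n_rounds))

# The example from the card: rolling 1, 2 and 4 gives a score of 8.
print(math.prod([1, 2, 4]))

rng = random.Random(3)  # fixed seed so the run is repeatable
mult_scores = [play_multiplicative_game(10, rng) for _ in range(5_000)]

mean_score = sum(mult_scores) / len(mult_scores)
print(f"median={median(mult_scores)}  mean={mean_score:.0f}  max={max(mult_scores)}")
```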
5
Q
describing normal distributions
A
- the normal distribution has a characteristic bell shape but not all normal distributions are identical
- they can vary in terms of where they are centred and how spread out they are
- changing mean and standard deviation changes the absolute position of points on the plot, but not the relative positions measured in units of standard deviation
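A small sketch of the last point, using two normal distributions with hypothetical parameters (mean 0, SD 1 and mean 100, SD 15; neither pair comes from the notes). The raw value that sits z standard deviations from the mean differs between the two, but the proportion of the distribution below it is the same.

```python
from statistics import NormalDist

dist_a = NormalDist(mu=0, sigma=1)      # hypothetical parameters
dist_b = NormalDist(mu=100, sigma=15)   # hypothetical parameters

for z in (-2, -1, 0, 1, 2):
    # Absolute position of the point z standard deviations from each mean.
    x_a = dist_a.mean + z * dist_a.stdev
    x_b = dist_b.mean + z * dist_b.stdev
    print(
        f"z={z:+d}  A: x={x_a:6.1f} below={dist_a.cdf(x_a):.4f}"
        f"  B: x={x_b:6.1f} below={dist_b.cdf(x_b):.4f}"
    )
```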
6
Q
describing deviations from the normal distribution
A
- When we looked at the distribution of scores from the second dice game, we saw that it was skewed.
- Skew is a technical term to describe one way in which distributions can deviate from normal.
- A symmetrical distribution, such as the normal distribution, has a skewness of 0.
- Another way to deviate from the normal distribution is to have either fatter or skinnier tails.
- The tailedness of a distribution is given by its kurtosis.
- Kurtosis of a distribution is often specified with reference to the normal distribution. This is excess kurtosis.
- A distribution with an excess kurtosis of 0, such as the normal distribution, is called mesokurtic.
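Skewness and excess kurtosis can be estimated from data using central moments. The sketch below uses the simple population-moment formulas (no small-sample correction), and the helper names and sample sizes are illustrative assumptions. It compares draws from a normal distribution with scores from a multiplicative dice game of 5 rounds.

```python
import math
import random

def central_moment(values, order):
    """Average of (value - mean) ** order."""
    m = sum(values) / len(values)
    return sum((v - m) ** order for v in values) / len(values)

def skewness(values):
    """Third central moment scaled by the SD cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the variance squared, minus 3 (the normal value)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

rng = random.Random(4)  # fixed seed so the run is repeatable
normal_scores = [rng.gauss(0, 1) for _ in range(20_000)]
product_scores = [math.prod(rng.randint(1, 6) for _ in range(5)) for _ in range(20_000)]

for name, data in (("normal", normal_scores), ("multiplied dice", product_scores)):
    print(f"{name:16s} skewness={skewness(data):6.2f}  excess kurtosis={excess_kurtosis(data):6.2f}")
```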
7
Q
distributions and samples
A
- we’ve seen that whenever we look at the distribution of values produced by adding up numbers, we get something that looks like a normal distribution
- to calculate a sample mean, we just add up a bunch of numbers (and divide by how many there are)
- let’s say I take lots of samples from a population and, for each sample, I calculate the sample mean
- because each sample mean comes from adding up numbers, the distribution of these sample means should also look like a normal distribution
8
Q
the sampling distribution of the mean
A
- Population mean of 100
- And a standard deviation of 1.5
- From this population I can draw samples of 25 values.
- I’ll do this 100,000 times and plot the results in Figure 5.
- The standard deviation of the sampling distribution of the mean has a special name - the standard error of the mean.
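The procedure on this card can be sketched as a simulation. It assumes the population itself is normal and draws 20,000 samples rather than 100,000 to keep the run short; both are choices made for this example. The standard deviation of the simulated sample means is an estimate of the standard error of the mean.

```python
import math
import random

rng = random.Random(5)         # fixed seed so the run is repeatable
pop_mean, pop_sd = 100, 1.5    # values from the card
sample_size = 25               # value from the card
n_samples = 20_000             # assumed; the card uses 100,000

# Each entry is the mean of one sample of 25 values.
sample_means = [
    sum(rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)) / sample_size
    for _ in range(n_samples)
]

mean_of_means = sum(sample_means) / n_samples
sd_of_means = math.sqrt(sum((m - mean_of_means) ** 2 for m in sample_means) / n_samples)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means (standard error of the mean): {sd_of_means:.3f}")
```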
9
Q
the central limit theorem
A
- You might think that the sampling distribution of the mean is normally distributed because the population is normally distributed
- But this is not the case: as your sample size increases, the sampling distribution of the mean will approach a normal distribution.
- And this will happen even if the population is not normally distributed.
- If the sample size is large enough, then the sampling distribution of the mean will approach a normal distribution. This occurs even if the population isn’t normally distributed.
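A sketch of the central limit theorem in action. The population here is exponential, chosen only because it is clearly not normal; that choice, the sample sizes, and the helper `skewness` are all assumptions for illustration. As the sample size grows, the skewness of the sample means shrinks towards 0, the value for a normal distribution.

```python
import random

def skewness(values):
    """Third central moment scaled by the SD cubed (simple population formula)."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

rng = random.Random(6)  # fixed seed so the run is repeatable
skew_by_size = {}

for sample_size in (1, 5, 30, 100):
    # 5,000 sample means, each from a skewed (exponential) population.
    means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(5_000)
    ]
    skew_by_size[sample_size] = skewness(means)
    print(f"sample size {sample_size:3d}: skewness of sample means = {skew_by_size[sample_size]:.2f}")
```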
10
Q
the standard error of the mean
A
- In lecture 7, we started talking about the spread of sample means around the population mean.
- I showed you figure 8, where the average deviation of sample means from the population mean was either small (A) or large (B).
- If the average (squared) deviation in the population were 0, then the average deviation of sample means from the population mean would be 0.
- Because all members of the population would be the same, so all samples would be the same, so all sample means would be the same.
- If the average (squared) deviation in the population were larger, then the average deviation of sample means from the population mean would be larger.
- If the sample size were so large that every sample contained the whole population, then the average deviation of sample means from the population mean would be 0.
- Because every sample would be identical to the population, so every sample mean would be identical to the population mean.
- If the sample size were smaller, then the average deviation of sample means from the population mean would be larger.
- Let’s put these 2 ideas together to try to come up with a formula for the average (squared) deviation of sample means from the population mean. It should get larger as the population variance $\sigma^2$ gets larger and smaller as the sample size $n$ gets larger, which gives Equation 3: $\sigma_{\bar{x}}^2 = \sigma^2 / n$
- There’s one final step to get to the formula for the standard error of the mean.
- The formula in Equation 3 is framed in terms of the average (squared) deviation of sample means from the population mean, that is, in terms of variance.
- But the standard error of the mean is the standard deviation of the sampling distribution.
- The standard deviation is just the square root of the variance, so we just need to take the square root of both sides of Equation 3 to get Equation 4: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
- The formula for the standard error of the mean and where it comes from.
- This was, admittedly, a fairly long-winded way to get to what is essentially a very simple formula
- However, as I have alluded to several times, the standard error of the mean is a fairly misunderstood concept
- I hope that getting there the long way has helped you to build a better intuition of what the standard error of the mean actually is
- I dislike talking about misconceptions because I think it can sometimes create them
- But it is worth talking about one prominent one
- Misconception
- The SEM tells you how far away your sample mean is likely to be from the actual population mean
- But it doesn’t tell you anything about the sample mean… at least not your sample mean that you have calculated for your particular sample
- The standard error of the mean is just what we’ve defined it as:
- The standard deviation of the sampling distribution
- So what does this tell you?
- It tells you how far, on average, sample means (not your sample mean) will be from the population mean
- Your sample mean might be close to the population mean, or it might be far away from the population mean. But the SEM doesn’t quantify this
- Your sample mean is either close to or far from the population mean
- The SEM tells you something about the consequences of a sampling process
- Not something about your sample
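The distinction can be sketched with a simulation. It uses the standard formula SEM = population SD / sqrt(sample size), reuses the population values from the sampling-distribution card (mean 100, SD 1.5, samples of 25), and assumes a normal population and 10,000 repetitions. "On average" is taken here in the root-mean-square sense, which is what a standard deviation measures. Individual sample means land both much closer to and much further from the population mean than the SEM.

```python
import math
import random

rng = random.Random(7)         # fixed seed so the run is repeatable
pop_mean, pop_sd = 100, 1.5    # values from the sampling-distribution card
sample_size = 25

# SEM from the formula: larger population SD -> larger, larger sample -> smaller.
sem = pop_sd / math.sqrt(sample_size)
for n in (4, 25, 100):
    print(f"n={n:3d}  SEM={pop_sd / math.sqrt(n):.3f}")

# Deviation of each simulated sample mean from the population mean.
deviations = []
for _ in range(10_000):
    sample_mean = sum(rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)) / sample_size
    deviations.append(sample_mean - pop_mean)

# Root-mean-square deviation across all samples: a property of the sampling process.
rms_deviation = math.sqrt(sum(d * d for d in deviations) / len(deviations))
closest = min(abs(d) for d in deviations)
furthest = max(abs(d) for d in deviations)

print(f"SEM from formula: {sem:.3f}")
print(f"typical (RMS) deviation across samples: {rms_deviation:.3f}")
print(f"closest sample mean: {closest:.4f} away; furthest: {furthest:.3f} away")
```

Nothing in the SEM says which of those 10,000 sample means a particular sample corresponds to.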