transformation and comparisons Flashcards
the shape of things
- If we measured the height of 1000 women and plotted the values then we might get something like Figure 1.
- Most heights are in the 155-175 centimetre range.
- The distribution is roughly symmetrical around its mean (165 cm) and it has a shape characteristic of a normal distribution.
- Of course the plot in Figure 1 doesn’t look exactly like a normal distribution.
- But if we measured more and more people (e.g., 100,000 people) then we might get something like Figure 2.
- Figure 2 also shows the corresponding normal distribution with a mean of 165 and a standard deviation of 10.
- Although the normal distribution is an idealisation, or an abstraction, we can use it to do some very useful things.
the standard normal distribution
- In lecture 8, I said that two parameters (the mean and the standard deviation) changed where the normal distribution was centred and how spread out it was.
- I said that changing these values didn’t change the relative position of points on the plot. The overall shape remains the same.
- All normal distributions have the same overall shape as the standard normal distribution even if they’re centred in a different place and are more or less spread out.
- To see what I mean by this, we’ll take our heights of 1000 people, but instead of displaying them in centimetres we’ll display them in metres.
- Changing the scale on which you’re measured doesn’t actually change your height relative to other people.
- The distribution in Figure 3a has a standard deviation of 10
- The distribution in Figure 3b has a standard deviation of 0.1
- But as you can see, they’re the same distribution - they’re just displayed on different scales (cm vs m)
- Changing the scale changes the standard deviation. This is why the standard deviation is sometimes referred to as the scale parameter for the distribution.
- Apart from changing the scale, we can also change where the distribution is centred.
- In Figure 4a we can see the same distribution as before. In Figure 4b we can see the same distribution, now centred at 0.
transformations
- in Figures 3 and 4 we saw that we could transform a variable so that it had a new location (mean) or scale (standard deviation) without changing the shape
- these two kinds of transformations are known as centring and scaling
centring
- to centre a set of measurements, you subtract a fixed value from each observation in the dataset
- this has the effect of shifting the distribution of the variable along the x-axis
- you can technically centre a variable by subtracting any value from it but the most frequently used method is mean-centring
mean centring
- mean centring a variable shifts it so that the new mean is at the zero point
- the individual values of a mean-centred variable tell us how far that observation is from the mean of the entire set of measures
- it doesn’t alter the shape of the distribution, or change the scale that it’s measured on
- it only changes the interpretation of the values to, for example, differences from the mean (see the sketch below)
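A minimal sketch of mean-centring in Python, using a small made-up dataset (the values are illustrative, not from the lecture):

```python
import numpy as np

x = np.array([3, 5, 7, 4, 9, 6])  # illustrative values, not from the lecture

centred = x - x.mean()  # subtract the mean from every observation

print(centred)         # each value is now a distance from the mean
print(centred.mean())  # the new mean is 0 (up to floating-point error)
```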
scaling
- is performed by dividing each observation by some fixed value
- this has the effect of stretching or compressing the variable along the x-axis
- you can scale a variable by dividing it by any value
- but typically scaling is done by dividing values by the standard deviation of the dataset
- scaling doesn’t change the fundamental shape of the variable’s distribution
- but after scaling the data by the standard deviation, the values are now measured in units of standard deviation (see the sketch below)
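A minimal sketch of scaling by the standard deviation, continuing the same illustrative values:

```python
import numpy as np

x = np.array([3, 5, 7, 4, 9, 6])  # illustrative values, not from the lecture

scaled = x / x.std()  # divide every observation by the standard deviation

print(scaled)        # values are now expressed in units of standard deviation
print(scaled.std())  # the new standard deviation is 1
```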
the z transform
- the combination of first mean-centring a variable and then scaling it by its standard deviation is known as the z-transform
- The 10 values in Table 1 have a mean of 5.7 and a standard deviation of 2.21.
- To z transform the data in Table 1, we would do the following steps:
- We’d subtract 5.7 from each value and put them in the Centred column
- Then we’d divide each value in Centred by 2.21
- We can now interpret the data in terms of distance from the mean in units of standard deviation (see the worked sketch below).
- The z transform will come in handy when it comes to making comparisons.
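A minimal sketch of the two steps in Python. The ten values are made up for illustration; they are not the actual Table 1 data:

```python
import numpy as np

x = np.array([4, 7, 2, 9, 5, 6, 3, 8, 5, 7])  # illustrative values, not Table 1

centred = x - x.mean()       # step 1: mean-centre (the Centred column)
z = centred / x.std(ddof=1)  # step 2: divide by the (sample) standard deviation

# each z value is that observation's distance from the mean,
# measured in units of standard deviation
print(np.round(z, 2))
```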
comparing groups
- in the context of quantitative research we’re often looking at the average difference in a variable between groups
- In the Figure 5 we can see measurements from a reaction time task.
- Amateur sportspeople have a mean reaction time of 500ms and professionals have a mean reaction time of 460ms.
- There is overlap between the two groups, but there is a difference between the averages.
- To quantify the difference, just subtract the mean of one group from the mean of the other.
- The mean difference is just 500ms - 460ms = 40ms.
comparing across groups
- In the previous example the comparisons were easy because the measurements were on the same scale (milliseconds).
- But let’s say that you want to compare two children on a puzzle completion task.
- One child is 8 years old, and the other is 14 years old.
- They do slightly different versions of the task and the tasks are scored differently.
- Because the two tests might have a different number of items, different scoring, and so on, we can’t just compare the raw numbers to see which is bigger.
- Example:
- Let’s take two children:
- Ahorangi is 8 years old and scored 86 on the task
- Benjamin is 14 years old and scored 124 on the task
- We can easily tell that Benjamin’s score is higher than Ahorangi’s score
- But the scores are not directly comparable… so what do we do?
- We have to look at how each performed relative to their age groups.
- Is Ahorangi better performing relative to 8 year olds than Benjamin is relative to 14 year olds?
- To answer this question we can use the z-transformation.
- To do the z-transformation we need to know the mean and standard deviation of task scores for each age group.
- Converting each child’s score to a z-score relative to their own age group (see the sketch below) shows that Ahorangi, despite having a lower raw score, actually scored very high for an 8 year old.
- Benjamin only scored a little higher than the average 14 year old.
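A minimal sketch of the comparison. The age-group means and standard deviations below are made up (the lecture doesn’t give them here); they’re assumed only so the qualitative pattern matches the conclusion above:

```python
# Hypothetical age-group norms -- assumed for illustration only.
norms = {
    "8-year-olds": {"mean": 70.0, "sd": 8.0},
    "14-year-olds": {"mean": 118.0, "sd": 12.0},
}

def z_score(score: float, mean: float, sd: float) -> float:
    """Distance from the group mean in units of standard deviation."""
    return (score - mean) / sd

ahorangi = z_score(86, **norms["8-year-olds"])    # (86 - 70) / 8 = 2.0
benjamin = z_score(124, **norms["14-year-olds"])  # (124 - 118) / 12 = 0.5

# Ahorangi is well above average for her age group;
# Benjamin is only slightly above average for his.
print(ahorangi, benjamin)
```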
making comparisons with the sampling distribution
- From last week we learned that the sampling distribution of the mean will be centred at the population mean and have a standard deviation equal to the standard error of the mean.
- Although we don’t know the value of the population mean, we can generate a hypothesis about what we think the population mean might be.
- We can then generate a hypothetical sampling distribution based on our hypothesised value of the population mean.
- Example:
- Let’s say I get a group of people to perform a task where they have to quickly recognise two sets of faces: either famous faces or faces of their family members.
- I find that the mean difference between these two conditions is 24.87ms.
- But this is just the difference in my sample. The population mean difference might be some other value.
- Although we don’t know the population mean, we could hypothesise that it is 100 ms, 50 ms, 0 ms, or some other value. Let’s just pick 0 ms for now.
- Now we can generate a sampling distribution using our hypothesised population mean and the standard error of the mean we estimate from the sample (let’s say it’s 8.88 ms).
- In Figure 6 we can see what the sampling distribution would look like if the population mean were 0.
- We can compare our particular sample mean of 24.87ms to the sampling distribution.
- Because the sampling distribution is a normal distribution, we know that ~68% of the time sample means will fall within ±1 SEM of the population mean (-8.88ms to 8.88ms).
- And ~95% of the time sample means will fall between -17.76ms and 17.76ms.
- For our particular mean we see that it falls 2.8 SEM from our hypothesised population mean (24.87 / 8.88 ≈ 2.8; see the sketch at the end of this section).
- What can we make of this?
- We can conclude that if the population mean were in fact 0 then we have observed something rare.
- If the population mean were in fact 0, then it would be rare for a sample mean to be that far away from the population mean.
- Observing something rare doesn’t tell us that our hypothesis is wrong - rare things happen all the time!
- But if we were to run our experiment again and again, and we continued to observe rare events, then we would probably have a good reason to update our hypothesis.
- This process of comparing our sample to the sampling distribution is known as null hypothesis significance testing.
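A minimal numeric sketch of the comparison, using the values from the example above (a 24.87ms sample mean difference, a hypothesised population mean of 0ms, and an estimated SEM of 8.88ms):

```python
# Values from the example above.
sample_mean_diff = 24.87  # observed sample mean difference (ms)
hypothesised_mean = 0.0   # hypothesised population mean difference (ms)
sem = 8.88                # estimated standard error of the mean (ms)

# Distance of the observed sample mean from the hypothesised population mean,
# in units of SEM (a z-score on the hypothetical sampling distribution).
z = (sample_mean_diff - hypothesised_mean) / sem
print(round(z, 2))  # ~2.8 SEM

# If the population mean really were 0, roughly 68% of sample means would fall
# within +/- 1 SEM of it, and roughly 95% within +/- 2 SEM.
print(hypothesised_mean - sem, hypothesised_mean + sem)          # -8.88 to 8.88
print(hypothesised_mean - 2 * sem, hypothesised_mean + 2 * sem)  # -17.76 to 17.76
```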