transformation and comparisons Flashcards

1
Q

the shape of things

A
  • If we measured the height of 1000 women and plotted the values then we might get something like Figure 1.
  • Most heights are in the 155-175 centimetre range.
  • The distribution is roughly symmetrical around its mean (165 cm) and it has a shape characteristic of a normal distribution.
  • Of course the plot in Figure 1 doesn’t look exactly like a normal distribution.
  • But if we measured more and more people (e.g., 100,000 people) then we might get something like Figure 2.
  • Figure 2 also shows the corresponding normal distribution with a mean of 165 and a standard deviation of 10.
  • Although the normal distribution is an idealisation, or an abstraction, we can use it to do some very useful things.
2
Q

the standard normal distribution

A
  • In lecture 8, I said that two parameters (the mean and the standard deviation) changed where the normal distribution was centred and how spread out it was.
  • I said that changing these values didn’t change the relative position of points on the plot. The overall shape remains the same.
  • All normal distributions have the same overall shape as the standard normal distribution even if they’re centred in a different place and are more or less spread out.
  • To see what I mean by this, we’ll take our heights of 1000 people, but instead of displaying them in centimetres we’ll display them in metres.
  • Changing the scale on which you’re measured doesn’t actually change your height relative to other people.
  • The distribution in Figure 3a has a standard deviation of 10
  • The distribution in Figure 3b has a standard deviation of 0.1
  • But as you can see, they’re the same distribution, just displayed on different scales (cm vs m).
  • Changing the scale changes the standard deviation. This is why the standard deviation is sometimes referred to as the scale parameter for the distribution.
  • Apart from changing the scale, we can also change where the distribution is centred.
  • In Figure 4a we can see the same distribution as before. In Figure 4b we can see the same distribution, now centred at 0.
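A quick sketch of the scale change described above, using a handful of illustrative heights (not the actual Figure 3 data). Dividing every value by 100 converts centimetres to metres and divides the standard deviation by 100, while each person's height relative to the others is unchanged:

```python
import statistics

# Illustrative heights in centimetres (hypothetical values, not the Figure 3 data)
heights_cm = [150, 160, 165, 170, 180]
heights_m = [h / 100 for h in heights_cm]  # same heights, displayed in metres

sd_cm = statistics.pstdev(heights_cm)  # population SD on the cm scale
sd_m = statistics.pstdev(heights_m)    # population SD on the m scale

# Changing the scale changes the standard deviation by the same factor
print(sd_cm)  # ≈ 10.0
print(sd_m)   # ≈ 0.1
```

These illustrative values happen to give an SD of 10 in centimetres and 0.1 in metres, mirroring the Figures 3a and 3b described above.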
3
Q

transformations

A
  • in figure 3 and figure 4 we saw that we could transform a variable so that it had a new location (mean) or scale (standard deviation) without changing the shape
  • these two kinds of transformations are known as centring and scaling
4
Q

centring

A
  • to centre a set of measurements, you subtract a fixed value from each observation in the dataset
  • this has the effect of shifting the distribution of the variable along the x-axis
  • you can technically centre a variable by subtracting any value from it but the most frequently used method is mean-centring
5
Q

mean centring

A
  • mean centring a variable shifts it so that the new mean is at the zero point
  • the individual values of a mean-centred variable tell us how far that observation is from the mean of the entire set of measures
  • it doesn’t alter the shape of the distribution, or change the scale that it’s measured on
  • it only changes the interpretation of the values to, for example, differences from the mean
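A minimal sketch of mean-centring, using hypothetical values. Subtracting the mean from each observation shifts the distribution so its new mean is 0, and each centred value reads as a distance from the original mean:

```python
import statistics

# Hypothetical measurements
scores = [4, 6, 7, 9, 14]
mean = statistics.mean(scores)        # 8
centred = [x - mean for x in scores]  # each value is now its distance from the mean

print(centred)  # [-4.0, -2.0, -1.0, 1.0, 6.0] — the new mean is 0
```

Note that the spread and shape are untouched; only the location (and the interpretation of the values) changes.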
6
Q

scaling

A
  • is performed by dividing each observation by some fixed value
  • this has the effect of stretching or compressing the variable along the x-axis
  • you can scale a variable by dividing it by any value
  • but typically scaling is done by dividing values by the standard deviation of the dataset
  • scaling doesn’t change the fundamental shape of the variable’s distribution
  • but after scaling the data by the standard deviation, the values are now measured in units of SD
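A minimal sketch of scaling by the standard deviation, again with hypothetical values. After dividing every observation by the SD, the new SD is 1, so the values are in SD units; the shape of the distribution is unchanged:

```python
import statistics

# Hypothetical measurements
values = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.pstdev(values)     # population SD = 2.0 for these values
scaled = [x / sd for x in values]  # values are now in units of SD

# Scaling by the SD makes the new SD exactly 1 (the mean is NOT moved to 0)
print(statistics.pstdev(scaled))  # 1.0
```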
7
Q

the z transform

A
  • the combination of first mean-centring a variable and then scaling it by its standard deviation is known as the z-transform
  • The 10 values in Table 1 have a mean of 5.7 and a standard deviation of 2.21.
  • To z transform the data in Table 1, we would do the following steps:
    1. We’d subtract 5.7 from each value and put them in the Centred column
    2. Then we’d divide each value in Centred by 2.21
  • We can now interpret the data in terms of distance from the mean in units of standard deviation.
  • The z transform will come in handy when it comes to making comparisons.
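The two steps above can be sketched as a small function. The data here are hypothetical (the actual Table 1 values aren't reproduced in this card); for Table 1 you would subtract 5.7 and then divide by 2.21. I'm using the population SD here — whether one uses the population or sample SD formula is a convention the card doesn't specify:

```python
import statistics

def z_transform(values):
    """Mean-centre the values, then scale them by their standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    centred = [x - mean for x in values]  # step 1: subtract the mean
    return [c / sd for c in centred]      # step 2: divide by the SD

# Hypothetical data (not the actual Table 1 values)
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_transform(data)

# A z-transformed variable always has mean 0 and SD 1,
# so each z value is a distance from the mean in SD units.
print(statistics.mean(z), statistics.pstdev(z))  # ≈ 0.0 and ≈ 1.0
```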
8
Q

comparing groups

A
  • in the context of quantitative research we’re often looking at the average difference in a variable between groups
  • In Figure 5 we can see measurements from a reaction time task.
    • Amateur sportspeople have a mean reaction time of 500ms and professionals have a mean reaction time of 460ms.
    • There is overlap between the two groups, but there is a difference between the averages.
    • To quantify the difference, just subtract the mean of one group from the mean of the other.
      • The mean difference is just 500ms - 460ms = 40ms.
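The subtraction above, with small illustrative samples (the actual Figure 5 measurements aren't given in this card, so these lists are made up to have the stated group means):

```python
import statistics

# Hypothetical reaction times in ms (not the actual Figure 5 data)
amateurs = [480, 495, 500, 505, 520]       # mean 500ms
professionals = [440, 455, 460, 465, 480]  # mean 460ms

mean_diff = statistics.mean(amateurs) - statistics.mean(professionals)
print(mean_diff)  # 40.0
```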
9
Q

comparing across groups

A
  • In the previous example the comparisons were easy because the measurements were on the same scale (milliseconds).
  • But let’s say that you want to compare two children on a puzzle completion task.
    • One child is 8 years old, and the other is 14 years old.
    • They do slightly different versions of the task and the tasks are scored differently.
  • Because we have two different tests, which might have a different number of items and so on, we can’t just compare the raw numbers to see which is bigger.
  • Example:
  • Let’s take two children:
    • Ahorangi is 8 years old and scored 86 on the task
    • Benjamin is 14 years old and scored 124 on the task
  • We can easily tell that Benjamin’s score is higher than Ahorangi’s score
  • But the scores are not directly comparable… so what do we do?
    • We have to look at how each performed relative to their age groups.
    • Is Ahorangi better performing relative to 8 year olds than Benjamin is relative to 14 year olds?
    • To answer this question we can use the z-transformation.
  • To do the z-transformation we need to know the mean and standard deviation for each age group.
  • That means that Ahorangi, despite having a lower score, actually scored very high for an 8 year old.
  • Benjamin only scored a little higher than the average 14 year old.
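A sketch of the comparison, using hypothetical age-group norms — the card states that we need each group's mean and SD but doesn't give the real values, so the numbers below are assumptions chosen to match the stated conclusion:

```python
# Hypothetical age-group norms (assumed values, not from the lecture)
norms = {
    8:  {"mean": 70, "sd": 8},    # assumed mean/SD for 8-year-olds
    14: {"mean": 120, "sd": 10},  # assumed mean/SD for 14-year-olds
}

def z_score(score, age):
    """How far a score is from its age group's mean, in SD units."""
    m, sd = norms[age]["mean"], norms[age]["sd"]
    return (score - m) / sd

ahorangi = z_score(86, 8)    # (86 - 70) / 8  = 2.0 SDs above her group's mean
benjamin = z_score(124, 14)  # (124 - 120) / 10 = 0.4 SDs above his group's mean

# Despite the lower raw score, Ahorangi is further above her age group's mean
print(ahorangi, benjamin)  # 2.0 0.4
```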
10
Q

making comparisons with the sampling distribution

A

  • Last week we learned that the sampling distribution of the mean will be centred at the population mean and have a standard deviation equal to the standard error of the mean (SEM).
  • Although we don’t know the value of the population mean, we can generate a hypothesis about what we think the population mean might be.
  • We can then generate a hypothetical sampling distribution based on our hypothesised value of the population mean.
  • Example:
    • Let’s say I get a group of people to perform a task where they have to try to quickly recognise two sets of faces: either famous faces or faces of their family members.
    • I find that the mean difference between these two conditions is 24.87ms.
    • But this is just the difference in my sample. The population mean difference might be some other value.
    • Although we don’t know the population mean, we could hypothesise that it is 100ms, 50ms, 0ms, or some other value. Let’s just pick 0ms for now.
    • Now we can generate a sampling distribution using our hypothesised population mean and the standard error of the mean we estimate from the sample (let’s say it’s 8.88ms).
    • In Figure 6 we can see what the sampling distribution would look like if the population mean were 0.
  • We can compare our particular sample mean of 24.87ms to this sampling distribution.
  • Because the sampling distribution is a normal distribution, we know that ~68% of sample means will fall within ±1 SEM of the population mean (-8.88ms to 8.88ms).
    • And ~95% of sample means will fall between -17.76ms and 17.76ms.
  • Our particular mean falls 2.8 SEM from the hypothesised population mean.
  • What can we make of this?
    • If the population mean were in fact 0, then it would be rare for a sample mean to fall that far from it — we have observed something rare.
    • Observing something rare doesn’t tell us that our hypothesis is wrong. Rare things happen all the time!
    • But if we were to run our experiment again and again, and we continued to observe rare events, then we would probably have good reason to update our hypothesis.
  • This process of comparing our sample to the sampling distribution is known as null hypothesis significance testing.
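The distance of the sample mean from the hypothesised population mean, in SEM units, is just a z-transform using the numbers from this card:

```python
sample_mean = 24.87      # observed mean difference (ms)
hypothesised_mean = 0.0  # our hypothesised population mean (ms)
sem = 8.88               # standard error of the mean estimated from the sample (ms)

# z-transform: centre on the hypothesised mean, scale by the SEM
z = (sample_mean - hypothesised_mean) / sem
print(round(z, 1))  # 2.8 — the sample mean is ~2.8 SEM from the hypothesised mean
```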
