# Data distributions, z scores and p-values Flashcards

1
Q

What are distributions of data and how are data distributions commonly visualised?

A
• distributions of data are the manner in which data for a particular variable is spread over its range
• data distributions are commonly visualised using a histogram
2
Q

What does the shape of the histogram tell us?

A
• Are some scores more common than others in our data set? (mode)
• Is there a range in which the data is more concentrated?
• The chance of randomly picking someone/something from the sample within a particular range (probability)
3
Q

What shape is a normally distributed histogram and what does this show?

A
• Bell curve
• It has a peak in the middle and trails off either side
• Shows that most values cluster around an average (e.g. height), with progressively fewer people at lower and higher heights
4
Q

When can we find it hard to see normality in a histogram?

A

If we don’t have much data (a small sample size)

5
Q

What are examples of Non-normally distributed data?

A
• Positively skewed data has a tail to the right (e.g. reaction time)
• Negatively skewed data has a tail to the left
• Danger: the mean is distorted by the tails! (the median is a good way to get around that)
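A minimal sketch of why the median resists a long tail better than the mean. The reaction times below are made-up illustrative values, not data from these cards.

```python
from statistics import mean, median

# Hypothetical reaction times in ms: most are fast, a few are very slow,
# which produces a long tail on the right (positive skew).
reaction_times = [310, 320, 330, 340, 350, 360, 370, 900, 1500]

rt_mean = mean(reaction_times)      # pulled upwards by the two slow trials
rt_median = median(reaction_times)  # middle value, barely affected by the tail

print(f"mean = {rt_mean:.1f} ms, median = {rt_median} ms")
```

Here the mean sits well above the typical score while the median stays with the bulk of the data.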
6
Q

What are features of bimodal data?

A
• Two distinct populations
• Two peaks
• The mean will be in the middle and not representative of the common score in the data
• Usually bimodal data indicates that something has gone wrong with your experiment and you may actually have sampled two populations
7
Q

What is the normal distribution specified by? (equation)

A
• Mean and standard deviation
• N(μ, σ) is commonly used to describe it
• μ = mean
• σ = standard deviation
• N = normal distribution
8
Q

Give features of normally distributed histograms

A
• Mean is the line down the centre of the curve
• Standard deviation is related to the width of the curve
• All are bell shaped
• The tails never quite reach zero (though they get very close)
• The area under the curve is always equal to 1
• The curve is very close to 0 by the time it gets to 3 standard deviations away from the mean (i.e. beyond mean ± 3 standard deviations)
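The "very close to 0 by 3 standard deviations" point can be checked numerically. This is a sketch that uses the standard identity P(|Z| < k) = erf(k / √2) for a normal distribution; the helper name `area_within` is just an illustrative choice.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
    print(k, round(area_within(k), 4))
```

So only about 0.3% of the total area of 1 lies further than 3 standard deviations from the mean.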
9
Q

Why do we need to test probability?

A
• A major goal for us in using statistics is to be able to use our data to test experimental hypotheses
• We can never be certain about the validity of our hypotheses so probabilities will be fundamentally important
• Underlies Null Hypothesis testing
10
Q

What is probability?

A
• We can think of a probability as a measure of how likely it is that an uncertain event will occur
• Probabilities can be expressed as a percentage or proportion
• 0% = impossible
• 100% = certain
11
Q

What is the equation to work out the probability of an event occurring, P(event)?

A

P(event) = (no. of possible outcomes consistent with the event) / (total no. of possible outcomes), when all outcomes are equally likely

12
Q

What is conditional probability and give an example

A
• The probability of an event given that something else is known or assumed, i.e. given some additional information
• E.g. I close my eyes and roll one die. An honest observer tells me I have rolled a number < 4. Before I open my eyes, what is the probability that I have rolled an even number?
 ▪ No. of possible outcomes = 3 (1, 2 or 3)
 ▪ Only one of these (2) is even
 ▪ So P(event) = 1/3
 ▪ Written P(even | <4), where the vertical line means "given"
 ▪ When you assume something rather than know it, the calculation is the same
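The die example can be sketched by counting outcomes directly. The helper names (`probability`, `is_even`) are illustrative, and the sketch assumes a fair six-sided die so that every face is equally likely.

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]  # assumed fair: every face equally likely

def probability(event, outcomes):
    """Outcomes consistent with the event divided by all possible outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def is_even(n):
    return n % 2 == 0

# Unconditional: all six faces are possible.
p_even = probability(is_even, die)

# Conditional: the observer's information restricts the outcomes to 1, 2, 3.
less_than_4 = [o for o in die if o < 4]
p_even_given_lt4 = probability(is_even, less_than_4)

print(p_even, p_even_given_lt4)  # 1/2 1/3
```

Conditioning only changes the list of possible outcomes; the counting rule itself stays the same.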
13
Q

Give an example of working out area under data distributions for uniform distribution (where outcomes are equally likely)

A
• You have a friend who always arrives to meet you somewhere in the range from 5 minutes before to 5 minutes after the agreed meeting time. They are never more than 5 minutes early and never more than 5 minutes late, and the time they arrive is completely random within that range
• You keep track of arrival times for 100 meetings in a year
• Q. Based on your sample of data, what is the probability that your friend is at least two minutes late?
• A. The friend was at least 2 minutes late 29 times out of 100 meetings, so the probability is 29/100 = 0.29
• It is useful to express the histogram as proportions:
• Divide each bar's count by 100, so that the heights of all the bars total 1
• If all the bars sum to 1, then adding up the heights of the bars in our range of interest gives us our proportion directly
• Remember your friend is equally likely to arrive at any time in the range from 5 minutes before to 5 minutes after the agreed meeting time
• So the histogram should look uniform (it only looks uneven because of the small sample size)
• The area under the distribution over a range tells us the probability of falling in that range
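As a rough check on the sample answer, the area under an ideal uniform distribution can be computed as width × height. This sketch assumes arrival times are exactly uniform between -5 and +5 minutes; `uniform_area` is a hypothetical helper name.

```python
def uniform_area(lo, hi, a, b):
    """Area under a uniform density on [lo, hi] between a and b."""
    height = 1 / (hi - lo)                 # constant height so total area is 1
    left, right = max(a, lo), min(b, hi)   # clip the range of interest
    width = max(0.0, right - left)
    return width * height

# At least 2 minutes late means arriving between +2 and +5 minutes.
p_theory = uniform_area(-5, 5, 2, 5)   # 3 minutes wide x 0.1 high
p_sample = 29 / 100                    # proportion observed in the 100 meetings

print(round(p_theory, 2), p_sample)
```

The sample proportion (0.29) is close to the theoretical area (0.3), and the gap is the kind of wobble expected from a sample of 100.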
14
Q

What is a z score (including equation)?

A
• The z score is obtained by subtracting the population mean from x and then dividing by the standard deviation: z = (x − μ) / σ
• x − μ shows how far away your score is from the mean
• Dividing by σ expresses this distance as a proportion of the standard deviation
• E.g. for someone’s IQ: (120 − 100) / 15 = 1.33. This shows us that the score is 1.33 standard deviations higher than the mean
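The IQ example can be sketched as a one-line function. The function name `z_score` is an illustrative choice, and the mean of 100 and standard deviation of 15 are the values used in the card.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, in units of the standard deviation."""
    return (x - mu) / sigma

z = z_score(120, 100, 15)
print(round(z, 2))  # 1.33
```

A score below the mean gives a negative z, e.g. `z_score(85, 100, 15)` is -1.0.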
15
Q

Give features of z-transformation in general

A
• It takes any normally distributed data and converts each score to a z score
• If you work out z for all scores you get a normal distribution with mean 0 and standard deviation 1
• N(0,1)
• This means there is one standard normal distribution that all z-transformed normal data will follow
• Therefore, by matching z scores against the values in a table, we can work out the area (which tells us the probability)
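A small sketch of the "mean 0, standard deviation 1" property. The scores are made-up values, and the population standard deviation (`pstdev`) is used on the assumption that the list is treated as the whole population.

```python
from statistics import mean, pstdev

scores = [85, 90, 100, 110, 115]  # hypothetical raw scores

mu = mean(scores)
sigma = pstdev(scores)  # population standard deviation

# z-transform every score
z_scores = [(x - mu) / sigma for x in scores]

print(round(mean(z_scores), 6), round(pstdev(z_scores), 6))
```

The transformation shifts and rescales the scores but does not change the shape of the distribution, so the result is only N(0,1) if the original data were normal.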
16
Q

What are the three columns in the table of the standard normal distribution used to work out the area?

A
• z score
• proportion below score
• proportion above score
• It doesn’t have any negative values, but these correspond to the positive values: e.g. the proportion to the left of -2 is the same as the proportion to the right of +2
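One way to reproduce a row of such a table, and to check the symmetry rule, is with the error function. This is a sketch using the standard formula Φ(z) = 0.5 · (1 + erf(z / √2)); the names `proportion_below` and `table_row` are illustrative.

```python
import math

def proportion_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def table_row(z):
    """Return (z score, proportion below, proportion above)."""
    below = proportion_below(z)
    return z, below, 1 - below

z, below, above = table_row(2)
print(z, round(below, 4), round(above, 4))

# Symmetry: the area left of -2 matches the area right of +2.
print(round(proportion_below(-2), 4))
```

This is why a table listing only positive z scores is enough.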
17
Q

So how do you work out a probability for normally distributed data?

A
• Work out the z score
• Look up this z score in the table to read off the proportion below or above it
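The two steps can be sketched end to end with Python's standard library, where `statistics.NormalDist` plays the role of the printed table. The IQ figures (mean 100, standard deviation 15, score 120) are assumed here purely for illustration.

```python
from statistics import NormalDist

mu, sigma, x = 100, 15, 120  # assumed illustrative values

# Step 1: work out the z score
z = (x - mu) / sigma

# Step 2: "look up" z in the standard normal distribution N(0,1)
standard_normal = NormalDist(mu=0, sigma=1)
p_below = standard_normal.cdf(z)   # proportion below the score
p_above = 1 - p_below              # proportion above the score

print(round(z, 2), round(p_below, 3), round(p_above, 3))
```

Under these assumptions, roughly 9% of people would score above 120.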