Preliminaries and Descriptive Statistics Flashcards
What is a variable?
Something that can vary: a characteristic that can take different values across people or items
What are the three types of variable?
- Categorical: non-numerical; each value is a category
- Continuous: numerical, and values don’t have to be whole numbers
- Discrete: numerical, and values are whole numbers
What does a population consist of?
All the people or items that share a particular characteristic (or set of characteristics)
What does a sample refer to?
A selection of individual people or items from a population
In statistics, what are we trying to do with regard to populations and samples?
Draw inferences about a population from a sample
What is a population parameter?
A quantity that describes some characteristic of a population with respect to a specific variable (calculating it requires data from the entire population)
What is a sample statistic?
A quantity that describes some characteristic of a sample with respect to a specific variable
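A minimal sketch of the difference between a parameter and a statistic, assuming a small hypothetical population whose scores are simply the numbers 1 to 100. The population mean (a parameter) needs every score; the sample mean (a statistic) is computed from 10 randomly drawn scores and serves as an estimate of it. The seed value 42 is an arbitrary choice so the draw is repeatable.

```python
import random
import statistics

# Hypothetical population: the scores of all 100 members of a group (1..100).
population = list(range(1, 101))

# Population parameter: requires every score in the population.
population_mean = statistics.mean(population)

# Sample: 10 individuals drawn at random without replacement.
rng = random.Random(42)  # fixed seed so the draw is repeatable
sample = rng.sample(population, 10)

# Sample statistic: computed from the sample alone, used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean}")
print(f"sample mean (statistic): {sample_mean}")
```

In practice only the sample would be available, which is why the statistic is used to draw inferences about the unknown parameter.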
Why is it usually hard to calculate population parameters?
We usually don’t have access to data from the entire population
Why is it important to summarise data?
Raw data can be very complex and there is often a lot of it, so a summary makes it easier to interpret
What should a measure of central tendency provide?
An indication of a ‘typical’ score in the data set
What are the three measures of central tendency?
- Mean
- Median
- Mode
How do you work out the mean?
Add all the scores together and divide by the number of scores (N)
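The calculation can be sketched in a few lines of Python; the function name `mean` and the example scores are illustrative only.

```python
def mean(scores):
    """Add all the scores together and divide by the number of scores (N)."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)

scores = [2, 4, 4, 5, 10]
print(mean(scores))  # 25 / 5 = 5.0
```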
What are the pros and cons of using the mean?
- Pro: provides a single estimate of the average score that takes every score in the data set into account
- Con: is sensitive to extreme scores (outliers)
What is the mean?
The arithmetic average of the scores
What is the median?
The value that lies in the middle of the data once the scores are put in order
How do you find the median?
- Order (rank) the data
- Find the score in the middle
- If there is an even number of scores, take the average of the two middle scores
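The three steps above translate directly into a short Python function; the name `median` and the example data are illustrative.

```python
def median(scores):
    """Middle value of the ordered scores; average of the two middle ones if N is even."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)  # step 1: order (rank) the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd N: a single score sits in the middle
    # even N: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # ordered [1, 3, 7] -> 3
print(median([7, 1, 3, 5]))  # ordered [1, 3, 5, 7] -> (3 + 5) / 2 = 4.0
```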
What are the pros and cons of using the median?
- Pro: Insensitive to extreme scores in the data set
- Con: Doesn’t reflect the shape of the distribution of scores
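A small comparison, using hypothetical scores, of how the mean and the median respond when one score is replaced by an extreme value. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical scores; the second list swaps the last score for an extreme one.
scores = [3, 4, 5, 6, 7]
with_extreme = [3, 4, 5, 6, 70]

mean_before = statistics.mean(scores)           # 25 / 5 = 5
mean_after = statistics.mean(with_extreme)      # 88 / 5 = 17.6
median_before = statistics.median(scores)       # middle score: 5
median_after = statistics.median(with_extreme)  # middle score: still 5

print(mean_before, mean_after)
print(median_before, median_after)
```

The single extreme score pulls the mean from 5 to 17.6, while the median does not move.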
What is the mode?
The score that occurs most frequently in the data set; like the mean and median, it gives an indication of the ‘typical’ score
How do you find the mode?
Find the most frequently occurring value
What are the pros and cons of using the mode?
- Pro: very easy to read off a histogram (the tallest bar) and easy to understand
- Con: a data set might have more than one mode or no mode at all
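A sketch of finding the mode by counting how often each value occurs, written to return every most-frequent value so that data sets with several modes are handled. It assumes the convention that a data set has no mode when every value occurs equally often (for example, each value appears exactly once); other conventions exist. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(scores):
    """Return every most-frequent value; an empty list means 'no mode'.

    Assumed convention: if every value occurs equally often,
    the data set is treated as having no mode.
    """
    counts = Counter(scores)  # value -> how often it occurs
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # no value stands out, so no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))     # [2]     one mode
print(modes([1, 1, 2, 3, 3]))  # [1, 3]  two modes
print(modes([1, 2, 3]))        # []      no mode
```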
What is the range and what is the problem with it?
- Range: the difference between the maximum and minimum scores in your data
- Problem: it uses only the two most extreme scores, so distributions with very different shapes can have the same range
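Two hypothetical data sets illustrate the problem: one has most scores bunched in the middle, the other has most scores at the extremes, yet both have the same range. The function is named `value_range` (an illustrative name) to avoid shadowing Python's built-in `range`.

```python
def value_range(scores):
    """Range = maximum score minus minimum score."""
    return max(scores) - min(scores)

clustered = [1, 5, 5, 5, 9]   # most scores bunched in the middle
spread_out = [1, 1, 5, 9, 9]  # most scores at the extremes

print(value_range(clustered), value_range(spread_out))  # 8 8
```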
What is the deviation?
The (signed) distance of a score from the mean
How do you calculate the average deviation?
- Calculate the mean
- Calculate the deviation of each score from the mean
- Calculate the average deviation (add up all the deviations and divide by the number of deviations)
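The three steps above can be followed literally in Python; the function name `average_deviation` and the example scores are illustrative. For these scores the mean is 5, the deviations are -3, -1, -1, 0 and 5, and they add up to zero.

```python
def average_deviation(scores):
    """Average of the signed deviations (score - mean) of each score from the mean."""
    m = sum(scores) / len(scores)             # step 1: the mean
    deviations = [x - m for x in scores]      # step 2: deviation of each score
    return sum(deviations) / len(deviations)  # step 3: average the deviations

scores = [2, 4, 4, 5, 10]
print(average_deviation(scores))  # 0.0
```

With non-integer means, floating-point rounding may produce a tiny non-zero value instead of exactly 0.0.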
Why don’t we usually use the average deviation as a measure of spread?
Positive and negative deviations cancel each other out: deviations from the mean always sum to zero, so the average deviation is always 0
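Writing $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ for the mean of the N scores $x_1, \dots, x_N$, a short derivation shows that the cancellation is exact for any data set:

```latex
\sum_{i=1}^{N} (x_i - \bar{x})
  = \sum_{i=1}^{N} x_i - N\bar{x}
  = \sum_{i=1}^{N} x_i - N \cdot \frac{1}{N} \sum_{i=1}^{N} x_i
  = 0
```

Dividing by N then gives an average deviation of 0, whatever the scores are.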