Descriptive Statistics Flashcards
(38 cards)
What are descriptive statistics?
Summarise and describe the data you have
What is the population?
Entire group of people you are interested in
What is a sample?
Subset of population
The number of cases in a sample (the sample size) is usually written as n
What is categorical data?
Usually nominal or ordinal
Data sorted into two or more categories (nominal categories have no ordering; ordinal categories do)
What are examples of categorical data?
Hair colour
Marital status
What is discrete data?
Usually ordinal, ratio or interval variables
Can only take certain fixed values (e.g. whole numbers), which have a logical order
What are examples of discrete data?
Shoe size
Score out of 10
What is continuous data?
Usually ratio or interval variables
Can take any value within a range, including fractional values
What are examples of continuous data?
Reaction times
How can categorical data be presented in a frequency distribution?
As its raw frequency or as a percentage frequency
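A minimal Python sketch of a categorical frequency table, assuming the data is a plain list of category labels; the hair-colour values and the `frequency_table` name are hypothetical, for illustration only.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, raw frequency, percentage frequency) rows."""
    counts = Counter(values)
    n = len(values)
    # most_common() orders by frequency; categorical data has no natural order
    return [(cat, freq, 100 * freq / n) for cat, freq in counts.most_common()]

# Hypothetical hair-colour data, invented for illustration
hair = ["brown", "black", "brown", "blonde", "red", "brown", "black", "blonde"]
for category, freq, pct in frequency_table(hair):
    print(f"{category:<8}{freq:>3}{pct:>7.1f}%")
```

The raw frequencies always sum to n and the percentage frequencies to 100%.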
How can discrete data be presented in a frequency distribution?
As raw frequency or percentage frequency
As cumulative frequency or cumulative percentage
If there are many different values, group them into meaningful frequency ranges instead
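A minimal Python sketch of a discrete frequency distribution with cumulative columns, plus grouped ranges; the scores, the chosen ranges, and the function names are hypothetical, for illustration only.

```python
from collections import Counter
from itertools import accumulate

def cumulative_table(scores):
    """Rows of (value, freq, %, cumulative freq, cumulative %) in ascending order."""
    counts = Counter(scores)
    n = len(scores)
    values = sorted(counts)            # discrete data has a logical order
    freqs = [counts[v] for v in values]
    cum = list(accumulate(freqs))      # running total of frequencies
    return [(v, f, 100 * f / n, c, 100 * c / n)
            for v, f, c in zip(values, freqs, cum)]

def grouped_counts(scores, bins):
    """Count scores falling in each inclusive (low, high) range."""
    return {(low, high): sum(low <= s <= high for s in scores) for low, high in bins}

# Hypothetical scores out of 10, invented for illustration
scores = [7, 5, 8, 7, 10, 6, 7, 9, 5, 8]
for row in cumulative_table(scores):
    print("{}: f={} ({:.0f}%), cum f={} ({:.0f}%)".format(*row))
print(grouped_counts(scores, [(0, 4), (5, 7), (8, 10)]))
```

The ranges passed to `grouped_counts` are a choice; pick boundaries that make sense for the measure.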
What are measures of central tendency?
Single numbers that summarise an entire frequency distribution
They describe where the centre (typical value) of the data lies
What are three types of measures of central tendency?
Mode
Median
Mean
What is the mode?
Score occurring most often in dataset
Sometimes takes more than one value (bimodal and multimodal distributions)
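A short Python sketch of finding the mode, returning every tied value so bimodal and multimodal data are handled; `modes` is a hypothetical helper, and the standard library's `statistics.multimode` (Python 3.8+) behaves the same way.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of empty data")
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([3, 1, 3, 2, 3]))                               # unimodal -> [3]
print(modes([1, 1, 2, 2, 3]))                               # bimodal -> [1, 2]
print(modes(["single", "married", "single", "divorced"]))  # nominal -> ['single']

# The standard library offers the same behaviour (Python 3.8+)
print(multimode([1, 1, 2, 2, 3]))                           # -> [1, 2]
```

Counting categories needs no ordering, which is why the mode works for nominal data.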
What data is the mode used for?
Nominal data
What is the median?
Middle score when the data are ordered from lowest to highest
For an even number of values, the mean of the middle two values
How do you work out the median for datasets with an odd number of values?
Order the data; the median is the value at position (n+1) / 2
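How do you work out the median for datasets with an even number of values?
Order the data; the median is the mean of the two middle values: (value at position n/2 + value at position n/2 + 1) / 2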
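A minimal Python sketch of the odd and even median rules; the hypothetical `median` function converts the 1-based positions used on these cards into Python's 0-based indices.

```python
def median(values):
    """Median following the (n + 1) / 2 position rule."""
    data = sorted(values)          # the median needs ordered data
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2                   # 0-based index
    if n % 2 == 1:
        # odd n: 1-based position (n + 1) / 2 is 0-based index n // 2
        return data[mid]
    # even n: mean of values at 1-based positions n/2 and n/2 + 1
    return (data[mid - 1] + data[mid]) / 2

print(median([7, 3, 9, 1, 5]))      # odd n = 5: position 3 of [1, 3, 5, 7, 9] -> 5
print(median([7, 3, 9, 1, 5, 11]))  # even n = 6: (5 + 7) / 2 -> 6.0
```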
What are the pros of the median?
Insensitive to outliers
Often an actual value from the dataset (always so when n is odd)
What data is the median used for?
Ordinal data
Skewed interval/ratio data
What are the cons of the median?
Ignores most of the data (only the middle value or values are used)
Tedious to work out by hand for large datasets, because the data must be ordered first
Can’t use with nominal data
What is the mean?
Sum of data points divided by number of data points
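In symbols, x̄ = Σx / n. A minimal Python sketch, where `mean` is a hypothetical helper and the reaction times are invented for illustration:

```python
def mean(values):
    """Arithmetic mean: sum of the data points divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

reaction_times_ms = [312, 287, 305, 298, 330]  # hypothetical data
print(mean(reaction_times_ms))                 # 1532 / 5 -> 306.4
```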
What are the pros of the mean?
Uses all of the data
Most effective for normally distributed datasets
What are the cons of the mean?
Sensitive to outliers
Result may not be a possible data value (e.g. a mean of 2.4 children)
Only meaningful for ratio and interval data
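A short Python sketch contrasting the mean's sensitivity to outliers with the median's robustness, using the standard library's `statistics` module; the reaction times and the single extreme value are hypothetical.

```python
from statistics import mean, median

# Hypothetical reaction times (ms); one value is replaced by an extreme outlier
typical = [300, 310, 295, 305, 290]
with_outlier = typical[:-1] + [2000]

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(f"{label:<13} mean={mean(data):7.1f}  median={median(data):6.1f}")
```

One extreme value drags the mean from 300 to 642, while the median only moves from 300 to 305.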