Descriptive Statistics Flashcards
(35 cards)
What are the 3 measures of central tendency?
-Mean
-Median
-Mode
What are the 3 measures of dispersion?
-Interquartile Range (IQR)
-Variance
-Standard Deviation
What are the measures of association?
-Chi-Squared
-Correlation
Define Central tendency
A single number that aims to represent the ‘typical’ value of a variable (the average), somewhere between the highest and lowest value of the observations.
What is Central tendency useful for?
-Useful for comparisons between datasets or groups within a dataset
-Can be tracked over time to monitor increases/decreases in key metrics
-Used in many statistical tests
What is the Mean?
Calculated by summing all values of a variable and dividing by the number of observations
Features of the Mean
-For ordinal and scale data
-Statistically powerful (uses all data points)
-Not robust (can be distorted by outliers)
Define the Median (M)
The middle value when values of a variable are arranged in order of magnitude
Features of the Median
-For ordinal and scale data
-Robust to outliers, so more appropriate than mean when dealing with extreme values
-Lacks statistical power
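A minimal sketch of the mean and median cards, using Python's standard `statistics` module. The values are invented for illustration; the point is only that a single outlier shifts the mean a lot while the median barely moves.

```python
import statistics

# Hypothetical values, invented for illustration
values = [2, 3, 4, 5, 6]
with_outlier = values + [100]   # same data plus one extreme value

# Mean: sum of all values divided by the number of observations
mean_plain = statistics.mean(values)
mean_outlier = statistics.mean(with_outlier)

# Median: middle value once the values are sorted
# (for an even count, the average of the two middle values)
median_plain = statistics.median(values)
median_outlier = statistics.median(with_outlier)

print("without outlier:", mean_plain, median_plain)
print("with outlier:   ", mean_outlier, median_outlier)
```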
Define the Mode
The most commonly occurring value (there may be more than one mode for a single variable)
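As a small illustration of a variable with more than one mode, the sketch below uses `statistics.multimode` (available from Python 3.8) on made-up nominal data.

```python
import statistics

# Hypothetical nominal data, invented for illustration
colours = ["red", "blue", "red", "green", "blue"]

# multimode returns every value tied for the highest count,
# in the order each was first encountered
modes = statistics.multimode(colours)
print(modes)
```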
Features of the Mode
-The only measure of central tendency suitable for nominal data
-Can be used with ordinal and scale data but other options are generally preferable
How is the Mode useful?
- Categorical data: The only measure of central tendency suitable for nominal variables
- Visualisation and reporting: Grouping numerical data can simplify communication, but involves a trade-off between detail and user-friendliness
- Aggregated or transformed data
Define dispersion
Dispersion measures how far, on average, each observation lies from the central tendency. It represents the variation in values within a variable.
Interquartile Range
The IQR is the range of values within the middle 50% of data points, calculated as Q3 minus Q1. With the n values sorted in ascending order, Q1 is located at position (n+1)/4 and Q3 at position 3(n+1)/4.
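The position rule on this card can be sketched as below. The data are invented, and the linear interpolation used for fractional positions is an assumption, since the card does not say how to handle them. The helper name `value_at_position` is hypothetical.

```python
def value_at_position(sorted_values, position):
    """Return the value at a 1-based position in sorted data.

    Fractional positions are interpolated linearly between the two
    neighbouring values (an assumption; other conventions exist).
    Positions below 1 or above len(sorted_values) are not handled.
    """
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part, 0 if exact
    if fraction == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Hypothetical values, sorted into ascending order first
data = sorted([7, 1, 3, 9, 5, 11, 13])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)       # position 2
q3 = value_at_position(data, 3 * (n + 1) / 4)   # position 6
iqr = q3 - q1

print(q1, q3, iqr)
```

Statistical packages use several slightly different quartile conventions, so results for small samples may not match this rule exactly.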
What is comparing the range and IQR useful for?
-Useful for understanding the dispersion of a variable and identifying outliers
-Box plots are the easiest way to do this
Define Variance
The mean of the squared differences between each value and the mean.
Define Standard Deviation
The square root of the variance, representing how far, on average, we can expect an individual observation to deviate above or below the mean
How to calculate Standard Deviation
- Find the mean
- Find the difference between each value and the mean
- Square each difference
- Find the sum of the squared differences
- Find the variance: the mean of the squared differences
- Find the SD: the square root of the variance
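The steps above can be sketched in plain Python as follows. The values are invented for illustration. Following the card, the variance here divides by the number of observations n (the population form); the sample form divides by n - 1 instead.

```python
import math

# Hypothetical values, invented for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: find the mean
mean = sum(values) / len(values)

# Step 2: find the difference between each value and the mean
differences = [v - mean for v in values]

# Step 3: square each difference
squared_differences = [d ** 2 for d in differences]

# Step 4: find the sum of the squared differences
sum_of_squares = sum(squared_differences)

# Step 5: variance = mean of the squared differences (divides by n)
variance = sum_of_squares / len(values)

# Step 6: SD = square root of the variance
sd = math.sqrt(variance)

print(mean, variance, sd)
```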
What is Kurtosis?
The ‘flatness’ of the distribution of values
What does a Large SD/ flat distribution mean?
Data are fairly dispersed around the mean, with more values in the tails of the distribution
What does a Small SD/ narrow distribution mean?
Values are clustered around the mean, giving the distribution a ‘peak’.
What do we use the coefficient of variation (CV) to do?
To measure the relative variability. This is typically expressed as a percentage.
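The CV is the standard deviation divided by the mean, multiplied by 100 to give a percentage. A minimal sketch with invented data follows; the function name `coefficient_of_variation` is hypothetical, and using the population SD (`statistics.pstdev`) rather than the sample SD is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV as a percentage: (SD / mean) * 100.

    Uses the population SD (an assumption). Only meaningful when the
    mean is non-zero.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sd / mean * 100


# Hypothetical variables measured in different units
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: {cv_heights:.1f}%  weights: {cv_weights:.1f}%")
```

Both variables have the same SD here, but the weights have the larger CV because their mean is smaller, which is why the CV is useful for comparing variables on different scales.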
What is the Coefficient of Variation useful for comparing?
-Different variables
-The same variable
-International comparisons
What do measures of association consider?
The relationship between two variables