Lecture 2 Flashcards
(22 cards)
4 aspects needed to correctly describe a single quantitative variable
Centre, spread (there are different ways of measuring it), skew (positive, negative, or symmetric), weird things (outliers, multiple modes)
Centre
Where most of the data are located
Spread (2)
The range over which we see most of the data; how alike or different the observations are
Skew
The direction in which the spread extends
Weird things
Some points are really far away (outliers), or there are 2 centres
Dotplot advantages and disadvantages
You get to see all the data points and it is easy to interpret, but it gets messy quickly if there are a lot of data
Histogram advantages and disadvantages
Easy to get an idea of centre, spread, skew, multiple modes, and even outliers; made by most statistical packages (programs). But different bin widths can give different interpretations of the data
Mode
Number that occurs most often
We say that the median is _____________ to outliers
robust (an outlier changes it little or not at all)
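A quick way to see what robustness means is to add one extreme value and compare how far the mean and the median move. This is only a sketch using Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

# Hypothetical data, invented for illustration only
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]  # same data plus one extreme value

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

# The mean is pulled strongly toward the outlier; the median barely moves
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

With these numbers the mean jumps by more than 15 units while the median shifts by half a unit.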
Mean advantages and disadvantages
Good for estimating population means and has good inferential properties, but affected by outliers and skewed data
Median advantages and disadvantages
Easy to interpret and not influenced by outliers, but has bad inferential properties and takes longer to calculate
Mode advantages and disadvantages
Shows the highest concentration of data and lets us see bimodal data, but class definition matters (the precision of the intervals: not precise enough can give several modes, and too precise gives no mode)
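The effect of class definition on the mode can be sketched with made-up data. The helper `class_counts` and the class width of 1.0 are assumptions chosen for this example, not something from the lecture.

```python
from collections import Counter
import math

# Hypothetical measurements, invented for illustration only
data = [1.2, 1.7, 2.1, 2.4, 2.6, 2.9, 3.3, 4.8]

def class_counts(values, width):
    """Count observations per class [k*width, (k+1)*width), keyed by lower bound."""
    return Counter(math.floor(v / width) * width for v in values)

# Too precise: every raw value is its own class, so nothing stands out as a mode
raw_counts = Counter(data)

# Classes of width 1.0: a clear modal class appears
wide_counts = class_counts(data, 1.0)
modal_class, modal_count = wide_counts.most_common(1)[0]

print(max(raw_counts.values()))   # highest count among raw values
print(modal_class, modal_count)   # lower bound and count of the modal class
```

Here every raw value occurs once, so there is no useful mode, while classes of width 1.0 put four of the eight observations in the class starting at 2.0.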
Population and sample mean notation
Population mean: µ
Sample mean: $\bar{x}$ or $\bar{X}$ (x-bar or X-bar)
Determine skew w/ median and mean
Mean to the left of the median = left skew; mean to the right of the median = right skew; mean = median = symmetric
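The comparison on this card can be written as a small rule-of-thumb function. The name `skew_direction` and the three data sets are hypothetical; with real data the mean and median are rarely exactly equal, so treat the "symmetric" branch as an idealisation.

```python
import statistics

def skew_direction(values):
    """Rough skew label from comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean < median:
        return "left skew"
    if mean > median:
        return "right skew"
    return "symmetric"

# Hypothetical data sets, invented for illustration only
right_tail = [1, 2, 2, 3, 12]    # mean 4, median 2
left_tail = [1, 10, 11, 11, 12]  # mean 9, median 11
balanced = [1, 2, 3, 4, 5]       # mean 3, median 3

for sample in (right_tail, left_tail, balanced):
    print(sample, "->", skew_direction(sample))
```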
Range def and pros and cons
Difference between max and min values (it’s a measure of spread). Easy to compute but sensitive to outliers
Sample variance ($S^2$)
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population variance ($\sigma^2$)
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Why squared deviations (2)
The sum of the deviations $X_i - \bar{X}$ over all values of $X_i$ is always 0, so it cannot measure spread. Also, absolute values are not good for inference, so we use squared deviations
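The first reason is easy to check numerically. A minimal sketch with made-up data (values chosen so the mean is a whole number and the arithmetic is exact):

```python
# Hypothetical data, invented for illustration only
data = [2, 4, 4, 6, 9]
mean = sum(data) / len(data)  # 25 / 5 = 5.0

# Deviations from the mean: negative and positive ones cancel out
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the total reflects spread
sum_of_squares = sum(d ** 2 for d in deviations)

print(deviations)
print(sum_of_deviations, sum_of_squares)
```

The raw deviations add up to 0 regardless of how spread out the data are, while the squared deviations add up to 28 here.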
To remember about squared deviations
Variance is measured in squared units
Sample standard deviation ($S$)
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$ (a square root taken over the whole sample variance formula)
Units of S
Same units as data themselves
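The variance and standard deviation cards can be tied together in one short computation. This is a sketch with made-up data; the centimetre units in the comments are an assumption for illustration, and the hand-computed values are checked against Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical measurements, assumed to be in cm for illustration
data = [2, 4, 4, 6, 9]
n = len(data)
mean = sum(data) / n  # 5.0 cm

squared_devs = [(x - mean) ** 2 for x in data]

sample_var = sum(squared_devs) / (n - 1)  # divides by n - 1, units: cm^2
population_var = sum(squared_devs) / n    # divides by N if data were a whole population, units: cm^2
sample_sd = math.sqrt(sample_var)         # back in the data's own units: cm

# The standard library gives the same results
print(sample_var, statistics.variance(data))
print(population_var, statistics.pvariance(data))
print(sample_sd, statistics.stdev(data))
```

Taking the square root is what brings the measure of spread back from squared units to the units of the data.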
T or F: The standard deviation is the average absolute deviation from the mean
False, but it doesn’t hurt much to think of it as the average distance of the observations from the mean
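To see why the answer is "False", the two quantities can be computed side by side. A minimal sketch with made-up data; the variable names are only illustrative.

```python
import math

# Hypothetical data, invented for illustration only
data = [2, 4, 4, 6, 9]
n = len(data)
mean = sum(data) / n  # 5.0

# Average absolute deviation from the mean
mean_abs_dev = sum(abs(x - mean) for x in data) / n

# Sample standard deviation (square root of the sample variance)
sample_sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

print(mean_abs_dev)         # 2.0
print(round(sample_sd, 3))  # 2.646
```

The two numbers are of similar size but not equal, which is why the informal "average distance from the mean" reading is only an approximation.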