2-Visualizing Numerical Data Flashcards
What factors should be considered when evaluating the relationship of two variables, for example as depicted on a scatterplot?
SSOD
o Shape: linear or curved
o Strength: strong (tight) or weak (scattered)
o Outliers: individuals and groups
o Direction: positive (up/right) or negative (down/right)
What is the mean?
average
What do histograms provide?
o a view of the data density
o values in a range are binned together
o values on the boundary go in the lower bin (which is dumb)
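The boundary rule above can be sketched in a few lines. This is a minimal illustration, not a standard library routine: the helper name `bin_counts` and its `start` and `width` parameters are made up for this card, and bin k is assumed to cover the range (start + k*width, start + (k+1)*width].

```python
import math
from collections import Counter

def bin_counts(values, start, width):
    """Count values per bin; a value exactly on a boundary goes to the lower bin."""
    counts = Counter()
    for v in values:
        # ceil(...) - 1 sends a boundary value down one bin;
        # max(..., 0) keeps a value equal to `start` in the first bin.
        k = max(math.ceil((v - start) / width) - 1, 0)
        counts[k] += 1
    return dict(sorted(counts.items()))

values = [1, 4, 5, 5, 6, 10, 12]
counts = bin_counts(values, start=0, width=5)
print(counts)  # bin 0 holds 1, 4, 5, 5; bin 1 holds 6, 10; bin 2 holds 12
```

Conventions differ between tools: NumPy's `histogram`, for example, puts a boundary value in the upper bin (except at the last edge), so it is worth checking which rule a plotting library uses.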
How would you describe a histogram that is left skewed, symmetric, or right skewed?
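o left skewed - long tail to the left (the tail, and with it the mean, is pulled left; the bulk of the data sits on the right)
o symmetric - the tails trail off roughly equally in both directions
o right skewed - long tail to the right (the tail, and with it the mean, is pulled right; the bulk of the data sits on the left)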
What is a mode on a histogram? What are 4 typical modal distributions?
o a prominent peak in the distribution
o unimodal, bimodal, uniform, and multimodal
What are three frequent measures of center? How do you calculate each?
o mean - arithmetic average; divide the sum by the number of values
o median - midpoint of the distribution (50th percentile); order the values and find the one in the middle; if there is no middle, find the mean of the center-most 2 values
o mode - most frequent observation (not the same as the mode in a histogram)
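The three calculations above, sketched with Python's standard `statistics` module on a small made-up data set (the numbers are purely illustrative):

```python
import statistics

values = [2, 3, 3, 5, 7, 10]

# mean: divide the sum by the number of values
mean = sum(values) / len(values)

# median: with an even count there is no single middle value,
# so it is the mean of the two center-most values (3 and 5)
median = statistics.median(values)

# mode: the most frequent observation
mode = statistics.mode(values)

print(mean, median, mode)  # 5.0 4.0 3
```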
How does the relative position of the mean and the median differ in left-skewed, symmetric, and right-skewed data?
the mean is usually pulled toward the tail (the side of the skew), even though there are more values on the other side
left-skewed: mean < median
symmetric: mean ≈ median
right-skewed: mean > median
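A quick check of this pattern on two made-up data sets, each with a single extreme value forming the long tail. The data are illustrative assumptions, not from the course material.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]          # long tail to the right
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]   # mirror image, tail to the left

right_mean = statistics.mean(right_skewed)      # 4.75, dragged up by the 20
right_median = statistics.median(right_skewed)  # 3.0
left_mean = statistics.mean(left_skewed)        # -4.75, dragged down by the -20
left_median = statistics.median(left_skewed)    # -3.0

print(right_mean > right_median)  # True
print(left_mean < left_median)    # True
```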
What does standard deviation describe?
how far away the typical observation is from the mean
What does deviation describe?
the distance of an observation from the mean
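Deviation and standard deviation can be worked through by hand as below. The sketch assumes the sample version of the standard deviation (dividing by n - 1), which the cards do not specify; the data are made up.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(values)  # 5

# deviation: signed distance of each observation from the mean
deviations = [x - mean for x in values]

# sample variance: sum of squared deviations divided by n - 1 (assumption)
sample_variance = sum(d ** 2 for d in deviations) / (len(values) - 1)

# standard deviation: square root of the variance
sample_sd = sample_variance ** 0.5

print(deviations)
print(round(sample_sd, 3))
```

The deviations always sum to zero, which is why they are squared before averaging.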
What percent of data usually is within 1 sd of the mean? 2 sd?
about 70% (roughly 68% for a normal distribution)
about 95%
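These rules of thumb can be checked by simulation. The sketch below assumes roughly bell-shaped data, drawn here from a normal distribution with an arbitrary mean of 50 and sd of 10; strongly skewed data can give different shares.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

within_1 = share_within(1)
within_2 = share_within(2)
print(within_1, within_2)  # close to 0.68 and 0.95 for normal data
```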
How are variance and standard deviation often used?
to estimate the uncertainty associated with a sample statistic
How would you describe the prominent features of a box plot? (see boxplot)
see boxplot
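o box: spans Q1 (25th percentile) to Q3 (75th percentile); its length is the IQR = Q3 - Q1
o line inside the box: the median
o lower whisker: reaches down to the smallest observation at or above Q1 - 1.5 x IQR
o upper whisker: reaches up to the largest observation at or below Q3 + 1.5 x IQR
o points beyond the whiskers are plotted individually as outliers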
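A sketch of the numbers behind a box plot, using the usual 1.5 x IQR rule for the whisker limits. Quartile conventions vary between textbooks and software; the `method="inclusive"` choice here is an assumption, and the data are made up.

```python
import statistics

data = sorted([1, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles (the exact convention is an assumption; others give slightly different values)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations still inside the fences
lower_whisker = min(x for x in data if x >= lower_fence)
upper_whisker = max(x for x in data if x <= upper_fence)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr)           # 4.0 6.0 8.0 4.0
print(lower_whisker, upper_whisker)  # 1 9
print(outliers)                      # [30]
```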
What is an outlier on a boxplot?
an extreme value: an observation that appears extreme relative to the rest of the data
examining outliers helps to identify strong skew, data collection errors, or interesting insights
What is a robust statistic?
a statistic on which extreme observations have little effect; the median and the IQR are robust statistics
(What if we used the median to identify extreme values, removed the extreme values, and then found the mean of the remaining values?)
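To see why the median and IQR count as robust, the sketch below replaces the largest value in a made-up data set with a wildly extreme one and compares the statistics before and after. The quartile method is again an assumed convention.

```python
import statistics

def summary(values):
    """Return (mean, sd, median, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (statistics.mean(values), statistics.stdev(values),
            statistics.median(values), q3 - q1)

clean = [1, 3, 4, 5, 6, 7, 8, 9, 10]
contaminated = [1, 3, 4, 5, 6, 7, 8, 9, 1000]  # largest value replaced

clean_mean, clean_sd, clean_median, clean_iqr = summary(clean)
bad_mean, bad_sd, bad_median, bad_iqr = summary(contaminated)

print(clean_median == bad_median, clean_iqr == bad_iqr)  # True True
print(bad_mean > 10 * clean_mean)                        # True: the mean shifts a lot
```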
What is a transformation?
re-scaling of the data using a function
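As one illustration, a base-10 log transformation is a common choice for strongly right-skewed data. The cards do not name a specific function, so the log and the made-up values below are assumptions.

```python
import math
import statistics

# Made-up values spanning several orders of magnitude (strong right skew)
raw = [1, 10, 100, 1000, 10000]

# Transformation: apply the same function to every observation
logged = [math.log10(x) for x in raw]

raw_mean, raw_median = statistics.mean(raw), statistics.median(raw)
log_mean, log_median = statistics.mean(logged), statistics.median(logged)

print(raw_mean, raw_median)  # mean far above the median before transforming
print(log_mean, log_median)  # roughly equal afterwards
```

After the transformation the values are evenly spaced, so the mean and median nearly coincide and the data are easier to view on a histogram or scatterplot.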