2-Visualizing Numerical Data Flashcards

1
Q

What factors should be considered when evaluating the relationship of two variables, for example as depicted on a scatterplot?

A

SSOD

o Shape: linear or curved

o Strength: strong (tight) or weak (scattered)

o Outliers: individuals and groups

o Direction: positive (up/right) or negative (down/right)

2
Q

What is the mean?

A

the arithmetic average: the sum of the values divided by the number of values

3
Q

What do histograms provide?

A

o a view of the data density

o values in a range are binned together

o values on a bin boundary go in the lower bin (a convention only; some software puts them in the upper bin instead)
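The binning rule above can be sketched in a few lines. This is only an illustration: `bin_counts`, the sample values, and the bin width of 5 are made up for the example and do not come from the course material.

```python
import math

def bin_counts(values, start, width, n_bins):
    """Count values per bin; bin k covers (start + k*width, start + (k+1)*width]."""
    counts = [0] * n_bins
    for v in values:
        if v < start:
            continue  # below the first bin edge: not counted
        # ceil(...) - 1 sends a value sitting exactly on a boundary to the lower bin
        k = math.ceil((v - start) / width) - 1
        k = max(k, 0)  # a value equal to `start` has no lower bin, so keep it in bin 0
        if k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical data; 5, 10, and 15 all sit on bin boundaries
data = [1, 2, 5, 5, 6, 9, 10, 10, 12, 15]
counts = bin_counts(data, start=0, width=5, n_bins=3)

# Text histogram: taller rows mean higher data density in that range
for k, c in enumerate(counts):
    print(f"({k * 5:>2}, {(k + 1) * 5:>2}]  {'*' * c}")
```

Both 5s land in the first bin and both 10s in the second, which is the "lower bin" convention from the card.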

4
Q

How would you describe a histogram that is left skewed, symmetric, or right skewed?

A

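o left skewed - long tail to the left (the tail pulls the mean left; the bulk of the data sits to the right)

o symmetric - the left and right sides are roughly mirror images, with tails of similar length

o right skewed - long tail to the right (the tail pulls the mean right; the bulk of the data sits to the left)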

5
Q

What is a mode on a histogram? What are 4 typical modal distributions?

A

o a prominent peak in the distribution

o unimodal (one peak), bimodal (two peaks), multimodal (more than two peaks), and uniform (no prominent peak)

6
Q

What are three frequent measures of center? How do you calculate each?

A

o mean - arithmetic average; divide the sum by the number of values

o median - midpoint of the distribution (50th percentile); order the values and take the one in the middle; if the number of values is even, take the mean of the two center-most values

o mode - most frequent observation (not the same as the mode in a histogram)
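As a quick check of the three calculations, here is a small sketch on made-up values. The mean and median are computed by hand to mirror the card, with the standard `statistics` module used for the mode.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical data

# mean: divide the sum by the number of values
mean = sum(values) / len(values)

# median: order the values and take the middle one,
# or the mean of the two center-most values when the count is even
ordered = sorted(values)
n = len(ordered)
if n % 2 == 1:
    median = ordered[n // 2]
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# mode: the most frequent observation
mode = statistics.mode(values)

print(mean, median, mode)
```

Here the count is even, so the median is the mean of the two middle values, 3 and 5.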

7
Q

How does the relative position of the mean and the median differ in left-skewed, symmetric, and right-skewed data?

A

the mean is typically pulled toward the tail of the skew, even though more of the values lie on the other side

left-skewed: mean < median

symmetric: mean ≈ median

right-skewed: mean > median
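A small sketch can make the pull of the tail concrete. The three lists below are invented for illustration, and the left-skewed list is simply the mirror image of the right-skewed one.

```python
import statistics

# Hypothetical data sets
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value forms a long right tail
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("left-skewed", left_skewed),
                   ("symmetric", symmetric),
                   ("right-skewed", right_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:>12}: mean = {mean:.2f}, median = {median:.2f}")
```

In the right-skewed list the single value 20 drags the mean above the median, even though most values are small.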

8
Q

What does standard deviation describe?

A

how far away the typical observation is from the mean

9
Q

What does deviation describe?

A

the distance of an observation from the mean (observation minus mean, so it is negative when the observation is below the mean)
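Deviations and the standard deviation fit together as in the sketch below. It assumes the sample form of the variance (dividing by n - 1), which is an assumption rather than something the cards state, and the data are made up.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
mean = sum(values) / len(values)

# deviation: each observation minus the mean (negative below the mean)
deviations = [x - mean for x in values]

# sample variance: average squared deviation, using n - 1 in the denominator (assumed)
variance = sum(d ** 2 for d in deviations) / (len(values) - 1)

# standard deviation: square root of the variance, back in the data's own units
sd = math.sqrt(variance)

print(deviations)
print(round(sd, 3))
```

The deviations always sum to zero, which is why they are squared before averaging.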

10
Q

What percent of data usually is within 1 sd of the mean? 2 sd?

A

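about 70% within 1 sd

about 95% within 2 sd

(rules of thumb; the exact percentages depend on the shape of the distribution)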
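One way to see where these percentages come from is to simulate bell-shaped data and count. The sketch below assumes normally distributed data drawn with a fixed seed; real data with strong skew or several modes can give noticeably different fractions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(0, 1) for _ in range(10_000)]  # simulated bell-shaped data (assumption)

mean = statistics.mean(data)
sd = statistics.stdev(data)

def fraction_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

within_1 = fraction_within(1)
within_2 = fraction_within(2)
print(f"within 1 sd: {within_1:.1%}, within 2 sd: {within_2:.1%}")
```

For data like this the results should land close to the values on the card.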

11
Q

How are variance and standard deviation often used?

A

to estimate the uncertainty associated with a sample statistic

12
Q

How would you describe the prominent features of a box plot? (see boxplot)

A

see boxplot

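lower whisker = reaches down to the smallest observation no lower than Q1 - 1.5 x IQR
Q1 = 25th percentile (bottom of the box)
Line = median (50th percentile)
Q3 = 75th percentile (top of the box)
upper whisker = reaches up to the largest observation no higher than Q3 + 1.5 x IQR
IQR = Q3 - Q1 (the length of the box)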
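The pieces of a box plot can be computed directly, using the common rule that whiskers reach at most 1.5 x IQR beyond the box, where IQR = Q3 - Q1. This is a sketch on made-up data. Quartile conventions differ between textbooks and software, and the "inclusive" method of `statistics.quantiles` used here is just one choice.

```python
import statistics

values = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]  # hypothetical data

# Quartiles (one of several conventions; results can differ slightly elsewhere)
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Maximum reach of the whiskers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations still inside the limits
inside = [x for x in values if lower_limit <= x <= upper_limit]
lower_whisker = min(inside)
upper_whisker = max(inside)

# Anything beyond the whiskers is drawn as an individual point (a potential outlier)
outliers = [x for x in values if x < lower_limit or x > upper_limit]

print(q1, median, q3, iqr)
print(lower_whisker, upper_whisker, outliers)
```

Note that the whiskers end at actual observations (1 and 9 here), not at the computed limits themselves.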
13
Q

What is an outlier on a boxplot?

A

an extreme value

an observation that appears extreme relative to the rest of the data; on a boxplot, a point plotted beyond the whiskers

examining outliers helps to identify strong skew, data collection errors, or interesting insights

14
Q

What is a robust statistic?

A

a statistic on which extreme observations have little effect; the median and the IQR are robust statistics, whereas the mean and standard deviation are not

(What if we used the median to identify extreme values, removed the extreme values, and then found the mean of the remaining values?)
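Robustness is easy to see by changing a single value and comparing summaries. The sketch below uses made-up data, and `summary` is a hypothetical helper written for this example.

```python
import statistics

def summary(values):
    """Return the mean, sd, median, and IQR of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

base = [2, 3, 4, 5, 6, 7, 8]       # hypothetical data
with_extreme = base[:-1] + [80]    # same data, largest value replaced by an extreme one

before = summary(base)
after = summary(with_extreme)

for key in before:
    print(f"{key:>6}: {before[key]:.2f} -> {after[key]:.2f}")
```

In this example the median and IQR do not move at all, while the mean and standard deviation change substantially.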

15
Q

What is a transformation?

A

a rescaling of the data using a function, e.g., taking the log of strongly right-skewed data so that it is easier to view and model
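A log transformation is one common example. The sketch below applies `math.log10` to made-up, strongly right-skewed values. The choice of base 10 is an assumption for readability, and any log base would reshape the data in the same way.

```python
import math
import statistics

values = [1, 10, 100, 1000, 10000, 1000000]  # hypothetical, strongly right-skewed

# Apply the same function to every observation
logged = [math.log10(x) for x in values]

print("raw:    mean =", statistics.mean(values), " median =", statistics.median(values))
print("logged: mean =", round(statistics.mean(logged), 2),
      " median =", statistics.median(logged))
```

On the raw scale the mean is far above the median; after the transformation the two are close, a sign that the skew has been reduced.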

16
Q

Identify the side-by-side bar plot, the stacked bar plot, the standardized bar plot, and the mosaic plot (see barplot)

A

see barplot

17
Q

What are two competing hypothesis claims?

A

H0 - independence model, i.e., treatment and outcome are independent, outcomes due to chance

HA - alternative model, i.e., treatment and outcome are not independent, outcomes not due to chance
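One way to weigh the two claims is a randomization (shuffling) simulation: if H0 is true, group labels are unrelated to outcomes, so reshuffling them shows what chance alone can produce. The sketch below is illustrative only. The counts are invented (20 treated with 14 improved, 20 control with 8 improved), and the seed and number of simulations are arbitrary choices.

```python
import random

# Hypothetical outcomes: 1 = improved, 0 = did not improve
treatment = [1] * 14 + [0] * 6
control = [1] * 8 + [0] * 12

def diff_in_proportions(group_a, group_b):
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

observed = diff_in_proportions(treatment, control)

rng = random.Random(42)  # fixed seed so the run is repeatable
pooled = treatment + control
n_sims = 5000
at_least_as_extreme = 0

for _ in range(n_sims):
    # Under H0 the labels carry no information, so reassign them at random
    rng.shuffle(pooled)
    sim_treatment = pooled[:len(treatment)]
    sim_control = pooled[len(treatment):]
    # Small tolerance guards against floating-point rounding in the comparison
    if diff_in_proportions(sim_treatment, sim_control) >= observed - 1e-12:
        at_least_as_extreme += 1

# Share of shuffles that produce a difference at least as large as the observed one
p_value = at_least_as_extreme / n_sims
print(f"observed difference = {observed:.2f}, simulated p-value = {p_value:.3f}")
```

A small share suggests the observed difference would be unusual under the independence model, which counts as evidence for HA. A large share means chance alone explains the data comfortably.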