section 2.2: considering categorical data Flashcards

(35 cards)

1
Q

what is a contingency table?

A

a table that summarizes data for two categorical variables
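As a rough illustration, a contingency table can be built by counting how often each pair of categories occurs. The sketch below uses made-up data, and the names `observations` and `table` are hypothetical; it is one possible way to tabulate two categorical variables, not a prescribed method.

```python
from collections import Counter

# Hypothetical toy data: each tuple is one observation (group, outcome).
observations = [
    ("treatment", "improved"),
    ("treatment", "improved"),
    ("treatment", "not improved"),
    ("control", "improved"),
    ("control", "not improved"),
    ("control", "not improved"),
]

# Count how often each (row category, column category) pair occurs.
counts = Counter(observations)

rows = sorted({group for group, _ in observations})
cols = sorted({outcome for _, outcome in observations})

# Nested dict: table[row][col] is the number of observations in that cell.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, table[r])
```

Each row of the printed table is one category of the first variable, and each entry in the row is a count for one category of the second variable.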

2
Q

what is a bar plot?

A

a common way to display the distribution of a single categorical variable, with one bar per category

3
Q

what is a relative-frequency bar plot?

A

a bar plot that shows proportions (relative frequencies) instead of counts
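To make the difference between counts and proportions concrete, here is a minimal sketch that computes both for a single categorical variable and prints a crude text bar for each category. The data and the name `colors` are invented for illustration.

```python
from collections import Counter

# Hypothetical toy data for a single categorical variable.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]

counts = Counter(colors)
total = sum(counts.values())

# Relative frequency = category count divided by the total count.
proportions = {category: count / total for category, count in counts.items()}

for category in sorted(proportions):
    bar = "#" * counts[category]  # bar length follows the raw count
    print(f"{category:<6} {bar:<5} {proportions[category]:.3f}")
```

The bars have the same relative heights either way; only the scale of the vertical axis changes from counts to proportions.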

4
Q

how are bar plots different from histograms?

A

Bar plots are used for displaying distributions of categorical variables, while
histograms are used for numerical variables. The x-axis in a histogram is a
number line, hence the order of the bars cannot be changed, while in a bar plot
the categories can be listed in any order

5
Q

what is variance?

A

roughly the average squared deviation from the mean; it equals the standard deviation squared

6
Q

what is the equation for variance?

A

s^2 = Σ(x_i - x̄)^2 / (n - 1)
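The sample variance formula (sum of squared deviations from the mean, divided by n - 1) can be worked through step by step. This is a small sketch on made-up numbers; the variable names are hypothetical, and the result is cross-checked against Python's `statistics.variance`, which also uses the n - 1 denominator.

```python
import statistics

data = [2, 4, 4, 5, 7, 8]  # hypothetical toy data

n = len(data)
mean = sum(data) / n

# Squared distance of every observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1, not n.
sample_variance = sum(squared_deviations) / (n - 1)

# The standard deviation is the square root of the variance.
sample_sd = sample_variance ** 0.5

print(f"mean = {mean}, variance = {sample_variance}, sd = {sample_sd:.3f}")
print("matches statistics.variance:",
      abs(sample_variance - statistics.variance(data)) < 1e-12)
```

Because the deviations are squared, the values 2 and 8 (each 3 away from the mean) contribute far more to the total than the values 4 and 5.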

7
Q

which points contribute the most to the variance?

A

points that are far away from the mean

8
Q

Why do we use the squared deviation in the calculation of variance?

A

To get rid of negatives, so that observations equally distant from the mean are weighted equally.
To weight larger deviations more heavily.

9
Q

what is standard deviation?

A

the square root of the variance; it has the same units as the data

10
Q

what is the median?

A

the value that splits the data in half when ordered in ascending order

11
Q

what is the 50th percentile?

A

the median

12
Q

what is the 25th percentile?

A

the first quartile, Q1

13
Q

what is the 75th percentile?

A

the third quartile, Q3

14
Q

what is interquartile range (IQR)?

A

the width of the range that contains the middle 50% of the data (from Q1 to Q3)

15
Q

what is the equation for IQR?

A

IQR = Q3 - Q1
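A short sketch of the IQR calculation follows. Textbooks and software use slightly different conventions for locating quartiles, so the choice of `method="inclusive"` in `statistics.quantiles` is an assumption made here, not something the cards specify; the data are invented.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # hypothetical toy data

# Assumption: the "inclusive" quartile convention. Other conventions can
# give slightly different Q1 and Q3 for the same data.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
```

Whatever convention is used, the IQR is always the distance between the third and first quartiles.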

16
Q

what does the box in a box plot represent?

A

the box spans Q1 to Q3, so it represents the middle 50% of the data; the thick line inside the box is the median

17
Q

what is the max upper whisker reach of a box plot?

A

Q3 + 1.5 × IQR

18
Q

what is the max lower whisker reach of a box plot?

A

Q1 - 1.5 × IQR

19
Q

what is an outlier?

A

an observation beyond the maximum reach of the whiskers, that is, below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR
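The whisker limits and the outlier rule can be sketched together. The example below uses invented data and, as an assumption, the "inclusive" quartile convention of `statistics.quantiles`; the names `lower_fence` and `upper_fence` are hypothetical labels for the maximum whisker reach.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 40]  # hypothetical toy data

# Assumption: "inclusive" quartiles; other conventions may differ slightly.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Maximum reach of the whiskers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations beyond the maximum reach are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Each whisker stops at the most extreme observation still within reach.
inside = [x for x in data if lower_fence <= x <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)

print(f"reach: [{lower_fence}, {upper_fence}]")
print(f"whiskers drawn at: {lower_whisker} and {upper_whisker}")
print(f"outliers: {outliers}")
```

Note that the whiskers are drawn at actual data values, so they usually stop short of the maximum reach.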

20
Q

why is it important to look for outliers?

A

Identify extreme skew in the distribution.

Identify data collection and entry errors.

Provide insight into interesting features of the data.

21
Q

what are the robust statistics?

A

median and IQR

22
Q

what are the non-robust statistics?

A

mean, variance (standard deviation)
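One way to see what "robust" means is to replace a single observation with an extreme value and compare the summaries before and after. This is only a sketch on invented data; `summarize` is a hypothetical helper, and the "inclusive" quartile convention is an assumption.

```python
import statistics

def summarize(values):
    """Return center and spread summaries for a list of numbers."""
    # Assumption: "inclusive" quartiles; other conventions may differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

original = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # hypothetical toy data
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # largest value replaced

before = summarize(original)
after = summarize(with_extreme)

for name in ("mean", "sd", "median", "iqr"):
    print(f"{name:<6} before = {before[name]:.2f}   after = {after[name]:.2f}")
```

In this example the median and IQR do not move at all, while the mean and standard deviation change drastically.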

23
Q

for skewed distributions it is often more helpful to use ___________ to describe the center and spread

A

median and IQR

24
Q

for symmetric distributions it is often more helpful to use __________ to describe the center and spread

A

the mean and SD

25
Q

if a distribution is symmetric, the center is defined as _______

A

the mean (in a symmetric distribution, mean ≈ median)

26
Q

if a distribution is skewed or has extreme outliers, the center is defined as _______

A

the median

27
Q

if a distribution is right-skewed, the mean is

A

greater than the median

28
Q

if a distribution is left-skewed, the mean is

A

less than the median
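The relationship between skew direction and the order of the mean and median can be checked on small examples. Both data sets below are invented for illustration: one has a long tail to the right, the other a long tail to the left.

```python
import statistics

# Hypothetical toy data sets.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail toward large values
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]  # long tail toward small values

results = {}
for name, values in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    results[name] = (mean, median)
    relation = "greater than" if mean > median else "less than"
    print(f"{name}: mean = {mean}, median = {median} (mean is {relation} median)")
```

In both cases the mean is pulled toward the tail, while the median stays near the bulk of the data.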
29
Q

what is a side-by-side bar plot?

A

a bar plot that displays the same information as a stacked bar plot by placing the bars next to, instead of on top of, each other

30
Q

what is a standardized stacked bar plot?

A

a stacked bar plot in which every bar is scaled to the same height, so each segment shows a proportion of its group rather than a count

31
Q

what is a mosaic plot?

A

a visualization technique suitable for contingency tables that resembles a standardized stacked bar plot, with the added benefit that we still see the relative group sizes of the primary variable
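The numbers behind a standardized stacked bar plot and a mosaic plot can be computed directly from a contingency table. The sketch below is an illustration on made-up counts, with hypothetical names: `segment_proportions` gives the within-group proportions that both plots display, and `group_shares` gives the relative group sizes that a mosaic plot additionally encodes as bar widths.

```python
# Hypothetical contingency table: counts[group][outcome].
counts = {
    "treatment": {"improved": 30, "not improved": 10},
    "control": {"improved": 15, "not improved": 45},
}

row_totals = {group: sum(cells.values()) for group, cells in counts.items()}
grand_total = sum(row_totals.values())

# Within-group proportions: the segment heights in a standardized
# stacked bar plot (every bar sums to 1).
segment_proportions = {
    group: {outcome: n / row_totals[group] for outcome, n in cells.items()}
    for group, cells in counts.items()
}

# Share of all observations in each group: the bar widths in a mosaic plot.
group_shares = {group: total / grand_total for group, total in row_totals.items()}

for group in counts:
    print(group, segment_proportions[group], f"share = {group_shares[group]:.2f}")
```

A standardized stacked bar plot discards the group shares, which is exactly the information a mosaic plot keeps.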
32
Q

what are the ways to measure center?

A

mean (average) and median; a histogram can show the center visually but is not itself a measure

33
Q

what are the ways to measure shape?

A

modality, skewness

34
Q

what are the ways to measure spread?

A

variance (standard deviation), IQR

35
Q

If you would like to estimate the typical household income for a student, would you be more interested in the mean or median income?

A

the median, because income distributions are typically right-skewed and a few very high incomes pull the mean upward