Section 2.1: Examining Numerical Data Flashcards

(42 cards)

1
Q

What is a scatterplot used for?

A

Visualizing the relationship between two numerical variables

2
Q

What does a dot plot visualize?

A

The distribution of one numerical variable, with each observation drawn as a point along a single axis. Darker areas, where points overlap, indicate more observations.

3
Q

What does a stacked dot plot represent?

A

Higher stacks of dots indicate values with more observations, which makes it easier to judge the center and shape of the distribution

4
Q

What is the purpose of a histogram?

A

It provides a view of data density: taller bars show where the data are relatively more common

5
Q

What does the term ‘center’ refer to in statistics?

A

The typical or middle value of a distribution, usually measured by the mean (average) or the median

6
Q

What is the formula for the sample mean?

A

x̄ = (x1 + x2 + x3 + … + xn) / n
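As a rough illustration of the formula on this card, here is a minimal Python sketch. The function name `sample_mean` and the data values are made up for the example.

```python
def sample_mean(values):
    """Add up the observations and divide by how many there are."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    return sum(values) / n


# Hypothetical sample of five observations.
sample = [2, 4, 4, 5, 10]
x_bar = sample_mean(sample)
print(x_bar)  # 5.0
```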

7
Q

How is the population mean computed?

A

It is computed the same way as the sample mean, but over every member of the population. It is usually impossible to calculate because we rarely have data on the entire population.

8
Q

What does x̄ represent?

A

Sample mean

9
Q

What does μ represent?

A

Population mean

10
Q

Define unimodal

A

A distribution with a single peak

11
Q

What is the difference between bimodal and multimodal?

A

Bimodal has two peaks, while multimodal has several prominent peaks

12
Q

What characterizes a uniform distribution?

A

No apparent peaks: values across the range are about equally common

13
Q

What does ‘right skewed’ refer to?

A

A distribution with a tail extending to the right

14
Q

What does ‘left skewed’ mean?

A

A distribution with a tail extending to the left

15
Q

What is the formula for variance?

A

s^2 = [(x1 - x̄)^2 + (x2 - x̄)^2 + … + (xn - x̄)^2] / (n - 1)

16
Q

How is standard deviation calculated?

A

s = √(s^2)
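The sample variance and standard deviation formulas can be checked by hand with a short Python sketch. The helper name `sample_variance` and the data are illustrative assumptions. The standard library's `statistics.variance` uses the same n - 1 divisor and is printed for comparison.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical sample; its mean is 5.
sample = [2, 4, 4, 5, 10]
s2 = sample_variance(sample)  # squared deviations 9 + 1 + 1 + 0 + 25 = 36, over 4
s = math.sqrt(s2)             # standard deviation is the square root of the variance
print(s2, s)                  # 9.0 3.0
print(statistics.variance(sample))  # library value for comparison
```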

17
Q

What is the median in a dataset?

A

The value that splits the data in half when ordered in ascending order
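A minimal Python sketch of this definition follows. The function name `median` and the inputs are made up for the example. It assumes the common convention of averaging the two middle values when the number of observations is even.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```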

18
Q

What does Q1 represent?

A

25th percentile, also called the first quartile

19
Q

What is the 50th percentile also known as?

A

The median

20
Q

What does Q3 represent?

A

75th percentile, also called the third quartile

21
Q

Define interquartile range (IQR)

A

The range where the middle 50% of the data lies, calculated as IQR = Q3 - Q1

22
Q

What is the maximum upper whisker reach?

A

Q3 + 1.5 × IQR

23
Q

What is the maximum lower whisker reach?

A

Q1 - 1.5 × IQR

24
Q

Define an outlier

A

An observation beyond the maximum reach of the whiskers
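The quartile, IQR, whisker, and outlier cards can be tied together in one small Python sketch. The function name `whisker_reach` and the data are illustrative assumptions. Quartile conventions differ slightly between textbooks and software, so the `method="inclusive"` rule used here may not match a given course exactly.

```python
import statistics


def whisker_reach(values):
    """Return the (lower, upper) maximum whisker reach: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
lower, upper = whisker_reach(data)

# An outlier is any observation beyond the maximum reach of the whiskers.
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```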

25
What are robust statistics?
Median and IQR, because extreme observations have little effect on them
26
What are not robust statistics?
Mean, variance, and standard deviation
27
When describing distributions, what three aspects do we focus on?
Center, shape, and spread of distributions
28
Which plot is used to show two numerical variables together?
Scatterplot
29
Which plots are used to show the distribution of one numerical variable?
Dot plot, stacked dot plot, histogram, box plot
30
Why are histograms important?
They are among the most important plots for analysis because they show the shape, center, and spread of a distribution at a glance
31
How can the chosen bin width affect a histogram?
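It can alter the story the histogram is telling: bins that are too wide can hide features such as multiple modes, while bins that are too narrow can make random variation look like structure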
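To see how bin width can change the picture, here is a small Python sketch that counts the same data with two different widths. The function name `bin_counts` and the data are hypothetical, and the sketch assumes whole-number data and widths.

```python
from collections import Counter


def bin_counts(values, width):
    """Count observations in bins [left, left + width), including empty bins in between."""
    counts = Counter((v // width) * width for v in values)
    start, stop = min(counts), max(counts)
    return {left: counts.get(left, 0) for left in range(start, stop + width, width)}


# Hypothetical data with two clusters.
data = [1, 2, 2, 3, 11, 12, 12, 13]

narrow = bin_counts(data, 5)   # two separate groups with an empty bin between them
wide = bin_counts(data, 20)    # everything falls into a single bin
print(narrow)  # {0: 4, 5: 0, 10: 4}
print(wide)    # {0: 8}
```

With the narrower bins the data look bimodal. With the wider bin that feature disappears.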
32
What does the median represent in relation to data values?
50% of the values are below it and 50% are above
33
What are ways to measure center?
* Mean (average)
* Median
* A histogram, which shows the center visually rather than as a single number
34
What are ways to describe shape?
* Modality
* Skewness
35
What are ways to measure spread?
* Variance and standard deviation
* IQR
36
For skewed distributions, which measures are more helpful?
Median and IQR to describe center and spread
37
For symmetric distributions, which measures are more helpful?
Mean and SD to describe center and spread
38
Which variable is expected to be uniformly distributed: (a) heights of KSU students, (b) salaries of a random sample of people from North Carolina, (c) house prices in America, (d) birthdays of classmates (day of the month)?
(d) Birthdays of classmates (day of the month)
39
Why is it important to look for outliers?
* Identify extreme skew in the distribution
* Identify data collection and entry errors
* Provide insight into interesting features of the data
40
How would replacing the largest value with $10 million affect the mean, median, standard deviation, and IQR of household income?
* Mean: increases
* Median: changes little, if at all
* Standard deviation (and variance): increases
* IQR: stays the same
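The effect can be checked on a small made-up data set. In the Python sketch below, the income values (in thousands of dollars) and the helper name `summarize` are assumptions for illustration. Quartiles use the `method="inclusive"` rule from the standard library, which may differ slightly from a textbook rule.

```python
import statistics


def summarize(values):
    """Return mean, median, standard deviation, and IQR for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


# Hypothetical household incomes, in thousands of dollars.
incomes = [20, 30, 40, 50, 60, 70, 80, 90, 100]
# Replace the largest value with $10 million (10,000 thousand).
changed = incomes[:-1] + [10_000]

before = summarize(incomes)
after = summarize(changed)
print(before)
print(after)
```

In this example the mean and standard deviation jump, while the median and IQR are unchanged.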
41
If the smallest value in household income is replaced with $10 million, how does it affect the mean and median?
* Mean: increases
* Median: stays the same or changes only slightly
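A short Python sketch of this case, using made-up incomes in thousands of dollars: moving the smallest value to the top of the ordering shifts the median by at most one position, while the mean changes a great deal.

```python
import statistics

# Hypothetical household incomes, in thousands of dollars.
incomes = [20, 30, 40, 50, 60, 70, 80, 90, 100]
# Replace the smallest value with $10 million (10,000 thousand).
changed = [10_000] + incomes[1:]

mean_before, mean_after = statistics.mean(incomes), statistics.mean(changed)
median_before, median_after = statistics.median(incomes), statistics.median(changed)

print(mean_before, mean_after)      # the mean rises sharply
print(median_before, median_after)  # 60 70: the median moves up by one position
```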
42
For estimating typical household income for a student, is the mean or median more relevant?
The median, because the distribution is skewed