Module 2 - Section 2 Flashcards

(67 cards)

1
Q

What graphs are best for smaller data sets of numerical variables?

A

Stem plots and dot plots

2
Q

What graphs are best for large data sets of quantitative data?

A

histograms

3
Q

appearance of a dot plot?

A

y-axis: frequency
x-axis: name of variable and the values that the data will fall between
. .
. . . . . . .
. . . . . . . . .
values
-a dot is placed above the value of each data point
-dots are stacked above a value when its frequency is more than one
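A minimal Python sketch of the stacking idea, turned on its side so each value gets one row of dots. The data set and the helper name `dot_plot_rows` are made up for illustration.

```python
from collections import Counter

def dot_plot_rows(values):
    """Return one text row per distinct value: the value, then one dot per occurrence."""
    counts = Counter(values)
    return [f"{v}: " + "." * counts[v] for v in sorted(counts)]

# Hypothetical data: number of pets owned by 8 students
for row in dot_plot_rows([0, 1, 1, 2, 2, 2, 3, 5]):
    print(row)
```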

4
Q

stem

A

the leading digits of the number in the data
ex: 75 has leading digit or stem 7
100 could have leading digits 100 or 1 (depending on the data)

5
Q

leaf

A

the last digit of the number in the data

ex: 75 has leaf 5

6
Q

a key is required for …

A

a stemplot

7
Q

bins

A

equal-width intervals that each group together data values that are close to one another
ex: 70-79 is one bin when 7 is the stem and 0-9 are the leaves

8
Q

appearance of stemplot

A
stem | leaves
4       |0
5       |
6       |05588
7       |00000455
8       |5
9       |05

Price of Walking shoes
8|5 represents $85
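A rough Python sketch of how a stemplot like this could be built. It assumes whole-number data where the last digit is the leaf and everything before it is the stem. The prices are read off the plot above; the function name `stemplot` is hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Map each stem (tens digit and up) to a string of sorted leaves (ones digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(str(v % 10))
    # Include empty stems so that gaps in the data stay visible
    return {s: "".join(leaves[s]) for s in range(min(leaves), max(leaves) + 1)}

prices = [40, 60, 65, 65, 68, 68, 70, 70, 70, 70, 70, 74, 75, 75, 85, 90, 95]
for stem, leaf_str in stemplot(prices).items():
    print(f"{stem} | {leaf_str}")
print("Key: 8|5 represents $85")
```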

9
Q

back-to-back stem plots

A

-used for the comparison of the distribution of two groups
leaves | stem | leaves
-still require key
-leaves get bigger as you move away from the stem! for the left-side group, this means reading the leaves from right to left

10
Q

left inclusion

A

-interval notation as [a,b)
so a on the left is included but not b
-used for histograms along the x-axis to organize bins
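A small Python sketch of left-inclusive binning, assuming equal-width bins. The scores and the helper name `bin_counts` are made up for illustration. A value equal to a bin's right edge is counted in the next bin.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in n_bins left-inclusive bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)   # index of the bin that contains v
        if 0 <= k < n_bins:             # values outside all bins are ignored
            counts[k] += 1
    return counts

scores = [60, 69, 70, 70, 79, 80, 89]   # hypothetical data
print(bin_counts(scores, start=60, width=10, n_bins=3))  # [2, 3, 2]
```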

11
Q

histogram appearance

A
  • bins on x-axis
  • frequency or relative frequency on y-axis
  • bars with no spaces between (unless there is an empty bin)
12
Q

Of dot plots, stem plots, and histograms, which retain all data values and which do not?

A

dot plots and stem plots retain all data values; histograms do not

13
Q

how can we describe the distribution of a plot?

A

shapes - modes, symmetry or skewness, deviation or outliers
center
spread

14
Q

mode(s)

A

number of bumps / humps / peaks

15
Q

uniform

A

no modes, square / rectangle appearance

16
Q

unimodal

A

a single peak

17
Q

bimodal

A

two peaks

ex: heights of adults and children will have two peaks, one for adults and one for children

18
Q

multimodal

A
rarely occurs (except for covid?)
more than two peaks
19
Q

symmetry

A

when the left and right sides of the graph are approximately mirror images of each other

20
Q

non symmetric graphs are

A

skewed

21
Q

skewed to the right

A

positively skewed
peaks quickly and then slowly trickles down to the right
as if the tail end of the peak on the right has been pulled to the right

22
Q

negatively skewed

A

skewed to the left
the left tail is extended and longer than the right tail (if the peak is essentially symmetric)
……^. .

23
Q

Outlier

A

a deviation that does not follow the overall pattern of the graph

24
Q

numerical summaries

A

a few important and meaningful numbers that preserve the relevant features of the data set so that you can draw useful conclusions

25
y
variable of interest | the variable for which we have sample data
26
n
the sample size / number of observations of the variable y
27
y₁
the first sample observation of the variable y
28
yₙ
the nth sample observation of the variable y
29
center and examples
the value that splits the data in half, or a typical range of values at the center of the graph | examples: median, mean, mode
30
spread and examples
how much do the data values vary around the center? the range of values and their concentration: are most values close to or far from the center? | examples: range, standard deviation, IQR
31
$\sum_{i=1}^{n} y_i$ What is this? Describe all elements.
Σ is sigma, or summation; i is the index, which starts at the lower boundary 1; n is the upper boundary, so the sum runs from the 1st to the nth piece of data. This describes adding all of the values of y, which is used to find the mean.
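Written out term by term, the summation looks like this. The sample of n = 3 values is made up purely for illustration.

```latex
\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n
\qquad \text{e.g. } n = 3,\ y = (2, 5, 8): \quad \sum_{i=1}^{3} y_i = 2 + 5 + 8 = 15
```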
32
ȳ
y bar is the mean: $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$, aka the sum of all the values in a data set divided by the number of observations
33
M
median | the value that divides the ordered sample into two halves; if n is odd, it is the middle value; if n is even, it is the mean of the two middle values
34
mean vs median
mean is affected by outliers, while the median is resistant to outliers or skewness
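A quick Python illustration of this difference, using Python's standard `statistics` module. The salary figures are invented for the example.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]     # hypothetical values, in $1000s
with_outlier = salaries + [400]     # same data plus one extreme value

# Without the outlier, mean and median agree
print(mean(salaries), median(salaries))
# With the outlier, the mean jumps while the median barely moves
print(mean(with_outlier), median(with_outlier))
```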
35
mode
the value that occurs with the highest frequency in a data set; there may be more than one mode
36
center values of symmetric, right skewed and left skewed data sets
symmetric: mean=median=mode | right skewed: mean>median>mode | left skewed: mean<median<mode
37
range
describes spread | the difference between the maximum and minimum values in a data set | Range = max - min | strongly influenced by outliers
38
larger range means
larger variability (usually); however, outliers can sometimes make the range overestimate this
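A short Python sketch of how a single outlier can inflate the range. The heights and the helper name `data_range` are made up for illustration.

```python
def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)

heights = [150, 155, 160, 162, 165]   # hypothetical values, in cm
print(data_range(heights))            # 15
print(data_range(heights + [210]))    # 60, driven entirely by the one extreme value
```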
39
deviation
yᵢ - ȳ | The deviation of an observation from the mean
40
positive vs negative deviation
positive means it is above the mean | negative means it is below the mean
41
the set of all deviations
- all add to 0
- describes the variability
- can square every deviation before summing them all up to make the deviations more useful as a number for calculations
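A tiny worked example in Python, assuming a made-up sample of four values. It shows the deviations cancelling to zero and the squared deviations giving a usable total.

```python
y = [2, 4, 6, 8]                        # hypothetical sample
y_bar = sum(y) / len(y)                 # mean = 5.0
deviations = [yi - y_bar for yi in y]   # [-3.0, -1.0, 1.0, 3.0]

print(sum(deviations))                  # 0.0: positives and negatives cancel
print(sum(d ** 2 for d in deviations))  # 20.0: squaring removes the signs
```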
42
variance
s² = (Σ (yᵢ-ȳ)²) / (n-1) | where Σ has lower boundary i=1 and upper boundary n
43
why is variance problematic?
It is measured in squared units which is not very interpretable on its own
44
standard deviation
s | the square root of the variance; the most common measure of variability; tells us how closely the data are clustered around the mean; measured in the same units as the original data
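A minimal Python sketch of the sample variance and standard deviation with the n - 1 divisor, checked against the standard `statistics` module. The sample and the function name `sample_variance` are made up for illustration.

```python
import math
import statistics

def sample_variance(y):
    """s^2 = sum((y_i - y_bar)^2) / (n - 1)."""
    n = len(y)
    y_bar = sum(y) / n
    return sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

y = [2, 4, 6, 8]              # hypothetical sample
s2 = sample_variance(y)       # 20 / 3
s = math.sqrt(s2)             # same units as the original data
print(s2, s)
# statistics.variance and statistics.stdev also divide by n - 1
print(statistics.variance(y), statistics.stdev(y))
```

When every observation is the same, e.g. `sample_variance([5, 5, 5])`, the result is 0, so s is 0 as well.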
45
when would s=0
when all observations have the same value
46
what happens if s > 0
the standard deviation s increases as the observations become more spread out, i.e. as the data show greater variability
47
when can/should we use standard deviation? why?
we should only use the standard deviation together with the mean; neither is resistant to outliers, so neither should be used when outliers are present and distorting them
48
IQR
interquartile range | a measure of variability that is resistant to outliers, ∴ goes with the median; the quartiles divide the data into 4 equal sections, and the IQR spans the middle two
49
percentile
the pth percentile is the value so that p% of the measurements fall below the pth percentile and (100-p)% are above it
50
what is the median in percentile?
the 50th percentile
51
can 215 be p?
no, percentiles are always between 0 and 100
52
Q₁
the lower quartile is the 25th percentile (separates the bottom 25% from the top 75% of measurements) | the median of the measurements that fall below the overall median
53
Q₃
the upper quartile is the 75th percentile (separates the top 25% from the bottom 75% of measurements) | the median of the measurements that fall above the overall median
54
what is between Q₁ and Q₃?
the middle 50% of measurements that fall between Q₁ and Q₃
55
IQR calculation
Q₃-Q₁
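A Python sketch of the IQR using the "median of each half" rule for the quartiles. This is one convention among several, and software often interpolates instead. The data and the helper name `quartiles` are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves
    (the overall median is left out of both halves when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # measurements below the overall median
    upper = data[-half:]    # measurements above the overall median
    return median(lower), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # hypothetical sample, n = 9
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)                  # 3.5 11.0 7.5
```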
56
if IQR is small
data is clustered around the center
57
if IQR is large
data is scattered far from the center
58
how do we choose a numerical summary?
1. draw a graph
2. use mean and standard deviation for reasonably symmetric data
3. use median and IQR for skewed data
4. If there are multiple modes try to understand why and consider splitting data into two groups
5. If using mean and standard deviation with outliers, report them with outliers present and removed
59
five-number-summary
minimum, Q₁, median, Q₃, maximum
60
boxplot
visual representation of data using the 5 number summary | shows the center, spread, and symmetry/skewness at the same time | useful for comparing groups
61
fences
upper fence = Q₃ + (1.5 x IQR)
lower fence = Q₁ - (1.5 x IQR)
measurements outside the fences are considered outliers
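A short Python sketch of the fence rule. The quartiles are assumed to be already known for a made-up sample. The function names `fences` and `outliers` are hypothetical.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Measurements that fall outside the fences."""
    lower, upper = fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample; Q1 and Q3 are taken as given for this sketch
data = [1, 3, 4, 6, 7, 8, 10, 12, 40]
q1, q3 = 3.5, 11.0
print(fences(q1, q3))          # (-7.75, 22.25)
print(outliers(data, q1, q3))  # [40]
```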
62
whiskers
lines drawn from each end of the box out to the lowest and highest values that are still within the fences (not outliers)
63
far outliers
outliers that are farther than 3 IQRs from the quartiles
64
appearance of boxplots
x      |-------[=====|=======]--------|
symbols (x) for outliers, whiskers, a box for the IQR, and a line in the box for the median
65
box plots that are skewed
symmetrical: median in the center of the box and whiskers of about equal length
skewed right: median to the left of center and a long right whisker
skewed left: median to the right of center and a long left whisker
66
comparative box plots
draw two box plots in one graph to compare the data in two different categories
67
time plot
used when interested in how the data behaves over time