Chapter 3: Numerical descriptions of data Flashcards

1
Q

mode

A

-the most frequently occurring value, or the most common frequency class
-shows up as the “humps” in a distribution
-the humps need not all be the same height or count
-mostly used to recognize bimodal or multimodal distributions
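
A minimal Python sketch of finding modes, assuming the standard-library `statistics` module (Python 3.8 or later). The `scores` list is made-up data chosen to have two humps.

```python
from statistics import mode, multimode

# Hypothetical sample: 2 and 5 each appear three times, so the data is bimodal.
scores = [2, 2, 2, 3, 4, 5, 5, 5, 6]

first_mode = mode(scores)       # the first mode encountered
all_modes = multimode(scores)   # every value tied for most frequent

print(first_mode)  # 2
print(all_modes)   # [2, 5]
```

More than one value in `all_modes` is a hint that the distribution may be bimodal or multimodal.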

2
Q

multi-modal

A

-often a tip-off that the data set contains different types of individuals

3
Q

mean or average

A

-useful when the data is roughly symmetrical and without many outliers
-can be misleading on very skewed data

4
Q

median

A

-the halfway point: roughly half of the values are smaller and half are larger
-the value that lies in the middle of the data when the data set is ordered
-a measure of center that is more resistant to skewness and outliers than the mean
-frequently used for distributions like income or house prices

5
Q

median with odd number of entries

A

-median is the middle data entry

6
Q

median with even number of entries

A

-median is the mean of the two middle data entries
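
A short Python sketch of the odd and even cases, assuming the standard-library `statistics.median`. Both lists are made-up examples.

```python
from statistics import median

odd_entries = [7, 1, 5]       # sorted: 1, 5, 7 -> the middle entry is 5
even_entries = [7, 1, 5, 3]   # sorted: 1, 3, 5, 7 -> the mean of 3 and 5 is 4

median_odd = median(odd_entries)
median_even = median(even_entries)

print(median_odd, median_even)  # 5 4.0
```

`median` sorts the data internally, so the entries do not need to be ordered first.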

7
Q

advantage of using the mean

A

-the mean is a reliable measure because it takes into account every entry of a data set

8
Q

disadvantage of using the mean

A

-greatly affected by outliers
-if the data is skewed, the mean is not the best measure of center; the median is better because it is not strongly affected by outliers
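
To see the effect, here is a small Python sketch with hypothetical income figures (in thousands), assuming the standard-library `statistics` module. One large value is added and the mean and median are recomputed.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]           # hypothetical incomes, in thousands
incomes_with_outlier = incomes + [400]   # add one very large value

mean_before = mean(incomes)
median_before = median(incomes)
mean_after = mean(incomes_with_outlier)
median_after = median(incomes_with_outlier)

print(mean_before, median_before)          # 35 35
print(round(mean_after, 1), median_after)  # 95.8 36.5
```

The single outlier moves the mean by about 61 but moves the median by only 1.5.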

9
Q

Range

A

-the difference between the maximum and minimum entries; easier to find after sorting the data
-sensitive to outliers
-most common

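
A tiny Python sketch, using made-up temperature readings, of how one extreme value changes the range.

```python
daily_highs = [68, 70, 71, 73, 75]   # hypothetical temperatures
with_outlier = daily_highs + [110]   # one extreme reading

range_before = max(daily_highs) - min(daily_highs)   # 75 - 68
range_after = max(with_outlier) - min(with_outlier)  # 110 - 68

print(range_before, range_after)  # 7 42
```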
10
Q

Interquartile Range IQR

A

-range of middle half
-less sensitive to outliers
-the difference between the third and first quartiles: IQR = Q3 - Q1
-Low Fence = Q1 - 1.5 × IQR
-High Fence = Q3 + 1.5 × IQR
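
A Python sketch of the IQR and fences on made-up data, assuming `statistics.quantiles` with its default "exclusive" method. Textbooks use several quartile conventions, so Q1 and Q3 may differ slightly from a hand calculation. Values outside the fences are commonly flagged as potential outliers.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40]

q1, q2, q3 = quantiles(data, n=4)   # quartiles; the convention is an assumption
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as potential outliers.
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q3, iqr)            # 5.0 13.0 8.0
print(low_fence, high_fence)  # -7.0 25.0
print(outliers)               # [40]
```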

11
Q

standard deviation

A

-appropriate for symmetric distributions where the mean is a good measure of center
-tells you “on average” how much EACH data value differs (varies) from the mean
-The greater the STANDARD DEVIATION the greater the SPREAD of data
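
A Python sketch comparing two made-up data sets with the same mean but different spread, assuming the population standard deviation (`statistics.pstdev`).

```python
from statistics import pstdev

tight = [4, 5, 5, 5, 5, 5, 5, 6]    # values clustered near the mean of 5
spread = [2, 4, 4, 4, 5, 5, 7, 9]   # same mean of 5, values more spread out

sd_tight = pstdev(tight)    # population standard deviation
sd_spread = pstdev(spread)

print(sd_tight, sd_spread)  # 0.5 2.0
```

For a sample rather than a whole population, `statistics.stdev` divides by n - 1 instead.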

12
Q

canonical examples of variation

A
  1. the weights of ping pong balls: very little spread (because of standards and high-quality manufacturing)
  2. the weights of apples in a grocery store: some variability, but nothing extreme
  3. your little brother's rock collection: a lot of variation
13
Q

standard score (z-score)

A

-represents the number of standard deviations a given value x falls from the mean μ: z = (x - μ) / σ
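
A minimal Python sketch of the z-score calculation. The function name `z_score` and the exam figures (mean 70, standard deviation 10) are illustrative assumptions.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical exam: mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5  (1.5 standard deviations above the mean)
print(z_score(60, 70, 10))  # -1.0 (1 standard deviation below the mean)
```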

14
Q

percentile

A

-ranks by 100ths

15
Q

decile

A

-ranks by 10ths

16
Q

quartile

A

-ranks by 4ths

17
Q

quintiles

A

-ranks by 5ths

18
Q

percentile

A

the kth percentile is the data value that has k% of the data at or below that value
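
One way to turn this definition into code is the nearest-rank method, sketched below in Python. It is only one of several percentile conventions; the helper name `kth_percentile` and the data are made up for illustration.

```python
import math

def kth_percentile(data, k):
    """Smallest data value with at least k% of the data at or below it."""
    ordered = sorted(data)
    rank = max(1, math.ceil(k * len(ordered) / 100))  # 1-based position
    return ordered[rank - 1]

values = [50, 15, 40, 20, 35]   # hypothetical data

print(kth_percentile(values, 40))  # 20 (2 of 5 values, 40%, are at or below 20)
print(kth_percentile(values, 90))  # 50
```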

19
Q

Box and whisker plot

A

-exploratory data analysis tool
-highlight important features of a data set
- requires: minimum entry, first quartile, median, third quartile, maximum entry
-used to gauge the symmetry and spread of a distribution
-side-by-side plots make it convenient to compare two or more distributions
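
The five required values can be computed as in the Python sketch below. It assumes `statistics.quantiles` with its default method (quartile conventions vary by textbook); `five_number_summary` is a hypothetical helper name and the data is made up.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a box-and-whisker plot."""
    q1, med, q3 = quantiles(data, n=4)  # quartile convention is an assumption
    return min(data), q1, med, q3, max(data)

sample = [1, 3, 4, 6, 7, 8, 10, 12, 15, 18, 21]  # hypothetical data

print(five_number_summary(sample))  # (1, 4.0, 8.0, 15.0, 21)
```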

20
Q

mean = median

A

-a symmetric distribution, such as a normal distribution
-outliers on one end, if any, do not outweigh those on the other end

21
Q

mean<median

A

-skewed left: small-value outliers pull the mean below the median
-memory aid: skewed Left means the mean is Less than the median

22
Q

mean>median

A

-skewed right: large-value outliers pull the mean above the median
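
The comparison between mean and median can be sketched in Python as a rough hint about skew. This is a rule of thumb, not a formal skewness test; `skew_hint` is a hypothetical helper and the lists are made-up data.

```python
from statistics import mean, median

def skew_hint(data):
    """Rough hint from comparing mean and median; not a formal skewness test."""
    m, med = mean(data), median(data)
    if m < med:
        return "skewed left"
    if m > med:
        return "skewed right"
    return "roughly symmetric"

print(skew_hint([1, 17, 18, 18, 19, 19, 19, 20]))  # skewed left  (mean 16.375 < median 18.5)
print(skew_hint([1, 2, 2, 3, 3, 3, 4, 20]))        # skewed right (mean 4.75 > median 3.0)
print(skew_hint([1, 2, 3, 4, 5]))                  # roughly symmetric (mean 3 = median 3)
```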

23
Q

Measures of variation are

A
  1. range
  2. deviation
  3. variance
  4. standard deviation
24
Q

Measures of Center are

A
  1. mean
  2. median
  3. mode
25
Q

deviation

A

-the difference between a particular value and the mean
-the sum of the deviations always equals zero
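
A quick Python check on made-up data that the deviations from the mean sum to zero.

```python
data = [3, 5, 8, 12]                 # hypothetical data
mean_value = sum(data) / len(data)   # 28 / 4 = 7.0

deviations = [x - mean_value for x in data]

print(deviations)       # [-4.0, -2.0, 1.0, 5.0]
print(sum(deviations))  # 0.0
```

With data whose mean is not exactly representable in floating point, the computed sum may be a tiny nonzero number because of rounding.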
26
Q

variance

A

-the sum (∑) of the squared deviations divided by the total frequency N (population variance)
-the sample variance divides by n - 1 instead
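
A Python sketch of the calculation on made-up data, showing both the population form (divide by N) and the sample form (divide by n - 1).

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data, mean 5
mean_value = sum(data) / len(data)

squared_deviations = [(x - mean_value) ** 2 for x in data]

population_variance = sum(squared_deviations) / len(data)     # divide by N
sample_variance = sum(squared_deviations) / (len(data) - 1)   # divide by n - 1

print(population_variance)        # 4.0
print(round(sample_variance, 3))  # 4.571
```

The standard deviation is the square root of the variance, so the population standard deviation here is 2.0.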
27
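Q

empirical rule

A

-also called the 68-95-99.7 rule: about 68% of the data lies within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3
-applies only to (approximately) normal data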
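
The three percentages can be checked with a short Python sketch, assuming the standard-library `statistics.NormalDist` (Python 3.8 or later); `fraction_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```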