Chapter 2 Flashcards

(25 cards)

1
Q

Measures of central tendency

A

-describe the typical (center) value of a dataset
-Mean (average)
-Median (middle value)
-Mode (most frequent value)

Population mean: μ
Sample mean: x̄ (x bar)

2
Q

Mean

A

Sum of all values divided by the number of values

Population mean: μ
Sample mean: x̄ (x bar)

-mean is highly sensitive to outliers (extremely high or low values)
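A minimal Python sketch of the mean and its sensitivity to outliers; `mean` is a hypothetical helper written for illustration, and the scores are made-up example values.

```python
# Mean = sum of values / number of values (same formula for μ or x̄)
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


scores = [10, 12, 11, 13, 12]
with_outlier = scores + [100]  # same data plus one extreme value

print(mean(scores))        # 11.6
print(mean(with_outlier))  # about 26.33: one outlier more than doubles the mean
```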

3
Q

Median

A

Middle value of a dataset when ranked from smallest to largest

-NOT sensitive to outliers, so use it when data is strongly skewed

If the dataset has an odd number of values, the median is the single middle value

If the dataset has an even number of values, the median is the mean of the 2 middle values
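A small Python sketch of the odd/even median rule; `median` is a hypothetical helper (Python's `statistics.median` does the same job). The last line reuses the outlier example to show that the median barely moves.

```python
def median(values):
    """Middle value of the ranked data (mean of the two middle values if n is even)."""
    ordered = sorted(values)  # rank from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:  # odd count: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


print(median([3, 1, 2]))                  # 2
print(median([4, 1, 3, 2]))               # 2.5
print(median([10, 12, 11, 13, 12, 100]))  # 12.0 (the outlier 100 barely matters)
```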

4
Q

Mode

A

-most frequently occurring value in data set

-can have multiple modes
-no number repeats = no mode

-most commonly used for categorical data
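A sketch of finding the mode(s), following the card's convention that "no value repeats" means "no mode". The helper name `modes` is hypothetical; note that Python's own `statistics.multimode` would instead return every value when nothing repeats.

```python
from collections import Counter


def modes(values):
    """All most-frequent values; empty list if no value repeats (card's convention)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats -> no mode
    return [v for v, c in counts.items() if c == top]


print(modes([1, 2, 2, 3]))              # [2]
print(modes([1, 1, 2, 2, 3]))           # [1, 2]  (two modes)
print(modes([1, 2, 3]))                 # []      (no mode)
print(modes(["red", "blue", "red"]))    # ['red'] (works for categorical data)
```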

5
Q
A
6
Q

Right skewed distribution

A

-a few very large values (outliers) pull the mean up, so the tail is longer on the right side

mode < median < mean
-typically the mean is greater than the median, which is greater than the mode (a rule of thumb, not guaranteed for every dataset)

7
Q

Left skewed distribution

A

-a few extremely low values (outliers) pull the mean down, so the tail is longer on the left side

mean < median < mode

-typically the mean is less than the median, which is less than the mode
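A quick check of the skew orderings on two small made-up datasets, using Python's standard `statistics` module. The ordering is typical for skewed data, so this illustrates the pattern rather than proving it always holds.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 3, 4, 10]  # one very large value stretches the right tail
left_skewed = [1, 7, 8, 9, 9, 10]   # one very small value stretches the left tail

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", round(mean(data), 2))
# right mode: 2 median: 2.5 mean: 3.67
# left mode: 9 median: 8.5 mean: 7.33
```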

8
Q

Bimodal distribution

A

Two peaks = two modes (two values or regions where the data is most concentrated)
Mean & median usually fall between the peaks

9
Q

Uniform distribution

A

-values spread evenly with no clear peak (roughly flat shape)
mean ≈ median

10
Q

Variability

A

Tells us how spread out or clustered values in a dataset are

  1. Range: difference between the largest and smallest values in a dataset

-highly sensitive to outliers
-doesn't show how the data is distributed: two datasets can have the same range but look very different
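A tiny Python sketch of both range caveats; `data_range` is a hypothetical helper name and the datasets are made up.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


a = [1, 5, 5, 5, 9]  # values bunched in the middle
b = [1, 1, 5, 9, 9]  # values pushed to the ends

print(data_range(a), data_range(b))  # 8 8: same range, different shapes
print(data_range(a + [50]))          # 49: one outlier changes the range a lot
```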

11
Q

Variance

A

-Variance: measures how spread out the data points are from the mean; because the differences are squared, variance is never negative (it is 0 only when every value is the same)

-small variance: data points close to the mean (low spread)

-large variance: data points far from the mean (high spread)

-population variance (σ²): for the entire population; divide the sum of squared deviations by N

-sample variance (s²): for a sample of the population; divide the sum of squared deviations by n − 1
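A Python sketch of the two variance formulas (divide by N vs. by n − 1). The helper names are hypothetical; the results are cross-checked against the real `statistics.pvariance` and `statistics.variance` functions.

```python
import statistics


def population_variance(values):
    """σ²: average squared deviation from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s²: sum of squared deviations from x̄, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5; squared deviations sum to 32

print(population_variance(data))  # 4.0 (32 / 8)
print(sample_variance(data))      # about 4.571 (32 / 7)

# Cross-check against Python's standard library
print(statistics.pvariance(data), statistics.variance(data))
```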

12
Q

Standard deviation

A

-square root of the variance; tells us how much the data typically deviates from the mean

small standard deviation: data points close to mean

large standard deviation: data points widely spread out
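A comparison of two made-up datasets with the same mean (50) but very different spread, using Python's `statistics.stdev` (sample standard deviation).

```python
import statistics

tight = [49, 50, 50, 51]   # values close to the mean of 50
spread = [20, 40, 60, 80]  # same mean of 50, values far from it

print(round(statistics.stdev(tight), 2))   # 0.82
print(round(statistics.stdev(spread), 2))  # 25.82
```

The standard deviation is the square root of the variance, so `statistics.stdev(x)` matches `math.sqrt(statistics.variance(x))` up to rounding.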

13
Q

1-2-3 rule

A

-for roughly bell-shaped (normal) data, most observations (about 68%) are within one standard deviation of the mean (if you pick a random value, it's likely to be fairly close to the mean)

-about 95% are within two standard deviations (values 1 to 2 standard deviations away are less common, but still happen often)

-almost all values (about 99.7%) are within three standard deviations (a value beyond this is a possible outlier)

-tells us whether an observation is typical or unusual

Ex: the average adult height is 170 cm with a standard deviation of 10 cm

  1. Within 1 standard deviation of the mean (typical heights): most people are between 160 cm and 180 cm
  2. Within 2 standard deviations of the mean (adds less common heights): about 95% of people are between 150 cm and 190 cm
  3. Within 3 standard deviations of the mean (adds rare heights): almost everyone is between 140 cm and 200 cm
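A Python sketch that computes the height bands from the example and, assuming the data is normal, the share of values inside each band using the standard library's `statistics.NormalDist`. The helper `sd_bands` is a hypothetical name.

```python
from statistics import NormalDist


def sd_bands(mean, sd, k_max=3):
    """Intervals mean ± k·sd for k = 1..k_max."""
    return {k: (mean - k * sd, mean + k * sd) for k in range(1, k_max + 1)}


for k, (low, high) in sd_bands(170, 10).items():
    print(f"within {k} SD: {low} cm to {high} cm")
# within 1 SD: 160 cm to 180 cm
# within 2 SD: 150 cm to 190 cm
# within 3 SD: 140 cm to 200 cm

# Share of a normal distribution inside each band (roughly 68%, 95%, 99.7%)
std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"{k} SD: {share:.1%}")
```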
14
Q

Standard deviation

A

-measures spread around the mean (how spread out the data is)

-values of variance and standard deviation are never negative

-both variance and standard deviation are highly sensitive to outliers

15
Q

Z scores

A

-describes the position of an individual data point

-measures how far a single value is from the mean in terms of standard deviations

-tells us how unusual a value is

z = 0: at the mean

|z| = 1: one standard deviation away (fairly typical)

|z| > 2: unusual

|z| > 3: very rare

z = (x − mean) / standard deviation

(positive z = above the mean, negative z = below the mean; larger |z| = more unusual)
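A Python sketch of the z-score formula using the height example (mean 170 cm, SD 10 cm). `z_score` and `describe` are hypothetical helpers; the labels follow the card's thresholds and compare |z| so values below the mean are judged the same way.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def describe(z):
    """Rough labels from the card, based on the size of z."""
    size = abs(z)
    if size > 3:
        return "very rare"
    if size > 2:
        return "unusual"
    return "typical"


for height in (170, 180, 195, 135):
    z = z_score(height, mean=170, sd=10)
    print(height, z, describe(z))
# 170 0.0 typical
# 180 1.0 typical
# 195 2.5 unusual
# 135 -3.5 very rare
```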

16
Q

Percentiles

A

-tells you where a particular value stands relative to the rest of the data

  • the a-th percentile is the value that has a% of the data below it

Ex: 90th percentile (P90): 90% of the data is below this value

25th percentile (P25): 25% of the data is below this value
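A Python sketch of the card's definition: a value's percentile is the percentage of data below it. `percent_below` is a hypothetical helper, and it counts values strictly below; textbooks and software differ slightly on ties and interpolation.

```python
def percent_below(values, x):
    """Percentage of data values strictly below x."""
    if not values:
        raise ValueError("need at least one value")
    return 100 * sum(1 for v in values if v < x) / len(values)


scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percent_below(scores, 95))  # 80.0 -> 95 sits at the 80th percentile
print(percent_below(scores, 60))  # 10.0 -> 60 sits at the 10th percentile
```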

17
Q

Quartiles

A
  • divide ranked data into four parts (each holding about 25% of the data)

Q1 = 25% of the data is below this value

Q2 = 50% of the data is below this value (median)

Q3 = 75% of the data is below this value

18
Q

IQR

A

Tells us how spread out the middle 50% of the data is

IQR = Q3 − Q1

Resistant to outliers: focuses on the middle 50% and ignores extreme values

19
Q

How to obtain quartiles

A
  1. Arrange the data from smallest to largest.
  2. Find the median of the dataset (Q2)
    -if odd number of observations: include the median in both halves
    -if even number of observations: split the data exactly in two, so the median is in neither half
  3. Median of bottom (lower) half = Q1
    Median of top (upper) half = Q3

-skewed data: use the IQR for variability
-symmetric/normal data: use the standard deviation
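A Python sketch of the median-of-halves method above, plus the IQR. The helper names are hypothetical. Textbooks and software disagree on whether the median joins the halves when n is odd (Python's `statistics.quantiles` uses yet another rule), so this follows the card's convention only.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method on the card."""
    data = sorted(values)  # step 1: rank the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 1:
        lower, upper = data[: half + 1], data[half:]  # odd n: median goes in both halves
    else:
        lower, upper = data[:half], data[half:]       # even n: exact split, median in neither
    return median(lower), median(data), median(upper)  # Q1, Q2, Q3


odd = [13, 1, 9, 5, 11, 3, 7]
even = [8, 1, 7, 2, 6, 3, 5, 4]

q1, q2, q3 = quartiles(odd)
print(q1, q2, q3, "IQR =", q3 - q1)  # 4.0 7 10.0 IQR = 6.0
q1, q2, q3 = quartiles(even)
print(q1, q2, q3, "IQR =", q3 - q1)  # 2.5 4.5 6.5 IQR = 4.0
```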

21
Q

Five-number summary

A

-MIN
-Q1
-MEDIAN (Q2)
-Q3
-MAX
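A compact Python sketch that returns the five-number summary, computing quartiles with the same median-of-halves method described on the quartiles card. `five_number_summary` is a hypothetical helper name.

```python
def _median(s):
    """Median of an already-sorted, non-empty list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """(min, Q1, median, Q3, max), quartiles by the median-of-halves method."""
    data = sorted(values)
    n, half = len(data), len(data) // 2
    if n < 2:
        raise ValueError("need at least two values")
    lower = data[: half + 1] if n % 2 else data[:half]  # odd n: median joins both halves
    upper = data[half:]
    return data[0], _median(lower), _median(data), _median(upper), data[-1]


print(five_number_summary([13, 1, 9, 5, 11, 3, 7]))  # (1, 4.0, 7, 10.0, 13)
```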

22
Q

Outliers

A

-values that differ significantly from the rest of the data, being either much higher or much lower

-these values can skew results or make interpretations less accurate

-any data point below the lower fence or above the upper fence is an outlier (commonly, lower fence = Q1 − 1.5 × IQR and upper fence = Q3 + 1.5 × IQR)
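A Python sketch of flagging outliers, assuming the common 1.5 × IQR fence rule (the multiplier `k` is a parameter in case a course uses another value). The helper names are hypothetical, and Q1/Q3 are passed in, computed by hand for this made-up dataset.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences using the common k x IQR convention (k = 1.5)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = fences(q1, q3)
    return [v for v in sorted(values) if v < low or v > high]


data = [2, 4, 5, 5, 6, 7, 8, 30]
# Q1 = 4.5 and Q3 = 7.5 (median of each half of the 8 ranked values)
print(fences(4.5, 7.5))          # (0.0, 12.0)
print(outliers(data, 4.5, 7.5))  # [30]
```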

23
Q

Adjacent values

A

-the most extreme data points that are still within the lower and upper fences, so they are not considered outliers

-lower adjacent value: smallest data value at or above the lower fence

-upper adjacent value: largest data value at or below the upper fence
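A Python sketch of finding the adjacent values (where box-plot whiskers end), again assuming fences at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. `adjacent_values` is a hypothetical helper; Q1 = 4 and Q3 = 7 were worked out by hand for this made-up dataset.

```python
def adjacent_values(values, q1, q3, k=1.5):
    """Most extreme values still inside the fences (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


data = [-10, 2, 4, 5, 5, 6, 7, 8, 30]
# Odd n, median included in both halves: Q1 = 4, Q3 = 7, so fences are -0.5 and 11.5
print(adjacent_values(data, 4, 7))  # (2, 8): -10 and 30 are outliers
```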

24
Q

Box plot

A

-graphical representation of the five-number summary; helps visualize the distribution, central tendency, and spread of data

-shows the quartiles, the median, and potential outliers

25
Q

How to draw a box plot

A

  1. Determine the quartiles
  2. Determine the outliers and adjacent values
  3. Draw a box from Q1 to Q3, and draw a line inside it at Q2 (median) to show central tendency
  4. Draw whiskers from the box out to the adjacent values (indicates spread)
  5. Plot each outlier individually with an asterisk

-left skewed: the left (lower) whisker is longer than the right (upper) whisker
-right skewed: the right (upper) whisker is longer than the left (lower) whisker