Chapter 2 Flashcards by Hafsa Zaman

Measures of central tendency

-finding typical value in dataset
-Mean (average)
-Median (middle value)
-Mode (most frequent value)

Population mean: μ
Sample mean: x̄
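
As a quick illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up data, not something from the cards.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of the values divided by how many there are
median = statistics.median(scores)  # middle of the sorted data (average of 3 and 5 here)
mode = statistics.mode(scores)      # most frequent value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```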

Mean

Population mean: μ
Sample mean: x̄

-mean is highly sensitive to outliers (extremely high or low values)

Median

Middle value of data set when ranked smallest-largest

-NOT sensitive to outliers, so use when data is extremely skewed

If the data set has an odd number of values, the median is the value in the middle

If the data set has an even number of values, the median is the mean of the 2 middle values
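
The odd/even rule can be sketched as a small function. This is only an illustration: the function name and the example lists are made up, and it assumes a non-empty list of numbers.

```python
def median(values):
    """Median by ranking the values and picking the middle (assumes a non-empty list)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 1, 5, 3, 9]        # sorted: 1 3 5 7 9
even_data = [7, 1, 5, 3]          # sorted: 1 3 5 7
with_outlier = [7, 1, 5, 3, 900]  # same as odd_data but with an extreme value

print(median(odd_data))      # 5
print(median(even_data))     # 4.0
print(median(with_outlier))  # 5, the outlier does not move the median
```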

Mode

-most frequently occurring value in data set

-can have multiple modes
-no number repeats = no mode

-most commonly used for categorical data
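
A minimal sketch of these rules, assuming a non-empty list. The helper name `modes` and the sample data are hypothetical. It returns every value tied for the highest count, and an empty list when nothing repeats.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no number repeats = no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 2, 3, 3]))           # [2, 3]  (two modes)
print(modes([1, 2, 3]))                 # []      (no mode)
print(modes(["red", "blue", "red"]))    # ['red'] (works for categorical data)
```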

Right skewed distribution

-very large values (outliers) pull average up so tail is longer on right side

mode < median < mean
-typically the mean is greater than the median, which is greater than the mode
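
To see the ordering on a small right-skewed data set, here is a sketch with made-up numbers (the `incomes` list is purely illustrative). One very large value drags the mean up while the median and mode stay put.

```python
import statistics

# Hypothetical right-skewed data: one very large value on the right.
incomes = [20, 25, 25, 30, 35, 40, 200]

mode = statistics.mode(incomes)      # 25
median = statistics.median(incomes)  # 30
mean = statistics.mean(incomes)      # 375 / 7, roughly 53.6

print(mode, median, round(mean, 1))
print(mode < median < mean)          # True for this data set
```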

Left skewed distribution

-few extremely low values (outliers) pull average down

mean < median < mode

-typically the mean is less than the median, which is less than the mode

Bimodal distribution

Two peaks = modes (two values that occur most frequently)
Mean & median between the peaks

Uniform distribution

mean = median (approximately)

Variability

Tells us how spread out or clustered values in a dataset are

Range: difference between the largest and smallest values in a dataset

-highly sensitive to outliers
-doesn't show how data is distributed; two datasets can have the same range but look different
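
Both weaknesses of the range can be shown with a short sketch. The three lists are made-up examples and `value_range` is just an illustrative name.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

clustered = [1, 5, 5, 5, 9]      # most values sit in the middle
spread_out = [1, 2, 5, 8, 9]     # values spread across the interval
with_outlier = [1, 5, 5, 5, 90]  # one extreme value

print(value_range(clustered))     # 8
print(value_range(spread_out))    # 8, same range but a different distribution
print(value_range(with_outlier))  # 89, a single outlier changes the range a lot
```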

Variance

-Variance: measures how spread out the data points are from the mean. Because the differences are squared, variance is never negative (it is zero only when every value is the same)

-small variance: data points=close to mean (low spread)

-large variance: data points = far from mean (high spread)

-population variance: for entire population

-sample variance: for a sample of the population
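
A sketch of both calculations on made-up data. The card does not spell out the formulas, so this assumes the usual convention: divide the summed squared differences by n for a population and by n - 1 for a sample. The results are compared against Python's `statistics` module.

```python
import statistics

# Hypothetical data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                     # 5.0
squared_diffs = [(x - mean) ** 2 for x in data]  # squared distance of each point from the mean

population_variance = sum(squared_diffs) / len(data)    # divide by n
sample_variance = sum(squared_diffs) / (len(data) - 1)  # divide by n - 1

print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```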

Standard deviation

-square root of variance, tells us how much data typically deviates from the mean

small standard deviation: data points close to mean

large standard deviation: data points widely spread out
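
As a rough illustration, the two made-up lists below share the same mean but have very different standard deviations. The sketch uses the population versions from the `statistics` module; the list names are hypothetical.

```python
import math
import statistics

tight = [9, 10, 10, 11]  # values close to the mean of 10
wide = [0, 5, 15, 20]    # values far from the mean of 10

sd_tight = statistics.pstdev(tight)
sd_wide = statistics.pstdev(wide)

print(round(sd_tight, 2))  # small standard deviation
print(round(sd_wide, 2))   # large standard deviation

# Standard deviation is the square root of the variance.
print(math.isclose(sd_tight, math.sqrt(statistics.pvariance(tight))))
```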

1-2-3 rule

-most observations are within one standard deviation of the mean (if you pick a random value, it’s likely to be pretty close to the mean)

-many values are within two standard deviations (less common, but still happens often)

-Almost all values are within three standard deviations (if a value is beyond this, it’s a possible outlier)

-tell us whether an observation is normal or unusual

Ex: average height of adult is 170 cm with a standard deviation of 10 cm

Typical height within 1 standard deviation of the mean: most people are between 160 cm and 180 cm
Less common heights within 2 standard deviations of the mean: some are between 150 cm and 190 cm
Rare heights near 3 standard deviations from the mean: almost everyone is between 140 cm and 200 cm, so very few people are close to or beyond those limits
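
The three intervals in the height example are simple arithmetic on the mean and standard deviation. A minimal sketch (the function name `band` is just for illustration):

```python
mean_height = 170  # cm, from the example above
sd_height = 10     # cm, from the example above

def band(k):
    """Interval covering k standard deviations either side of the mean."""
    return (mean_height - k * sd_height, mean_height + k * sd_height)

for k in (1, 2, 3):
    low, high = band(k)
    print(f"within {k} SD: {low} cm to {high} cm")
```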

Standard deviation

-measures spread around the mean

-tells us how spread out the data is

-values of variance and standard deviation are never negative

-highly sensitive to outliers (variance & standard deviation)

Z scores

-measure individual data points

-measures how far a single value is from the mean in terms of standard deviations

-tells us how unusual a value is

Z = 0: at the mean

Z = 1 or -1: one standard deviation away (fairly normal)

|Z| > 2: unusual

|Z| > 3: very rare

Z = (x - mean) / standard deviation

(Z score further from 0 = more unusual)
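
A short sketch of the formula, reusing the height numbers from the 1-2-3 rule card (mean 170 cm, standard deviation 10 cm). The function name and the sample heights are illustrative choices.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean (negative = below the mean)."""
    return (x - mean) / sd

print(z_score(170, 170, 10))  # 0.0  at the mean
print(z_score(190, 170, 10))  # 2.0  two standard deviations above
print(z_score(155, 170, 10))  # -1.5 one and a half standard deviations below
```

Note that the subtraction happens before the division; the size of the z score, ignoring its sign, is what indicates how unusual the value is.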

Percentiles

-tells you where a particular value stands relative to the rest of the data

The a-th percentile is the value that has a% of the data below it

Ex: 90th percentile (P90): 90% of the data is below this value

25th percentile 25% of the data is below this value
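
Read the other way round, the definition says you can find the percentile of a value by counting how much of the data lies below it. A minimal sketch under that reading, with a made-up data set of the numbers 1 to 20 (`percent_below` is a hypothetical helper name):

```python
def percent_below(values, x):
    """Percentage of the data that is strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

data = list(range(1, 21))  # hypothetical data: 1, 2, ..., 20

print(percent_below(data, 19))  # 90.0, so 19 sits at the 90th percentile here
print(percent_below(data, 6))   # 25.0, so 6 sits at the 25th percentile here
```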

Quartiles

divide data into four parts

Q1= 25% of the data is below this value

Q2 = 50% of the data is below this value (median)

Q3 = 75% of the data is below this value

IQR

Tells us how spread out the middle 50% of the data is

IQR = Q3 - Q1

Resistant to outliers: focuses on the middle 50% and ignores extreme values

How to obtain quartiles

Arrange the data from smallest to largest.
Find the median of the data set (Q2)
-if odd number of observations, include the median in both halves
-if even number of observations, do not include the median in either half
Median of lower half = Q1
Median of upper half = Q3
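
The procedure can be sketched as follows, taking Q1 from the lower (smaller) half and Q3 from the upper half. The function name and example lists are made up. Library routines such as `statistics.quantiles` use different interpolation rules, so their answers may differ slightly from this hand method.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (assumes at least two values)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # Odd count: the median belongs to both halves.
        lower = ordered[:half + 1]
        upper = ordered[half:]
    else:
        # Even count: the data splits cleanly into two halves.
        lower = ordered[:half]
        upper = ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
print(q1, q2, q3)     # 4.0 7 10.0
print("IQR:", q3 - q1)

print(quartiles([1, 3, 5, 7, 9, 11]))  # (3, 6.0, 9)
```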

-skewed data: use IQR for variability
-symmetric/normal data: use standard deviation

Five numbers summary

-MIN
-Q1
-MEDIAN (Q2)
-Q3
-MAX

Outliers

-values that significantly differ from the rest of the data, either being much higher or much lower

-these values can skew the results or make interpretations less accurate

-any data point below the lower fence or above the upper fence is an outlier (a common convention sets the fences at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR)

Adjacent values

-most extreme data points that lie within the lower and upper fences, so they are not considered outliers

-lower adjacent value: smallest number in the data set that is not below the lower fence

-upper adjacent value: largest number in the data set that is not above the upper fence
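
Fences, outliers and adjacent values fit together as in the sketch below. The cards do not say where the fences sit, so this assumes the common 1.5 × IQR convention. The data set is made up, and its quartiles (11 and 15) are entered by hand as the medians of its lower and upper halves.

```python
# Hypothetical sorted data with one low and one high extreme value.
data = [1, 10, 11, 12, 13, 14, 15, 16, 40]

# Quartiles of this data by the median-of-halves method, entered by hand.
q1, q3 = 11, 15
iqr = q3 - q1

# Assumption: fences at 1.5 * IQR beyond the quartiles (a common convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]

lower_adjacent = min(inside)  # smallest value not below the lower fence
upper_adjacent = max(inside)  # largest value not above the upper fence

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
print("adjacent values:", lower_adjacent, upper_adjacent)
```

The adjacent values are where the whiskers of a box plot would end, and the outliers are the points plotted individually beyond them.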

Box plot

-graphical representation of the five number summary, helps visualize the distribution, central tendency and spread of data

-shows quartiles, potential outliers and median

How to draw a box plot

1. Determine quartiles
2. Determine outliers and adjacent values
3. Draw box from Q1 to Q3, draw middle line at Q2 to show central tendency
4. Draw whiskers from the box to adjacent values (indicates spread)
5. Plot outliers with an asterisk

-left skewed: the left whisker (lower side) is longer than the right whisker (upper side)
-right skewed: the right whisker (upper side) is longer than the left whisker (lower side)

Chapter 2 Flashcards

(25 cards)