Chapter 3 - Descriptive Statistics: Numerical Measures p.102 Flashcards

1
Q

Boxplot

A

A graphical summary of data based on a five-number summary.

2
Q

Chebyshev’s theorem p.127

A

A theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean.

3
Q

Coefficient of variation p.121

A

A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100.

4
Q

Correlation coefficient p.141

A

A measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; and values near zero indicate the lack of a linear relationship.

5
Q

Covariance p.138

A

A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship.

6
Q

Empirical rule p.128

A

A rule that can be used to compute the percentage of data values that must be within one, two, and three standard deviations of the mean for data that exhibit a bell-shaped distribution.

7
Q

Five-number summary p.133

A

A technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value.

8
Q

Geometric mean p.109

A

A measure of location that is calculated by finding the nth root of the product of n values.

9
Q

Interquartile range (IQR) p.119

A

A measure of variability, defined to be the difference between the third and first quartiles.

10
Q

Mean p.104

A

A measure of central location computed by summing the data values and dividing by the number of observations.

11
Q

Median p.107

A

A measure of central location provided by the value in the middle when the data are arranged in ascending order.

12
Q

Mode p.110

A

A measure of location, defined as the value that occurs with greatest frequency.

13
Q

Outlier p.130

A

An unusually small or unusually large data value.

14
Q

Percentile p.111

A

A value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value. The 50th percentile is the median.

15
Q

Point estimator p.104

A

A sample statistic used to estimate the corresponding population parameter.

16
Q

Population parameter p.104

A

A numerical value used as a summary measure for a population (e.g., the population mean, the population variance, and the population standard deviation).

17
Q

Quartiles p.112

A

The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data.

18
Q

Range p.118

A

A measure of variability, defined to be the largest value minus the smallest value.

19
Q

Sample statistic p.104

A

A numerical value used as a summary measure for a sample (e.g., the sample mean, the sample variance, and the sample standard deviation).

20
Q

Skewness p.125

A

A measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness.

21
Q

Standard deviation p.120

A

A measure of variability computed by taking the positive square root of the variance.

22
Q

Variance p.119

A

A measure of variability based on the squared deviations of the data values about the mean.

23
Q

Weighted mean

A

The mean obtained by assigning each observation a weight that reflects its importance.

24
Q

z-score p.126

A

A value computed by dividing the deviation about the mean (x - Mean(x)) by the standard deviation. A z-score is referred to as a standardized value and denotes the number of standard deviations x is from the mean.

The process of converting a value for a variable to a z-score is often referred to as a z-transformation.

25
Q

3.1 Sample Mean

A

Mean(x) = Sum(x)/n

26
Q

3.2 Population Mean

A

Mean(x) = Sum(x)/N

27
Q

3.3 Weighted Mean

A

Mean(x) = Sum(w * x)/Sum(w)

Where
w = weight for x
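
A minimal Python sketch of formula 3.3. The function name `weighted_mean` and the sample numbers (two purchases at different unit costs, weighted by quantity) are illustrative assumptions, not part of the original card.

```python
def weighted_mean(values, weights):
    """Return Sum(w * x) / Sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical data: unit costs weighted by the quantity bought at each cost.
costs = [3.00, 3.40]
quantities = [1200, 500]
average_cost = weighted_mean(costs, quantities)
```

With equal weights the result reduces to the ordinary mean.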

28
Q

3.4 Geometric Mean

A

Mean(x) = nthRoot(Product(x)) = [Product(x)]^(1/n)
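
A short Python sketch of formula 3.4, assuming all values are positive (for example, growth factors such as 1.10 for a 10% gain). The function name `geometric_mean` is chosen here for illustration; the standard library also offers `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors for two periods: +10% then -10%.
average_growth_factor = geometric_mean([1.10, 0.90])
```

The result is slightly below 1, which reflects that a 10% gain followed by a 10% loss leaves less than the starting amount.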

29
Q

3.5 Location of the pth Percentile

A

Location(p) = (p/100)*(n + 1)
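
A Python sketch of how the location formula might be used. The card gives only the location; the linear interpolation between the two neighboring ordered values when the location is not a whole number is an assumption here, as are the function name `percentile` and the clamping at the ends of the data.

```python
def percentile(data, p):
    """Estimate the pth percentile using Location(p) = (p/100) * (n + 1)."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    location = (p / 100) * (n + 1)  # 1-based position in the ordered data
    if location <= 1:               # assumption: clamp to the smallest value
        return ordered[0]
    if location >= n:               # assumption: clamp to the largest value
        return ordered[-1]
    lower_pos = int(location)       # whole part of the position
    fraction = location - lower_pos
    lower = ordered[lower_pos - 1]
    upper = ordered[lower_pos]
    return lower + fraction * (upper - lower)
```

For four values 10, 20, 30, 40 and p = 50 the location is 2.5, so the sketch returns the value halfway between the 2nd and 3rd observations.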

30
Q

3.6 Interquartile Range

A

IQR = Quartile(3) - Quartile(1)

The measure of variability is the difference between the third quartile, Quartile(3), and the first quartile, Quartile(1).

31
Q

3.7 Population Variance

A

Variance(x) = Sum((x - Mean(x))^2)/N

32
Q

3.8 Sample Variance

A

Variance(x) = Sum((x - Mean(x))^2)/(n - 1)

33
Q

3.9 Sample Standard Deviation

A

StDev(x) = s = SquareRoot(Variance(x)) = SquareRoot(Square(s))

34
Q

3.10 Population standard deviation

A

StDev(x) = σ = SquareRoot(Variance(x)) = SquareRoot(Square(σ))
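
Formulas 3.7 through 3.10 differ only in the divisor (N for a population, n - 1 for a sample). A minimal Python sketch, with the function names and the `sample` flag chosen here for illustration:

```python
import math


def variance(data, sample=True):
    """Sum of squared deviations about the mean, divided by n - 1 or N."""
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)


def standard_deviation(data, sample=True):
    """Positive square root of the variance."""
    return math.sqrt(variance(data, sample))


# Hypothetical data set of five observations.
values = [46, 54, 42, 46, 32]
sample_variance = variance(values)                    # divides by n - 1
population_variance = variance(values, sample=False)  # divides by N
```

For these five values the squared deviations about the mean (44) sum to 256, so the sample variance is 64 and the sample standard deviation is 8.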

35
Q

3.11 Coefficient of Variation

A

A descriptive statistic that indicates how large the standard deviation is relative to the mean.

(StDev(x)/Mean(x)) * 100%
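
A small Python sketch of formula 3.11. It assumes sample data (so it uses the sample standard deviation from the standard library's `statistics.stdev`); the function name is illustrative.

```python
import statistics


def coefficient_of_variation(data):
    """Return the standard deviation as a percentage of the mean."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(data) / mean * 100


# Hypothetical data: mean 44 and sample standard deviation 8.
cv_percent = coefficient_of_variation([46, 54, 42, 46, 32])
```

Here the standard deviation is about 18.2% of the mean.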

36
Q

3.12 z-Score

A

z = (x - Mean(x))/StDev(x)

Where z is the z-score for value x
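
A short Python sketch of formula 3.12 applied to every value in a data set. It assumes sample data, so `statistics.stdev` (the n - 1 form) supplies the standard deviation; the function name `z_scores` is an illustrative choice.

```python
import statistics


def z_scores(data):
    """Return (x - mean) / standard deviation for each x in data."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)
    if stdev == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / stdev for x in data]


# Hypothetical data: mean 44 and sample standard deviation 8.
standardized = z_scores([46, 54, 42, 46, 32])
```

A value of 54 lies 10 above the mean of 44, which is 1.25 standard deviations, so its z-score is 1.25.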

37
Q

3.13 Sample Covariance

A

Covariance(x,y) = Sum((x - Mean(x)) * (y - Mean(y))) / (n - 1)

38
Q

3.14 Population Covariance

A

Covariance(x,y) = Sum((x - Mean(x)) * (y - Mean(y))) / N
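
The sample and population covariance (3.13 and 3.14) share the same numerator and differ only in the divisor. A minimal Python sketch, with the function name and `sample` flag as illustrative assumptions:

```python
def covariance(xs, ys, sample=True):
    """Sum of products of paired deviations, divided by n - 1 or N."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1 if sample else n)


# Hypothetical paired data where y rises with x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
sample_cov = covariance(xs, ys)
population_cov = covariance(xs, ys, sample=False)
```

The positive result matches the definition: y tends to be above its mean when x is above its mean.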

39
Q

3.15 Pearson Product Moment Correlation Coefficient: Sample Data

A

CorrelationCoefficient(x,y) = Covariance(x,y)/(StandardDeviation(x) * StandardDeviation(y))

40
Q

3.16 Pearson Product Moment Correlation Coefficient: Population data

A

CorrelationCoefficient(x,y) = Covariance(x,y)/(StandardDeviation(x) * StandardDeviation(y))
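
A Python sketch of the correlation coefficient. Because the same divisor (n - 1 or N) appears in the covariance and in both standard deviations, it cancels, so this sketch works directly with sums of deviations and gives the same value for sample and population data. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect positive linear relationship.
r_perfect = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# Hypothetical data: a weaker positive relationship.
r_partial = pearson_r([1, 2, 3], [1, 3, 2])
```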

41
Q

Median

A

Arrange the data in ascending order (smallest value to largest value).

    a. For an odd number of observations, the median is the middle value.
    b. For an even number of observations, the median is the average of the two middle values.
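
The two cases above can be sketched in Python as follows (the function name `median` is an illustrative choice; the standard library provides `statistics.median` with the same behavior):

```python
def median(data):
    """Middle value of the sorted data, or the average of the two middle values."""
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)  # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:          # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even number of observations
```
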
42
Q

Mode

A

The mode is the value that occurs with greatest frequency.

43
Q

Range

A

Range = Largest value - Smallest value

44
Q

Chebyshev’s Theorem

A

At least (1 - 1/z^2) of the data values must be within z standard deviations of the mean, where z is any value greater than 1.

    - Some implications of this theorem, with z = 2, 3, and 4 standard deviations, follow:
           - At least .75 or 75% of the data values must be within z = 2 standard deviations of the mean
           - At least .89 or 89% of the data values must be within z = 3 standard deviations of the mean
           - At least .94 or 94% of the data values must be within z = 4 standard deviations of the mean

Chebyshev’s theorem requires z > 1, but z need not be an integer.
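
A minimal Python sketch of the bound (1 - 1/z^2). The function name `chebyshev_lower_bound` is an illustrative choice.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


# Bounds for z = 2, 3, and 4, plus a non-integer z.
bounds = {z: chebyshev_lower_bound(z) for z in (2, 3, 4, 1.5)}
```

The exact values for z = 3 and z = 4 are 8/9 and 15/16, which the card rounds to .89 and .94.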

45
Q

Outliers

A
Standardized values (z-scores) can be used to identify outliers.

    - Any data value with a z-score less than -3 or greater than +3 is identified as an outlier.

IQR method to identify outliers

    - First compute the following lower and upper limits:
           - Lower Limit = Quartile(1) - 1.5(IQR)
           - Upper Limit = Quartile(3) + 1.5(IQR)
    - An observation is classified as an outlier if its value is less than the lower limit or greater than the upper limit.

The approach that uses the first and third quartiles and the IQR to identify outliers does not necessarily provide the same results as the approach based upon a z-score less than -3 or greater than +3. Either or both procedures may be used.
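
Both procedures can be sketched in Python. This sketch assumes sample data (sample standard deviation) and takes the quartiles from the standard library's `statistics.quantiles` with `method="exclusive"`, which places the pth percentile at position (p/100)(n + 1); the function names and the data are illustrative.

```python
import statistics


def z_score_outliers(data, threshold=3.0):
    """Values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / stdev) > threshold]


def iqr_outliers(data):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [x for x in data if x < lower_limit or x > upper_limit]


# Hypothetical data with one unusually large value.
observations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]
flagged_by_z = z_score_outliers(observations)
flagged_by_iqr = iqr_outliers(observations)
```

In this example both methods flag the value 100, but on other data the two lists can differ.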

46
Q

Five-number summary

A
  1. Smallest value
  2. First quartile, Quartile(1)
  3. Median, Quartile(2)
  4. Third quartile, Quartile(3)
  5. Largest value
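
A Python sketch that collects the five numbers. It assumes the quartiles are taken from `statistics.quantiles` with `method="exclusive"` (position (p/100)(n + 1) in the ordered data); other quartile conventions can give slightly different values. The function name is illustrative.

```python
import statistics


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) for the data."""
    if len(data) < 2:
        raise ValueError("at least two observations are required")
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return (min(data), q1, q2, q3, max(data))


# Hypothetical data: seven evenly spaced observations.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7])
```

These five values are what a boxplot draws: the box spans the first to third quartile, with a line at the median.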