Week 12 - Descriptive statistics Flashcards

(55 cards)

1
Q

What are numerical measures of descriptive statistics?

A

measures of central tendency (location) and measures of dispersion (variability)

2
Q

What are sample statistics?

A

Numerical measures computed for data from a sample

3
Q

What are population parameters?

A

Numerical measures computed for data from a population

4
Q

What is a sample statistic referred to as?

A

as the point estimator of the corresponding population parameter

5
Q

What are the 7 measures of location?

A
  1. Mean
  2. Median
  3. Mode
  4. Weighted Mean
  5. Geometric Mean
  6. Percentiles
  7. Quartiles
6
Q

What is the mean of a data set?

A

the average of all the data values

7
Q

What is the sample mean?

A

The sample mean x̄ is a point estimate of the population mean µ

8
Q

What is the mean equation?

A

x̄ = ∑x_i/ n

numerator - sum of the values of the n observations
denominator - number of observations in the sample
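As a minimal sketch of the formula above, the sample mean can be computed in Python as follows. The function name `sample_mean` and the data values are assumptions chosen only for illustration.

```python
def sample_mean(values):
    """Return the sum of the n observations divided by n."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Hypothetical observations, not taken from the cards.
data = [46, 54, 42, 46, 32]
print(sample_mean(data))  # 220 / 5 = 44.0
```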

9
Q

What is the median of a data set?

A

is the value in the middle when the data items are arranged in ascending order

10
Q

When is the median the preferred measure of central location?

A

Whenever a data set has extreme values

11
Q

For which kinds of data is the median the measure of location most often reported?

A

annual income and property value data
A few extremely large incomes or property values can inflate the mean

12
Q

How do we calculate the median for an odd number of observations?

A

Say we have the following 7 observations:
Sort them in ascending order:
Median is the middle value: 19

13
Q

How do we calculate the median for an even number of observations?

A

Even number of observations:
Say we have 8 observations:
Sort them in ascending order:

Median is the average of the middle two values: (19 + 26)/2 = 22.5
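The odd and even cases can be sketched in one Python function. The two data sets below are hypothetical, chosen only so that they reproduce the medians stated on the cards (19 and 22.5); the cards' own observations are not shown here.

```python
def median(values):
    """Middle value of the sorted data, or the average of the middle two."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle value exists.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [26, 18, 27, 12, 14, 27, 19]        # hypothetical, 7 observations
even_data = [26, 18, 27, 12, 14, 27, 30, 19]   # hypothetical, 8 observations
print(median(odd_data))   # 19
print(median(even_data))  # 22.5
```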

14
Q

Where are the mean and median on a symmetrical diagram?

A

equal at the middle

15
Q

Where are the mean and median on a left skew diagram?

A

the mode is at the peak; going down the left tail, the median comes next and then the mean, so typically mean < median < mode

16
Q

Where are the mean and median on a right skew diagram?

A

the mode is at the peak; going down the right tail, the median comes next and then the mean, so typically mode < median < mean

17
Q

What is the mode?

A

The mode of a data set is the value that occurs with greatest frequency.
The greatest frequency can occur at two or more different values
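Because the greatest frequency can occur at more than one value, a sketch of a mode function should return every such value. The name `modes` and the sample data are assumptions for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency, sorted."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data: 2 and 3 both occur twice, so the data are bimodal.
print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
```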

18
Q

What is bimodal data?

A

If the data have exactly two modes

19
Q

What is multimodal data?

A

If the data have more than two modes

20
Q

What is the weighted mean?

A

When the mean is computed by giving each data value a weight that reflects its importance

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value

21
Q

What is the weighted mean equation?

A

x̄ = ∑(w_i × x_i) / ∑w_i

x_i = value of observation i
w_i = weight for observation i
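A minimal Python sketch of the weighted mean, assuming the values and weights are given as two lists of equal length. The scores and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical scores, with the first one counting double.
scores = [80, 90, 70]
weights = [2, 1, 1]
print(weighted_mean(scores, weights))  # (160 + 90 + 70) / 4 = 80.0
```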

22
Q

What is value weighted?

A

a type of weighted mean where the weights are based on the values themselves rather than being assigned separately

23
Q

What is equal weighted return?

A

The simple average of all returns, giving each asset or component the same importance, regardless of size or value. This is in contrast to a value-weighted return, where larger values (e.g., market capitalization) carry more weight.

24
Q

What is value weighted return equation?

A

(value_x × r_x + value_y × r_y) / (value_x + value_y)

25
Q

What is the equal-weighted return equation?

A

Equal-weighted return = ∑X_i / n

X_i = individual returns
n = number of assets or components

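The value-weighted and equal-weighted return formulas can be compared side by side in a short Python sketch. The holdings and returns (in percent) are hypothetical, and the function names are assumptions.

```python
def value_weighted_return(values, returns):
    """Weight each return by the value held in that asset."""
    if len(values) != len(returns):
        raise ValueError("values and returns must have the same length")
    return sum(v * r for v, r in zip(values, returns)) / sum(values)


def equal_weighted_return(returns):
    """Simple average: every asset counts the same."""
    return sum(returns) / len(returns)


# Hypothetical portfolio: 3000 in asset X (10% return), 1000 in asset Y (2% return).
values = [3000.0, 1000.0]
returns_pct = [10.0, 2.0]
print(value_weighted_return(values, returns_pct))  # (30000 + 2000) / 4000 = 8.0
print(equal_weighted_return(returns_pct))          # (10 + 2) / 2 = 6.0
```

The larger holding pulls the value-weighted return towards its own return, which is why the two results differ.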
26
Q

What is a portfolio return?

A

A portfolio return is the weighted average return of the individual assets in the portfolio. It usually equals the value-weighted return.

27
Q

When is the geometric mean most appropriate to use?

A

It is most appropriate in situations where the data items to be summarised result from a ratio-type calculation, such as growth rates or index numbers.
It is calculated by multiplying all the numbers together and then taking the nth root of the product, where n is the total number of values.

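A sketch of the geometric mean calculation described above, assuming all values are positive growth factors (for example, 1.25 for +25%). The function name and the data are hypothetical.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all positive")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors: +25% in one period, then -20% in the next.
growth_factors = [1.25, 0.80]
# The product is about 1.0, so the average growth factor is about 1.0
# (no net growth), whereas the arithmetic mean would suggest +2.5%.
print(geometric_mean(growth_factors))
```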
28
Q

What is a percentile?

A

A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
Admission test scores for colleges and universities are frequently reported in terms of percentiles.

29
Q

What is the p-th percentile of a data set?

A

A value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
For example, the 10th percentile of a data set is a value such that at least 10% of the items are less than or equal to it and at least 90% of the items are greater than or equal to it.

30
Q

How to calculate a percentile?

A
  1. Arrange the data: sort the data set in ascending order.
  2. Determine the position (i): i = (p/100) × n, where p is the desired percentile and n is the number of observations.
  3. Locate the percentile:
     If i is an integer, the p-th percentile is the average of the values at positions i and i + 1.
     If i is not an integer, round up to the next whole number; the p-th percentile is the value at this position.

31
Q

Example of percentile calculation

A

Consider a data set: 7, 10, 15, 20, 25. To find the 40th percentile:
  1. Arrange the data: the data is already in ascending order.
  2. Determine the position (i): p = 40, n = 5, so i = (40/100) × 5 = 2.
  3. Locate the percentile: since i = 2 is an integer, the 40th percentile is the average of the values at positions 2 and 3, which are 10 and 15.

40th percentile = (10 + 15)/2 = 12.5

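The percentile rule used in this example can be sketched in Python. This follows the position rule from the cards (i = (p/100) × n), which may differ from the interpolation methods used by statistical libraries; the function name and the restriction to 0 < p < 100 are assumptions.

```python
def percentile(values, p):
    """p-th percentile using the position rule i = (p / 100) * n."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # divmod avoids floating-point trouble when checking whether i is an integer.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        i = int(whole)
        # i is an integer: average the values at positions i and i + 1 (1-based).
        return (ordered[i - 1] + ordered[i]) / 2
    # i is not an integer: round up and take the value at that position.
    i = int(whole) + 1
    return ordered[i - 1]


data = [7, 10, 15, 20, 25]
print(percentile(data, 40))  # 12.5, matching the worked example
print(percentile(data, 50))  # i = 2.5 rounds up to 3, so the value is 15
```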
32
Q

What are quartiles?

A

Specific percentiles:
first quartile = 25th percentile
second quartile = 50th percentile = median
third quartile = 75th percentile

33
Q

What do measures of variability (dispersion) help us to understand?

A

How data points spread out from the centre (mean or median). This is useful in decision-making, such as evaluating supplier delivery times, stock price volatility, or quality control in manufacturing.

34
Q

What are the 5 main measures of variability (dispersion)?

A
  1. Range
  2. Interquartile Range (IQR)
  3. Variance
  4. Standard Deviation
  5. Coefficient of Variation (CV%)

35
Q

What is the range?

A

The range of a data set is the difference between the largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data values.

36
Q

How to calculate the range?

A

Range = largest value - smallest value

37
Q

What is the interquartile range?

A

The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.

38
Q

How to calculate the interquartile range?

A

IQR = 3rd quartile - 1st quartile

39
Q

How is a box plot drawn?

A
  1. A box is drawn with its ends located at the 1st and 3rd quartiles.
  2. A vertical line is drawn in the box at the location of the median (second quartile).
  3. Dashed lines are drawn from the ends of the box to the smallest and largest data values inside the limits.
  4. Data outside these limits are considered outliers. The location of each outlier is shown with the symbol *.

40
Q

How to calculate the lower limit and upper limit for a box plot for outliers?

A

The lower limit is located 1.5(IQR) below Q1: lower limit = Q1 - 1.5 × IQR
The upper limit is located 1.5(IQR) above Q3: upper limit = Q3 + 1.5 × IQR

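A sketch of the outlier limits in Python, assuming Q1 and Q3 have already been computed. The function names, quartile values, and data are hypothetical.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values lying outside the box plot limits."""
    lower, upper = outlier_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical quartiles and data, for illustration only.
print(outlier_limits(10, 20))                        # (-5.0, 35.0)
print(find_outliers([2, 12, 15, 18, 40], 10, 20))    # [40]
```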
41
Q

What is the variance?

A

The variance is the average of the squared differences between each data value and the mean.
The variance is a measure of variability that utilises all the data.
It is based on the difference between the value of each observation (x_i) and the mean (x̄ for a sample, µ for a population).

42
Q

What is the variance equation?

A

For a sample:
s^2 = ∑(x_i - x̄)^2 / (n - 1)

x_i = each individual data point
x̄ = sample mean
n = sample size

For a population:
σ^2 = ∑(x_i - µ)^2 / N

x_i = each individual data point
µ = population mean
N = total number of data points in the population

43
Q

What is the standard deviation?

A

The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily interpreted than the variance.

44
Q

How to calculate standard deviation?

A

For a sample:
s = √(s^2) = √[∑(x_i - x̄)^2 / (n - 1)]

x_i = each individual data point
x̄ = sample mean
n = sample size

For a population:
σ = √(σ^2) = √[∑(x_i - µ)^2 / N]

x_i = each individual data point
µ = population mean
N = total number of data points in the population

45
Q

What is the coefficient of variation?

A

A measure of how large the standard deviation is in relation to the mean

46
Q

How do you calculate the coefficient of variation?

A

For a sample:
CV = (s / x̄) × 100%

s = sample standard deviation
x̄ = sample mean

For a population:
CV = (σ / µ) × 100%

σ = population standard deviation
µ = population mean

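The sample variance, standard deviation, and coefficient of variation build on each other, which a short Python sketch can show. The function names and the data are hypothetical; only the sample (n - 1) versions are implemented.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    mean = sum(values) / len(values)
    return sample_std_dev(values) / mean * 100


# Hypothetical observations with mean 44.
data = [46, 54, 42, 46, 32]
print(sample_variance(data))           # 256 / 4 = 64.0
print(sample_std_dev(data))            # 8.0
print(coefficient_of_variation(data))  # about 18.18 (percent)
```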
47
Q

Show an example of variance, standard deviation and coefficient of variation linked together

A

Variance: s^2 = ∑(x_i - x̄)^2 / (n - 1) = 2,996.16
Standard deviation: s = √(s^2) = √2,996.16 = 54.74
Coefficient of variation: (s / x̄) × 100% = (54.74 / 490.84) × 100% = 11.15%
The standard deviation is about 11% of the mean.

48
Q

What are the 2 measures of association between 2 variables?

A
  1. Covariance
  2. Correlation coefficient

49
Q

What is the covariance a measure of?

A

A measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.

50
Q

How do you calculate the covariance?

A

For samples:
s_XY = ∑(x_i - x̄)(y_i - ȳ) / (n - 1)

x_i, y_i = individual data points for variables X and Y
x̄, ȳ = sample means of variables X and Y
n = sample size

For populations:
σ_XY = ∑(x_i - µ_X)(y_i - µ_Y) / N

µ_X, µ_Y = population means of X and Y
N = population size

51
Q

What is the correlation coefficient?

A

It quantifies the strength and direction of the linear relationship between two variables.
It does not necessarily indicate causation: two variables being highly correlated does not mean that one is the cause of the other.
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.

52
Q

How to calculate correlation coefficient?

A

For samples:
r_XY = s_XY / (s_X × s_Y) = ∑(x_i - x̄)(y_i - ȳ) / √[∑(x_i - x̄)^2 × ∑(y_i - ȳ)^2]

x_i, y_i = individual data points for variables X and Y
x̄, ȳ = sample means of variables X and Y
n = number of data points

For populations:
ρ_XY = σ_XY / (σ_X × σ_Y) = ∑(x_i - µ_X)(y_i - µ_Y) / √[∑(x_i - µ_X)^2 × ∑(y_i - µ_Y)^2]

x_i, y_i = individual data points for variables X and Y
µ_X, µ_Y = population means of X and Y
N = population size (number of data points)

53
Q

What are the different correlation coefficients?

A

Positive correlation: if r > 0, as one variable increases, the other tends to increase.
Negative correlation: if r < 0, as one variable increases, the other tends to decrease.
No correlation: if r = 0, there is no linear relationship between the two variables.

Strength:
Strong: r near 1 or -1
Weak: r near 0

54
Q

What makes correlation coefficients perfect?

A

Perfect positive correlation (r = 1): a straight line with a positive slope (both variables increase together in perfect proportion).
Perfect negative correlation (r = -1): a straight line with a negative slope (one variable increases as the other decreases in perfect proportion).
No correlation (r = 0): no linear pattern in the data.

55
Q

Example of covariance and correlation coefficient calculation linked together

A

Sample covariance: s_XY = ∑(x_i - x̄)(y_i - ȳ) / (n - 1) = -35.4 / (6 - 1) = -7.08
Sample correlation coefficient: r_XY = s_XY / (s_X × s_Y) = -7.08 / (8.2192 × 0.8944) = -0.9631

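The link between covariance and correlation can be sketched in Python. The paired data below are hypothetical (the card's underlying observations are not given); the last two lines only re-check the card's arithmetic from its stated summary values.

```python
import math


def sample_covariance(xs, ys):
    """Sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of X with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data lying on a straight line with negative slope.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
print(sample_covariance(xs, ys))   # -20 / 4 = -5.0
print(sample_correlation(xs, ys))  # approximately -1 (perfect negative correlation)

# Re-checking the card's arithmetic from its summary values.
print(-35.4 / (6 - 1))                      # approximately -7.08
print(round(-7.08 / (8.2192 * 0.8944), 4))  # -0.9631
```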