Descriptive Statistics Flashcards

(35 cards)

1
Q

What are the 3 measures of central tendency?

A

-Mean
-Median
-Mode

2
Q

What are the 3 measures of dispersion?

A

-Interquartile Range (IQR)
-Variance
-Standard Deviation

3
Q

What are the measures of association?

A

-Chi-Squared
-Correlation

4
Q

Define Central tendency

A

A single number that aims to represent the ‘typical’ value of a variable (the average), somewhere between the highest and lowest value of the observations.

5
Q

What is Central tendency useful for?

A

-Useful for comparisons between datasets or groups within a dataset
-Can be tracked over time to monitor increases/decreases in key metrics
-Used in many statistical tests

6
Q

What is the Mean?

A

Calculated by summing all values of a variable and dividing by the number of observations
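
The calculation above can be sketched in a few lines of Python. The function name `mean` and the `scores` data are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Sum all values and divide by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


scores = [4, 8, 6, 5, 7]      # hypothetical observations
average = mean(scores)        # 30 / 5 = 6.0
print(average)
```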

7
Q

Features of the Mean

A

-For ordinal and scale data
-Statistically powerful (uses all data points)
-Not robust (can be distorted by outliers)

8
Q

Define the Median (M)

A

The middle value when values of a variable are arranged in order of magnitude

9
Q

Features of the Median

A

-For ordinal and scale data
-Robust to outliers, so more appropriate than mean when dealing with extreme values
-Lacks statistical power
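
To illustrate robustness, the sketch below adds a single extreme value to a small dataset and compares how the mean and the median react. The salary figures are invented for the example; `statistics.mean` and `statistics.median` come from the Python standard library.

```python
import statistics

salaries = [30, 32, 35, 38, 40]     # hypothetical values, in thousands
with_outlier = salaries + [400]     # add one extreme observation

mean_before = statistics.mean(salaries)          # 35
median_before = statistics.median(salaries)      # 35
mean_after = statistics.mean(with_outlier)       # 575 / 6, roughly 95.8
median_after = statistics.median(with_outlier)   # (35 + 38) / 2 = 36.5

print(mean_before, mean_after)      # the mean is pulled far upwards
print(median_before, median_after)  # the median barely moves
```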

10
Q

Define the Mode

A

The most commonly occurring value (there may be more than one mode for a single variable)
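
A short sketch of a variable with more than one mode, assuming Python 3.8 or later for `statistics.multimode`. The `colours` data is hypothetical.

```python
import statistics

colours = ["red", "blue", "red", "green", "blue"]   # hypothetical nominal data

# multimode returns every value tied for the highest frequency,
# in the order each was first encountered.
modes = statistics.multimode(colours)
print(modes)
```

Here "red" and "blue" each occur twice, so both are modes.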

11
Q

Features of the Mode

A

-The only measure of central tendency suitable for nominal data
-Can be used with ordinal and scale data, but other options are generally preferable

12
Q

How is the Mode useful?

A
  • Categorical data: The only measure of central tendency suitable for nominal variables
  • Visualisation and reporting: Grouping numerical data can simplify communication, but involves a trade-off between detail and user-friendliness
  • Aggregated or transformed data
13
Q

Define dispersion

A

Dispersion measures how far, on average, each observation lies from the central tendency. It represents the variation in values within a variable.

14
Q

Interquartile Range

A

The IQR is the range spanned by the middle 50% of data points, calculated as Q3 − Q1. In the ordered data, Q1 is located at position (n+1)/4 and Q3 at position 3(n+1)/4.
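
A minimal sketch of the position rule above. The card does not say what to do when a position is not a whole number; linear interpolation between the two neighbouring values is assumed here. The helper names `value_at_position` and `iqr` are hypothetical.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based position, interpolating between neighbours."""
    lower_index = int(position)            # whole part of the position
    fraction = position - lower_index      # fractional part, if any
    lower = sorted_values[lower_index - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[lower_index]
    return lower + fraction * (upper - lower)


def iqr(values):
    """Return (Q1, Q3, IQR) using positions (n+1)/4 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations for this rule")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 13, 15]   # n = 11, positions 3 and 9
print(iqr(data))                                # Q1 = 4, Q3 = 12, IQR = 8
```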

15
Q

What is comparing the range and IQR useful for?

A

-Useful for understanding the dispersion of a variable and identifying outliers
-Box plots are the easiest way to do this
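
The card does not state how a box plot flags outliers. A common convention, assumed here, marks any value more than 1.5 × IQR below Q1 or above Q3. The sketch uses `statistics.quantiles` (Python 3.8+), whose default "exclusive" method places the quartiles at multiples of (n+1)/4; the function name `iqr_outliers` and the data are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # Q1, median, Q3
    spread = q3 - q1                                # the IQR
    lower_fence = q1 - k * spread
    upper_fence = q3 + k * spread
    return [v for v in values if v < lower_fence or v > upper_fence]


data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 13, 15, 40]   # hypothetical sample
outliers = iqr_outliers(data)
print(outliers)   # only 40 lies beyond the upper fence of 25.5
```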

16
Q

Define Variance

A

The mean of the squared differences between each value and the mean.

17
Q

Define Standard Deviation

A

The square root of the variance, representing how far, on average, we can expect an individual observation to deviate above or below the mean

18
Q

How to calculate Standard Deviation

A
  1. Find the mean
  2. Find the difference between each value and the mean
  3. Square each difference
  4. Find the sum of the squared differences
  5. Find the variance: the mean of the squared differences
  6. Find the SD: the square root of the variance
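
The six steps above can be sketched as follows. Because the cards define the variance as the mean of the squared differences, this version divides by n (the population form); a sample SD would divide by n − 1 instead. The function name and data are hypothetical.

```python
import math


def standard_deviation(values):
    """Population SD, following the six steps one by one."""
    n = len(values)
    mean = sum(values) / n                        # 1. find the mean
    differences = [v - mean for v in values]      # 2. difference from the mean
    squared = [d ** 2 for d in differences]       # 3. square each difference
    total = sum(squared)                          # 4. sum the squared differences
    variance = total / n                          # 5. variance: their mean
    return math.sqrt(variance)                    # 6. SD: square root of variance


data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, squared differences sum to 32
print(standard_deviation(data))    # variance 4, so SD 2.0
```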
19
Q

What is Kurtosis?

A

The ‘flatness’ of the distribution of values

20
Q

What does a large SD / flat distribution mean?

A

Data are fairly dispersed around the mean, with more values in the tails of the distribution

21
Q

What does a small SD / narrow distribution mean?

A

Data are tightly clustered around the mean, giving the distribution a ‘peak’.

22
Q

What do we use the coefficient of variation (CV) to do?

A

To measure relative variability: the SD expressed as a proportion of the mean. This is typically reported as a percentage.
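
A small sketch of the CV as the SD divided by the mean, times 100. It uses the population SD (`statistics.pstdev`) to match the variance definition on the earlier cards; the two datasets are invented to show variables measured in different units.

```python
import statistics


def coefficient_of_variation(values):
    """SD as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean * 100


heights_cm = [150, 160, 170, 180, 190]   # hypothetical data
weights_kg = [50, 60, 70, 80, 90]        # hypothetical data

cv_height = coefficient_of_variation(heights_cm)   # roughly 8.3
cv_weight = coefficient_of_variation(weights_kg)   # roughly 20.2
print(round(cv_height, 1), round(cv_weight, 1))
```

Both variables have the same SD, but relative to its mean the weight data is more variable.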

23
Q

What is the Coefficient of Variation useful for comparing?

A

-Different variables
-The same variable
-International comparisons

24
Q

What do measures of association consider?

A

The relationship between two variables

25
Q

Chi-Squared

A

Tests for association based on the frequency of two variables' co-occurrence, comparing the frequency expected if there were no association with the frequency observed in the sample data.

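
The comparison of observed and expected frequencies can be sketched with the usual Pearson statistic, the sum of (observed − expected)² / expected over all cells, where each expected count is row total × column total / grand total. The function name and the 2×2 table are hypothetical, and the sketch stops at the statistic; it does not compute a p-value.

```python
def chi_squared(observed):
    """Chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected if the two variables were not associated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


table = [[30, 10],
         [20, 40]]            # hypothetical observed frequencies
print(chi_squared(table))     # 50 / 3, roughly 16.67
```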
26
Q

Correlation

A

Represents the strength and direction of the association between two variables. The correlation coefficient r ranges from -1 to 1.

27
Q

What is a Contingency table?

A

Contingency tables list the possible values of x and y, and the frequency of each combination. This is known as cross-tabulation.

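
A minimal cross-tabulation sketch using `collections.Counter` from the standard library. The paired observations and variable meanings are invented for illustration.

```python
from collections import Counter

# Hypothetical paired observations: (x = exercise level, y = sleep quality)
observations = [
    ("high", "good"), ("high", "good"), ("high", "poor"),
    ("low", "good"), ("low", "poor"), ("low", "poor"),
]

counts = Counter(observations)                     # frequency of each (x, y) pair
x_values = sorted({x for x, _ in observations})    # row labels
y_values = sorted({y for _, y in observations})    # column labels

# Rows follow x_values, columns follow y_values
table = [[counts[(x, y)] for y in y_values] for x in x_values]

print("      ", y_values)
for x, row in zip(x_values, table):
    print(f"{x:6}", row)
```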
28
Q

What does a Positive covariance indicate?

A

Indicates variables that tend to 'move together' away from their means: if we observe a high value of x, we also expect to see a high value of y.

29
Q

What does a Negative covariance indicate?

A

Indicates variables that move in opposite directions: if we observe a high value of x, we expect to see a value of y below its mean.

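
The sign of the covariance can be checked with a short sketch. The covariance is taken here as the mean of the products of each pair's deviations from their means (population form, dividing by n); the function name and data are hypothetical.

```python
def covariance(xs, ys):
    """Mean product of deviations from the two means (population form)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]     # high x goes with high y
y_falling = [10, 8, 6, 4, 2]    # high x goes with low y

print(covariance(x, y_rising))    # positive: 4.0
print(covariance(x, y_falling))   # negative: -4.0
```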
30
Q

What is the correlation coefficient?

A

It transforms the covariance into a scaled, interpretable representation of the strength and direction of the relationship between two variables. It ranges from -1 to 1.

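
One way to sketch the scaling step is Pearson's r: the covariance divided by the product of the two standard deviations. The cards do not name a specific coefficient, so Pearson's r is an assumption here, as are the function name and data.

```python
import math


def correlation(xs, ys):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return cov / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))   # perfect positive relationship, about 1
print(correlation(x, [1, 3, 2, 5, 4]))    # strong but imperfect, about 0.8
print(correlation(x, [10, 8, 6, 4, 2]))   # perfect negative relationship, about -1
```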
31
Q

What are Scatter Plots?

A

Plots of one variable against another, which show whether there is a linear relationship between the two variables.

32
Q

Skewness

A

Skewness refers to asymmetry in the distribution of a variable's values: one tail of the distribution is longer than the other.

33
Q

What is a skewed distribution?

A

Skewed distributions have a relatively higher proportion of their values at the low (positive skew) or high (negative skew) end of the range.

34
Q

What is a Normal distribution?

A

A symmetric, bell-shaped distribution. When the mean and median are approximately the same (Pearson's coefficient of skewness, PCS, close to 0), the data is symmetrically distributed around the central tendency.

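
The card does not define PCS. Assuming it refers to Pearson's median-based coefficient of skewness, 3 × (mean − median) / SD, the check can be sketched as below. The function name and datasets are hypothetical.

```python
import statistics


def pearson_skewness(values):
    """Assumed form of PCS: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (mean - median) / sd


symmetric = [1, 2, 3, 4, 5]         # mean equals median
right_skewed = [1, 2, 3, 4, 20]     # long tail of high values
left_skewed = [1, 17, 18, 19, 20]   # long tail of low values

print(pearson_skewness(symmetric))      # 0.0
print(pearson_skewness(right_skewed))   # positive
print(pearson_skewness(left_skewed))    # negative
```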
35
Q

What are normal distributions always?

A
  1. Symmetric
  2. Asymptotic (the tails approach the horizontal axis but never touch it)