Topic 2 Flashcards

1
Q

What is the purpose of descriptive statistics?

A

To describe or summarise the overall pattern of data

2
Q

How do you describe numerical data?

A

The three S’s - shape, centre and spread (plus outliers)

3
Q

How do you describe categorical data?

A

Table of frequencies or proportions
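As an illustration of this card, here is a minimal sketch of a frequency table with counts and proportions. The function name `frequency_table` and the sample data are made up for this example.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, proportion)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}

# Hypothetical sample data, made up for illustration.
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
table = frequency_table(blood_types)

for category, (count, proportion) in sorted(table.items()):
    print(f"{category:>2}  count={count}  proportion={proportion:.3f}")
```

The proportions always sum to 1, which is a quick check that no category was missed.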

4
Q

What is a symmetrical shape?

A

Right and left side mirrored, can also be bell-shaped

5
Q

What is a shape that is skewed to the left?

A

Left side extends further out than the right side

6
Q

What is a shape that is skewed to the right?

A

Right side extends further out than the left side

7
Q

What is a symmetrical, bimodal shape?

A

Symmetrical with two peaks

8
Q

What is a symmetrical, uniform shape?

A

Symmetrical and flat

9
Q

What is an outlier? What may be the cause of them?

A

Observations that deviate from the overall pattern of the distribution. They may be caused by natural variation or by measurement error.

10
Q

What are numerical summaries for centre or location? (3)

A

Mode, median, mean

11
Q

What are numerical summaries for spread? (3)

A

Range, inter-quartile range (IQR), standard deviation

12
Q

What is mode?

A

The most common value; the peak of the distribution

13
Q

What is median?

A

The middle; the value that divides an ordered data set into two equal halves

14
Q

For what types of variables would you find the median?

A

Ordinal, discrete and continuous

15
Q

What is mean?

A

The average of the data, found by adding all values and dividing by the number of cases

16
Q

What does the ‘x bar’ symbol represent?

A

Mean

17
Q

Is mean or median resistant to outliers/skewness and why?

A

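The median, because it depends only on the middle of the ordered data, so extreme values do not shift it. The mean uses every value and is pulled toward outliers and toward the long tail of a skewed distribution.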
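A small sketch of this resistance, using Python's standard `statistics` module. The data are made up for illustration: adding one extreme value moves the mean a long way but barely moves the median.

```python
from statistics import mean, median

# Hypothetical data, made up for illustration.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]   # one extreme observation added

print("without outlier:", mean(data), median(data))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

Here the mean jumps from 4 to 20, while the median only moves from 4 to 4.5.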

18
Q

Mean ? median in symmetrical data?

A

Mean ≈ median (exactly equal when the distribution is perfectly symmetrical)

19
Q

Mean ? median in skewed left data?

A

Mean < median

20
Q

Mean ? median in skewed right data?

A

Mean > median
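The three mean-versus-median cards can be checked on small data sets. This is only a sketch with made-up numbers; the left-skewed set is built as the mirror image of the right-skewed one.

```python
from statistics import mean, median

# Hypothetical data sets, made up for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]       # long tail on the right
left_skewed = [11 - v for v in right_skewed]   # mirror image: long tail on the left

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name:>12}: mean={mean(data)}, median={median(data)}")
```

In each skewed case the mean is dragged toward the long tail, while the median stays near the bulk of the data.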

21
Q

What is the ‘range’ of data?

A

The difference between the largest and smallest values in the data set

22
Q

What are the first, second and third quartiles?

A

Q1 - 25% of data below Q1
Q2 - 50% of data below Q2 - aka the median
Q3 - 75% of data below Q3

23
Q

How do you calculate quartiles? (4 steps)

A
  1. Arrange data from lowest to highest
  2. Calculate the median (M)
  3. Calculate Q1 - median of the first half of data (excluding M)
  4. Calculate Q3 - median of the second half of the data (excluding M)
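The four steps above can be sketched as follows. The function name `quartiles` and the data are made up for this example. This follows the "median of each half, excluding M" method on the card; statistical software often uses other quartile conventions and may give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the 'median of each half' method."""
    ordered = sorted(values)          # step 1: arrange from lowest to highest
    n = len(ordered)
    m = median(ordered)               # step 2: the median M
    half = n // 2
    lower = ordered[:half]            # first half (M itself is left out when n is odd)
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), m, median(upper)   # steps 3 and 4

# Hypothetical data, made up for illustration.
print(quartiles([7, 1, 3, 9, 5, 11, 13]))      # odd number of values
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))     # even number of values
```

With an even number of values the median falls between two observations, so there is no single M to exclude and each half simply contains n/2 values.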
24
Q

How do you find the interquartile range (IQR)?

A

IQR = Q3 - Q1

25
What is the 1.5 × IQR rule used for?
A criterion used to identify outliers
26
How do you find the lower threshold to identify any low outliers?
Q1 - 1.5 × IQR; any value below this threshold is flagged as a low outlier
27
How do you find the upper threshold to identify any high outliers?
Q3 + 1.5 × IQR; any value above this threshold is flagged as a high outlier
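The two thresholds can be worked through in a short sketch. The function names `outlier_fences` and `find_outliers` and the data are made up for this example, and Q1 and Q3 are supplied by hand rather than computed.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) thresholds from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the two thresholds."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data; Q1 and Q3 were worked out by hand as the
# medians of the lower and upper halves of the ordered data.
values = [10, 12, 13, 14, 15, 16, 18, 40]
q1, q3 = 12.5, 17.0

lower, upper = outlier_fences(q1, q3)
outliers = find_outliers(values, q1, q3)
print(lower, upper, outliers)
```

Here IQR = 4.5, so the thresholds are 5.75 and 23.75, and only the value 40 is flagged.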
28
What is the 5-number summary?
Summary of the minimum, Q1, median, Q3 and maximum
29
What two values are represented by the sides of a box on a boxplot?
Q1 and Q3
30
What does the line in a box of a boxplot indicate?
The median
31
What does s squared (s²) represent?
The sample variance
32
What do you do to the value of variance to get the standard deviation?
Take the square root of the variance
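A minimal sketch of the link between variance and standard deviation. It assumes the usual sample formula, which divides the sum of squared deviations from the mean by n - 1; the cards do not state the formula, so treat that as an assumption. The function names and data are made up.

```python
from math import sqrt

def sample_variance(values):
    """s squared: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s: the square root of the sample variance."""
    return sqrt(sample_variance(values))

# Hypothetical data, made up for illustration.
data = [1, 2, 3, 4, 5]
print(sample_variance(data), sample_sd(data))
```

The standard deviation is in the same units as the data, which is why it is usually reported instead of the variance.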
33
What does a small standard deviation imply?
The data is concentrated around the mean
34
What does a large standard deviation imply?
The data is widely spread around the mean
35
Is standard deviation or IQR used more commonly? Which is resistant and which is sensitive to outliers?
Standard deviation is used more commonly; however, it is sensitive to outliers. IQR is resistant to outliers.
36
What measure of centre is used for symmetrical data?
Mean
37
What measure of spread is used for symmetrical data?
Standard deviation
38
What measures of centre are used for data that is skewed or with outliers?
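Median, which is resistant to skew and outliers. The mean can be reported alongside it, but it is pulled toward the tail.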
39
What measures of spread are used for data that is skewed or with outliers?
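IQR, which is resistant to outliers. The standard deviation can be reported alongside it, but it is inflated by extreme values.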
40
What graphs are used with one categorical and one numerical variable? (3)
- Side-by-side histograms
- Side-by-side boxplots
41
What graph is used with two numerical variables?
Scatterplot
42
What descriptive statistic is used with two numerical variables?
Correlation coefficient - r
43
What does a response variable measure/record? On which axis is it plotted?
A response variable measures the outcome of a study. It is plotted on the y-axis
44
What does an explanatory variable measure/record? On which axis is it plotted?
An explanatory variable explains the changes in the response variable. It is plotted on the x-axis
45
What is an independent variable compared to a dependent variable?
The independent variable is the one that can be controlled or varied; the dependent variable is the one whose value is determined by it
46
What are some synonymous terms for independent variable? (6)
- Explanatory variable
- Predictor variable
- Controlled variable
- Regressor
- Manipulated variable
- Input variable
47
What are some synonymous terms for dependent variable? (6)
- Outcome variable
- Response variable
- Measured variable
- Regressand
- Observed variable
- Output variable
48
Does correlation always imply causation?
No
49
What graphs would be used for a continuous Y variable and a categorical X variable? (2)
- Side-by-side boxplots
- Vertically aligned histograms
50
What graph would be used for a continuous Y variable and a continuous X variable?
Scatterplot
51
What graph would be used for a categorical Y variable and a categorical X variable?
Clustered bar chart
52
What is the correlation coefficient a measure of?
It measures the strength and direction of the linear relationship between two continuous variables, X and Y
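A minimal sketch of how r can be computed, assuming the standard Pearson formula (the cards do not give one): the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(pearson_r(x, y))                         # positive: Y tends to rise with X
print(pearson_r(x, [2 * v + 1 for v in x]))    # points exactly on a rising line
print(pearson_r(x, [10 - 3 * v for v in x]))   # points exactly on a falling line
```

Swapping the two arguments gives the same value, because the formula treats X and Y symmetrically.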
53
With what graph do you always use the correlation coefficient?
Scatterplot
54
If the correlation coefficient r > 0, what does this mean for the linear relationship between X and Y?
r > 0 means as X increases, Y tends to increase
55
If the correlation coefficient r < 0, what does this mean for the linear relationship?
r < 0 means as X increases, Y tends to decrease
56
If r=0, what does this mean?
There is no linear relationship between X and Y. There could still be some other kind of relationship, such as a curved one.
57
What values of r indicate a stronger linear relationship?
The closer r is to 1 or -1, the stronger the linear relationship
58
What would the graph show if r = -1 or r = 1?
The observations lie exactly on a line, with no scatter
59
Is r sensitive to outliers?
Yes
60
Can r be used for curved relationships?
No
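The cautions on the last few cards can be seen on small made-up data sets. This sketch assumes the standard Pearson formula; `pearson_r` is a hypothetical helper name. In the first case Y is completely determined by X through a curve, yet r is 0. In the second, a single outlier turns a perfect positive correlation into a negative one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (standard formula, assumed here)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Curved relationship: y = x squared, made-up data.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Perfect line, then the same line with one outlier added.
x_line = [1, 2, 3, 4, 5]
y_line = [1, 2, 3, 4, 5]
r_line = pearson_r(x_line, y_line)
r_outlier = pearson_r(x_line + [6], y_line + [-10])

print(r_curve, r_line, r_outlier)
```

This is why a scatterplot should be checked before r is interpreted.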
61
Does r (correlation) distinguish between a predictor variable and a response variable?
No
62
What four characteristics should be asked from a scatterplot?
- Does a relationship exist between the two variables?
- What is its form? (linear, curved etc.)
- Is it increasing or decreasing?
- How strong is the relationship? (Correlation coefficient r)
63
What descriptive statistics can be used for one categorical variable?
Frequency table
64
What graphs can be used with two categorical variables? (2)
- Clustered bar chart
- Stacked bar chart
65
What three characteristics should be asked from clustered/stacked bar charts?
- Does a relationship exist between the two variables?
- Is it increasing or decreasing?
- How strong is the relationship?