Probability, Correlation And Hypothesis Testing Flashcards

(56 cards)

1
Q

Comparative pie charts formula

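A

$\frac{r_2}{r_1} = \sqrt{\frac{n_2}{n_1}}$, where $r_1$ and $r_2$ are the radii of the two pie charts and $n_1$ and $n_2$ are the sample sizes they represent (this follows from area $= \pi r^2$)
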
2
Q

Outliers formula

A
3
Q

Comparative pie charts

A

The ratio of the sample sizes is the same as the ratio of the areas of the pie charts
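
Because the area of a circle is $\pi r^2$, matching the ratio of areas to the ratio of sample sizes fixes the second radius. A minimal sketch, assuming a hypothetical helper `second_radius` and made-up sample sizes:

```python
import math

def second_radius(r1, n1, n2):
    """Radius of a comparative pie chart for sample size n2, given that the
    chart for sample size n1 has radius r1 (hypothetical helper).

    Area is proportional to sample size and area = pi * r**2, so
    (r2 / r1)**2 = n2 / n1, which gives r2 = r1 * sqrt(n2 / n1).
    """
    return r1 * math.sqrt(n2 / n1)

# Hypothetical example: 100 people drawn with radius 3 cm, 400 people to compare.
r2 = second_radius(3.0, 100, 400)
print(r2)  # 6.0

# Check: the ratio of the areas equals the ratio of the sample sizes.
print(math.isclose((math.pi * r2**2) / (math.pi * 3.0**2), 400 / 100))  # True
```

Note that quadrupling the sample size only doubles the radius.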

4
Q

Population mean

A

$\mu = \frac{\sum x}{N}$, where $N$ is the number of values in the population

5
Q

Sample mean

A

$\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of values in the sample

6
Q

‘Sum of’

A

$\sum$ (the capital Greek letter sigma)

7
Q

The sample mean when $x_i$ occurs with frequency $f_i$

A

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$

8
Q

What is discrete data?

A

Data that can only take certain values; these are often, but not always, integers. For example, shoe size.

9
Q

What is continuous data?

A

Data that can take any numerical value within a range, for example height

10
Q

What is the range?

A

Highest value - lowest value

11
Q

What is IQR?

A

Q3 - Q1
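
The range and the IQR can both be computed directly from a data set. A minimal sketch using Python's standard `statistics` module on hypothetical data; note that there are several conventions for quartiles, and `method="inclusive"` is only one of them, so a course's own rule may give slightly different Q1 and Q3:

```python
import statistics

# Hypothetical data set, sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# statistics.quantiles with n=4 returns the three cut points [Q1, Q2, Q3].
# method="inclusive" is one quartile convention among several.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 8 3.0 7.0 4.0
```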

12
Q

Standard deviation formulas

A
13
Q

Variance formulas

A
14
Q

What is probability?

A
15
Q

What is a set?

A

A collection of distinct objects (elements); no element can appear more than once

16
Q

What is a subset?

A

Every element of A is also an element of S, written $A \subseteq S$

17
Q

What is an empty set?

A

The set with no elements, written $\emptyset$ or $\{\}$

18
Q

What is a sample space?

A

The set of all possible outcomes of a random experiment

19
Q

Complement of A

A

$A'$ (not A): all the outcomes in the sample space that are not in A

20
Q

B is a subset of A

A

If B occurs, then A also occurs

21
Q

Mutually exclusive

A

Events are mutually exclusive if the occurrence of one rules out the others (they cannot happen at the same time).
If A and B are mutually exclusive, the probability of A or B occurring is the sum of their probabilities: $P(A \cup B) = P(A) + P(B)$

Similarly, for mutually exclusive events A, B and C: $P(A \cup B \cup C) = P(A) + P(B) + P(C)$
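
For equally likely outcomes, the sum rule for mutually exclusive events can be checked by counting. A minimal sketch, assuming a single roll of a fair six-sided die and hypothetical events A, B and C chosen so they share no outcomes:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed example).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) for equally likely outcomes: |event| / |S|."""
    return Fraction(len(event), len(S))

A = {1, 2}   # roll a 1 or a 2
B = {3}      # roll a 3
C = {6}      # roll a 6

# No two events share an outcome, so they are mutually exclusive.
assert not (A & B) and not (A & C) and not (B & C)

union_prob = prob(A | B | C)
sum_prob = prob(A) + prob(B) + prob(C)
print(union_prob, sum_prob)  # 2/3 2/3
```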

22
Q

Independent events

A

The probability of event A occurring is unaffected by whether or not B occurs.
If A and B are independent, then $P(A \cap B) = P(A) \times P(B)$
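
The product rule can be checked by listing every outcome. A minimal sketch, assuming two fair dice and two hypothetical events: A is "the first die is even" and B is "the second die shows 6", which involve different dice and so are independent:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for equally likely outcomes: |event| / |S|."""
    return Fraction(len(event), len(S))

A = {(d1, d2) for d1, d2 in S if d1 % 2 == 0}  # first die even
B = {(d1, d2) for d1, d2 in S if d2 == 6}      # second die shows 6

p_and = prob(A & B)            # P(A and B), found by counting
p_product = prob(A) * prob(B)  # P(A) x P(B)
print(p_and, p_product)  # 1/12 1/12
```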

23
Q

The addition law of probability

A

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

24
Q

Multiplication law

A

$P(A \cap B) = P(A) \times P(B \mid A)$

25
What is Pearson’s Product Moment Correlation Coefficient?
A measure of the strength of linear correlation between two variables. The PMCC is denoted by r and is named after Karl Pearson, an applied mathematician who worked on applying statistics to genetics and evolution.
26
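PMCC formulas
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, where $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$, $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ and $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$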
27
Interpreting PMCC values
$r = 1$: perfect positive correlation
$r = -1$: perfect negative correlation
$r = 0$: no linear correlation
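
These values can be seen by computing r from the summary form $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$. A minimal sketch with a hypothetical `pmcc` helper and made-up data: points exactly on a rising line give 1, points exactly on a falling line give -1, and scattered points give something in between:

```python
import math

def pmcc(x, y):
    """Pearson's product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pmcc(x, [3, 5, 7, 9, 11]))  # 1.0  (y = 2x + 1: perfect positive)
print(pmcc(x, [5, 4, 3, 2, 1]))   # -1.0 (perfect negative)
print(pmcc(x, [2, 4, 5, 4, 5]))   # about 0.775: positive but not perfect
```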
28
What does a measure of correlation indicate?
An association between the two variables; however, it does not show that one causes the other (correlation does not imply a causal relationship)
29
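Spearman’s correlation coefficient formula
$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where $d$ is the difference between the ranks in each pair and $n$ is the number of pairs (this form assumes there are no tied ranks)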
30
Spearman’s
It is calculated from ranks, so it makes no assumptions about the distribution of the original data, and the relationship does not need to be linear
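
Because Spearman's coefficient only looks at ranks, data that always increase give $r_s = 1$ even when the pattern is curved, while the PMCC falls below 1. A minimal sketch with hypothetical helpers `ranks`, `spearman` and `pmcc`, assuming no tied values and using $y = x^2$ as made-up data:

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman's coefficient r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def pmcc(x, y):
    """PMCC r = Sxy / sqrt(Sxx * Syy), for comparison."""
    n = len(x)
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return sxy / math.sqrt(sxx * syy)

# y = x**2 is always increasing but not linear.
x = [1, 2, 3, 4, 5]
y = [a ** 2 for a in x]
print(spearman(x, y))  # 1.0: the ranks agree perfectly
print(pmcc(x, y))      # less than 1: the points are not on a straight line
```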
31
PMCC
A hypothesis test using the PMCC is only valid if the two variables are jointly (bivariate) normally distributed
32
H0 and H1
H0: the null hypothesis, that there is no correlation
H1: the alternative hypothesis, that there is correlation
33
Hypothesis testing
34
What is a regression line?
A straight line fitted to bivariate data; it should pass through the mean point $(\bar{x}, \bar{y})$.
The equation of the linear regression line is given as $y = ax + b$, where $a$ is the gradient and $b$ is the y-intercept.
$x$ is the independent variable and $y$ is the dependent variable.
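
The card does not say how the line is fitted; assuming the usual least-squares method, the gradient is $a = \frac{S_{xy}}{S_{xx}}$ and the intercept is $b = \bar{y} - a\bar{x}$, which forces the line through the mean point. A minimal sketch with a hypothetical `regression_line` helper and made-up data:

```python
def regression_line(x, y):
    """Least-squares regression line of y on x, returned as (a, b) for y = ax + b.

    a = Sxy / Sxx and b = mean(y) - a * mean(x), so the line passes
    through the mean point (mean(x), mean(y)).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

# Hypothetical bivariate data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
print(a, b)  # roughly 0.6 and 2.2

# The line passes through the mean point (3, 4).
print(a * 3 + b)  # roughly 4.0
```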
35
Things to consider when analysing the regression model
How do we interpret the model?
How can we interpret the coefficient of x in context?
How can we interpret the constant term in context?
36
What is a residual?
The error the model makes when trying to predict a data point.
It is the vertical distance between the data point and the regression line.
For a regression of y on x, it is only sensible to use the model to predict y.
37
How to calculate a residual?
Residual = actual value - predicted value, i.e. $e = y - \hat{y}$
38
What does a positive residual indicate?
The model is underpredicting: the actual value is above the predicted value
39
What does a negative residual indicate?
The model is overpredicting: the actual value is below the predicted value
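
Residuals are actual minus predicted, so their sign tells us whether the model under- or overpredicts. A minimal sketch, assuming a hypothetical fitted line $y = 0.6x + 2.2$ and made-up data (this line is the least-squares fit for these points, so the residuals also sum to zero):

```python
def residuals(x, y, a, b):
    """Residuals (actual - predicted) for the line y = ax + b."""
    return [yi - (a * xi + b) for xi, yi in zip(x, y)]

def verdict(residual):
    """Interpret the sign of a residual."""
    if residual > 0:
        return "underprediction"  # actual value is above the line
    if residual < 0:
        return "overprediction"   # actual value is below the line
    return "exact"

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y, 0.6, 2.2)
verdicts = [verdict(r) for r in res]

for xi, r, v in zip(x, res, verdicts):
    print(xi, round(r, 2), v)

print(abs(sum(res)) < 1e-9)  # True: least-squares residuals sum to zero
```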
40
What should we see when we plot predicted vs actual?
Strong positive correlation
41
What should we see when we plot predicted vs residual?
A random scatter of points evenly spread around zero, with no pattern
42
Anscombe’s quartet
Four data sets that have the same summary statistics but look clearly different when plotted
43
Unstructured statistics
Each data set has the same summary statistics but they are visually different
44
The normal distribution diagram
45
The normal distribution formula
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
46
What is the z-value?
The number of standard deviations a value lies above or below the mean.
Because the normal distribution is symmetrical, we can use the positive z-value to find probabilities for the corresponding negative z-value.
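
A z-value is how many standard deviations a value sits from the mean, positive above and negative below. A minimal sketch, assuming hypothetical exam marks with mean 50 and standard deviation 10, and using `statistics.NormalDist().cdf` from Python's standard library as the standard normal cumulative probability:

```python
from statistics import NormalDist

# Hypothetical example: exam marks with mean 50 and standard deviation 10.
mu, sigma = 50, 10

def z_value(x):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_value(65))  # 1.5  -> 1.5 standard deviations above the mean
print(z_value(35))  # -1.5 -> 1.5 standard deviations below the mean

# Symmetry: the area below -1.5 equals the area above +1.5.
phi = NormalDist().cdf  # P(Z < z) for the standard normal Z
print(phi(-1.5), 1 - phi(1.5))  # equal, up to rounding
```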
47
We can only use the z-table when…
the z-value is positive (to the right of the mean on the graph), and
we are finding the probability to the left of this z-value
48
Changing the direction of the inequality
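Changing either the sign of the z-value or the direction of the inequality means subtracting from 1, for example $P(Z > z) = 1 - P(Z < z)$ and $P(Z < -z) = 1 - P(Z < z)$.
If we change both, the two ‘1 -’ steps cancel out: $P(Z > -z) = P(Z < z)$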
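
Each of the four cases can be reduced to a single "left of a positive z" lookup. A minimal sketch, using `statistics.NormalDist().cdf` in place of the printed z-table and a hypothetical value $z = 1.2$, checking each shortcut against the distribution directly:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # P(Z < z) for the standard normal Z, like a z-table
z = 1.2                 # hypothetical positive z-value

below = phi(z)               # P(Z < 1.2): read straight from the table
above = 1 - phi(z)           # P(Z > 1.2): direction changed -> "1 -"
below_negative = 1 - phi(z)  # P(Z < -1.2): sign changed -> "1 -"
above_negative = phi(z)      # P(Z > -1.2): both changed -> they cancel

# Compare the table-style shortcuts with direct calculation at -z.
print(below_negative, phi(-z))      # same value
print(above_negative, 1 - phi(-z))  # same value
```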
49
Standardising formula
$Z = \frac{X - \mu}{\sigma}$
50
To find the z score?
51
To find the z value for a probability?
Use the z-table backwards: find the probability in the body of the table and read off the corresponding z-value
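
Working backwards from a probability to a z-value is the inverse of the cumulative distribution. A minimal sketch using `NormalDist.inv_cdf` from Python's `statistics` module in place of the table; a printed table only gives a few decimal places, so its answer may differ slightly. The mean 50 and standard deviation 10 are hypothetical:

```python
from statistics import NormalDist

# inv_cdf does the "table backwards" step: given P(Z < z), return z.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96

# For X ~ N(50, 10^2), the value with 97.5% below it is 50 + 10z.
x = NormalDist(mu=50, sigma=10).inv_cdf(0.975)
print(round(x, 2))  # 69.6
```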
52
Central limit theorem
If we repeatedly take samples of the same size $n$ and record each sample mean, the sample means will be approximately normally distributed about the population mean, and the approximation improves as $n$ increases
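
A quick simulation shows the sample means clustering around the population mean with spread close to $\frac{\sigma}{\sqrt{n}}$, even though the population itself is not normal. A minimal sketch, assuming a fair die as the population, a sample size of 30 and 10,000 samples (all hypothetical choices), with a fixed seed so the run is repeatable:

```python
import random
import statistics

rng = random.Random(1)                   # fixed seed for a repeatable run
population = [1, 2, 3, 4, 5, 6]          # hypothetical population: a fair die
mu = statistics.fmean(population)        # population mean, 3.5
sigma = statistics.pstdev(population)    # population standard deviation

n = 30              # size of each sample
num_samples = 10_000

# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)
]

print(mu, statistics.fmean(sample_means))                 # close to each other
print(sigma / n ** 0.5, statistics.pstdev(sample_means))  # close to each other
```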
53
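How is the sample mean normally distributed?
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (exactly if the population is normal, approximately for large $n$ by the central limit theorem)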
54
Standard deviation
55
Continuity corrections
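We can approximate a discrete distribution with a continuous one by treating each integer value $x$ as the interval from $x - 0.5$ to $x + 0.5$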
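
Each kind of discrete inequality turns into a continuous one by moving half a unit. A minimal sketch of a hypothetical helper `continuity_correct`, assuming the discrete variable X takes integer values and Y is the continuous approximation:

```python
def continuity_correct(op, k):
    """Corrected inequality for a continuous approximation Y to an
    integer-valued X (hypothetical helper). Returns (operator, bound)."""
    if op == "<=":   # P(X <= k) ~ P(Y < k + 0.5)
        return ("<", k + 0.5)
    if op == "<":    # P(X < k) = P(X <= k - 1) ~ P(Y < k - 0.5)
        return ("<", k - 0.5)
    if op == ">=":   # P(X >= k) ~ P(Y > k - 0.5)
        return (">", k - 0.5)
    if op == ">":    # P(X > k) = P(X >= k + 1) ~ P(Y > k + 0.5)
        return (">", k + 0.5)
    if op == "==":   # P(X = k) ~ P(k - 0.5 < Y < k + 0.5)
        return ("between", (k - 0.5, k + 0.5))
    raise ValueError(f"unknown operator: {op}")

print(continuity_correct("<=", 5))  # ('<', 5.5)
print(continuity_correct("==", 5))  # ('between', (4.5, 5.5))
```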
56
Approximating
To approximate a binomial distribution $B(n, p)$ by a normal distribution, we copy over the mean and variance of the binomial: $\mu = np$ and $\sigma^2 = np(1 - p)$.
We must change the letter used for the random variable, as it is a different distribution.
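
Comparing the exact binomial probability with the normal approximation shows how close the two are when the mean and variance are copied over and a continuity correction is applied. A minimal sketch, assuming hypothetical parameters $n = 40$, $p = 0.5$ and the event $X \le 24$, with `binomial_cdf` a hypothetical helper:

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 40, 0.5                  # hypothetical binomial parameters
mean = n * p                    # copied over: np
variance = n * p * (1 - p)      # copied over: np(1 - p)
Y = NormalDist(mu=mean, sigma=math.sqrt(variance))  # Y ~ N(np, np(1 - p))

k = 24
exact = binomial_cdf(k, n, p)   # P(X <= 24)
approx = Y.cdf(k + 0.5)         # with continuity correction: P(Y < 24.5)
print(round(exact, 4), round(approx, 4))
```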