Skewness

Mean = median = mode (symmetrical)

Mean > median > mode (positive skew)

Data is skewed to the right (long tail on the right)

Mean < median < mode (negative skew)

Data is skewed to the left (long tail on the left)

Skewness formula

3(mean - median) divided by standard deviation

0 = symmetrical

+ = positive skew

- = negative skew
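A quick Python sketch of this formula (the function name and sample data are made up for illustration):

```python
import statistics

def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3(mean - median) / SD."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)  # population SD; use stdev() for a sample
    return 3 * (mean - median) / sd

# A right-tailed (positively skewed) data set: the mean is pulled above the median.
print(pearson_skewness([1, 2, 2, 3, 3, 3, 4, 10]))  # positive value
print(pearson_skewness([1, 2, 3]))                   # 0.0 (symmetrical)
```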

Linear interpolation

Can be used to calculate median, UQ and LQ from a frequency table

= LB + (position in group / number in group) × class width
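A minimal sketch of this interpolation in Python (the function name and the frequency table are hypothetical):

```python
def interpolate_quantile(classes, target_position):
    """Linearly interpolate a quantile from grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency).
    target_position: e.g. n/2 for the median, n/4 for the LQ.
    Formula: LB + (position in group / number in group) * class width.
    """
    cumulative = 0
    for lb, ub, f in classes:
        if cumulative + f >= target_position:
            position_in_group = target_position - cumulative
            return lb + (position_in_group / f) * (ub - lb)
        cumulative += f
    raise ValueError("target_position exceeds total frequency")

# Hypothetical frequency table: heights in cm.
table = [(150, 160, 5), (160, 170, 12), (170, 180, 8)]
n = 25
print(interpolate_quantile(table, n / 2))  # median: 166.25
```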

Standard deviation

Square root of ( (sum of x^2) / n − mean^2 )

For a frequency table it's the square root of ( (sum of f × x^2) / (sum of f) − mean^2 )

Sensitive to outliers

Measure of spread

Unaffected by adding a constant to every value; multiplying every value by k multiplies the SD by |k|
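Both versions of the formula can be sketched in Python (function names and data are made up for illustration):

```python
from math import sqrt

def sd_raw(xs):
    """SD from raw data: sqrt( (sum of x^2)/n - mean^2 )."""
    n = len(xs)
    mean = sum(xs) / n
    return sqrt(sum(x * x for x in xs) / n - mean ** 2)

def sd_frequency(values, freqs):
    """SD from a frequency table: sqrt( (sum of f x^2)/(sum of f) - mean^2 )."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sqrt(sum(x * x * f for x, f in zip(values, freqs)) / total - mean ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_raw(data))                                     # 2.0
print(sd_frequency([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))   # same data grouped: 2.0
```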

Regression line

Residual - distance from a data point to the regression line

The regression line will minimise the sum of the square of these residuals

For y on x it's vertical residuals

For x on y it's horizontal residuals
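A sketch of the y-on-x least-squares line in Python, using b = Sxy/Sxx and a = mean(y) − b·mean(x) (function name and data are illustrative):

```python
def regression_y_on_x(xs, ys):
    """Least-squares line y = a + b x, minimising vertical residuals.

    b = Sxy / Sxx, a = mean(y) - b * mean(x).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    sxx = sum(x * x for x in xs) - n * mx * mx
    b = sxy / sxx
    a = my - b * mx
    return a, b

a, b = regression_y_on_x([1, 2, 3, 4], [2, 4, 6, 8])
print(a, b)  # 0.0 2.0 for a perfect y = 2x relationship
```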

Extrapolation

Estimating a value outside the data you have

Extrapolated values are unreliable

Outliers

Data points more than 2 SD from the mean

Data point more than 1.5 X IQR more than the UQ or less than the LQ
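Both outlier rules can be sketched in Python (function names and the sample data are hypothetical; `statistics.quantiles` is used for the quartiles):

```python
import statistics

def outliers_by_sd(data, k=2):
    """Flag points more than k standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > k * sd]

def outliers_by_iqr(data, k=1.5):
    """Flag points more than k * IQR above the UQ or below the LQ."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
print(outliers_by_iqr(data))  # [20]
print(outliers_by_sd(data))   # [20]
```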

Standardising scores

A means of comparison between data values from different data sets

Standardised scores = (X - mean)/SD
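A minimal Python sketch (the exam-mark numbers are invented for illustration):

```python
def standardise(x, mean, sd):
    """Standardised score z = (x - mean) / sd.

    Allows comparison across data sets with different scales.
    """
    return (x - mean) / sd

# Hypothetical exam marks: 70 in a paper with mean 60, SD 8,
# versus 65 in a paper with mean 50, SD 10.
print(standardise(70, 60, 8))   # 1.25
print(standardise(65, 50, 10))  # 1.5 -> the second mark is relatively better
```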

Conditions of binomial distributions

Two possible outcomes

Fixed number of trials - n

Trials are independent

The probability of success is constant for each trial
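When these conditions hold, P(X = k) = C(n, k) p^k (1−p)^(n−k). A quick sketch (the coin-toss example is illustrative):

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 10 fair coin tosses.
print(binomial_pmf(10, 0.5, 3))  # 0.1171875
```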

Conditions of geometric distributions

2 possible outcomes - success and failure

Outcome of each trial is independent of the outcome of all the other trials

Probability of each trial is constant

The trials are repeated until a success occurs

Geometric - P(X>x)

= q^x

Geometric - P(X < or = x)

= 1-q^x

Geometric - P(X > or = x)

= q^(x-1)
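These tail formulas can be checked numerically: P(X > x) = q^x because the first x trials must all fail. A sketch (function name and the values of p and x are illustrative):

```python
def geometric_pmf(p, k):
    """P(X = k) for X ~ Geo(p): first success on trial k, so q^(k-1) * p."""
    q = 1 - p
    return q ** (k - 1) * p

p, x = 0.3, 4
q = 1 - p
# Summing P(X = k) for k > x should agree with the closed form q^x.
tail = sum(geometric_pmf(p, k) for k in range(x + 1, 200))
print(tail, q ** x)  # both ~ 0.2401
```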

Conditions for a normal distribution

Approximately 99.7% of the data within 3 SD of the mean

95% within 2 SD from the mean

A continuous distribution which forms a symmetrical bell curve
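The 95% / 99.7% figures can be verified with the error function, since the proportion within k SDs of the mean is erf(k/√2) (the helper name is made up):

```python
from math import erf, sqrt

def within_k_sd(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return erf(k / sqrt(2))

print(round(within_k_sd(2), 4))  # 0.9545 (about 95%)
print(round(within_k_sd(3), 4))  # 0.9973 (about 99.7%)
```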

Standardising a normal variable

Z = (X − μ) / SD

Correlation

The measure of the relationship between two variables; stronger correlation means the variables are more closely related

Combinations and permutations differences

Combinations involve making a choice/ selection in which the order is unimportant

Permutations are ordered arrangements of a set of items
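The difference shows up directly in Python's `math.comb` and `math.perm` (the 5-choose-3 example is illustrative):

```python
from math import comb, perm

# Choosing 3 of 5 items where order is unimportant (combination)
# versus arranging 3 of 5 items in order (permutation).
print(comb(5, 3))  # 10
print(perm(5, 3))  # 60 = 10 combinations x 3! orderings of each
```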

Discrete random variables: expected mean and E(X^2)

E(X) = sum of xp

E(X^2) = sum of x^2 p

DRV: Variance

Var (X) = E(X^2) - [E(X)]^2

Standard deviation is just the square root of the variance
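These three formulas together, sketched for a fair die (function name is made up; E(X) = 3.5 and Var(X) = 35/12 are the standard results):

```python
def drv_summary(outcomes, probs):
    """E(X), E(X^2) and Var(X) for a discrete random variable."""
    e_x = sum(x * p for x, p in zip(outcomes, probs))
    e_x2 = sum(x * x * p for x, p in zip(outcomes, probs))
    var = e_x2 - e_x ** 2
    return e_x, e_x2, var

# Fair six-sided die.
e_x, e_x2, var = drv_summary([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(e_x, var)  # 3.5 and 35/12 ~ 2.9167
```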

Mutually exclusive

When two events cannot happen at the same time

Independent

When one event has no effect on the other

Regression line x on y formula

x = a + by

where a = mean of x − b × mean of y

where b = Sxy/Syy
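A sketch of this x-on-y line in Python, mirroring the y-on-x case but with b = Sxy/Syy (function name and data are illustrative):

```python
def regression_x_on_y(xs, ys):
    """Least-squares line x = a + b y, minimising horizontal residuals.

    b = Sxy / Syy, a = mean(x) - b * mean(y).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    syy = sum(y * y for y in ys) - n * my * my
    b = sxy / syy
    return mx - b * my, b

a, b = regression_x_on_y([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # -0.5 0.5, since x = (y - 1) / 2 exactly
```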

Geometric - P(X < x)

1 - q^(x-1)

How to tell if two events are mutually exclusive?

They can’t happen at the same time

Hence P(A intersect B) = 0

P(A U B) = P(A) + P(B)

How to tell if two events are independent?

One event has no effect on the other

P(A|B) = P(A)

so P(A intersect B) = P(A) x P(B)
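Both checks can be sketched as small predicates (function names are made up; the coin-toss probabilities are the standard fair-coin values):

```python
def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Independent iff P(A intersect B) = P(A) * P(B)."""
    return abs(p_a * p_b - p_a_and_b) < tol

def are_mutually_exclusive(p_a_and_b, tol=1e-9):
    """Mutually exclusive iff P(A intersect B) = 0."""
    return abs(p_a_and_b) < tol

# Two fair-coin tosses: A = first toss heads, B = second toss heads.
print(are_independent(0.5, 0.5, 0.25))  # True
print(are_mutually_exclusive(0.25))     # False: both can happen together
```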