What is an outlier?

An extreme value that lies outside the overall pattern of the data


What is the common definition of an outlier?

Any value that is:
Greater than Q3 + k × IQR, or
Less than Q1 - k × IQR
where k is a given constant (often 1.5)
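A minimal Python sketch of this rule. It assumes k = 1.5 by default and uses statistics.quantiles with method="inclusive" to find Q1 and Q3. Quartile conventions vary between textbooks and software, so the fences may differ slightly from a hand calculation. The function names outlier_fences and find_outliers are hypothetical.

```python
import statistics

def outlier_fences(data, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: 'inclusive' quartiles; other conventions give slightly different Q1/Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, k=1.5):
    """Return every value lying below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(data, k)
    return [x for x in data if x < lower or x > upper]

data = [2, 4, 5, 5, 6, 7, 8, 30]
print(outlier_fences(data))  # (1.0, 11.0)
print(find_outliers(data))   # [30]
```

Here Q1 = 4.75 and Q3 = 7.25, so IQR = 2.5 and the fences are 4.75 - 3.75 = 1 and 7.25 + 3.75 = 11. Only 30 lies outside them.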


What is an anomaly?

An outlier that is removed from the data because it is clearly an error and keeping it would be misleading


What is cleaning the data?

The process of removing anomalies from a data set
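Deciding which values are anomalies needs judgement about the context. Once that decision is made, the removal itself is mechanical. In the sketch below the helper clean, the height data and the rule "anything over 300 cm is an entry error" are all hypothetical, chosen only for illustration.

```python
def clean(data, is_anomaly):
    """Return a copy of data with every value flagged by is_anomaly removed."""
    return [x for x in data if not is_anomaly(x)]

# Hypothetical adult heights in cm: 1720 is assumed to be a typing error for 172,
# so it is clearly an error and is treated as an anomaly.
heights = [165, 172, 158, 1720, 180]
cleaned = clean(heights, lambda h: h > 300)
print(cleaned)  # [165, 172, 158, 180]
```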


What features does a box plot show with lines?

Lowest value that is not an outlier
Highest value that is not an outlier
Q1, Q2 (the median), Q3
Outliers are marked with a cross
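The values needed to draw a box plot can be sketched in Python. The sketch assumes the Q1 - 1.5 × IQR / Q3 + 1.5 × IQR outlier rule and "inclusive" quartiles. The whiskers stop at the lowest and highest values that are not outliers, and outliers are listed separately so they can be drawn as crosses. box_plot_summary is a hypothetical name.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Return whisker ends, quartiles and outliers for drawing a box plot."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [x for x in data if lower <= x <= upper]
    outliers = sorted(x for x in data if x < lower or x > upper)
    return {
        "low": min(inliers),    # end of the lower whisker
        "q1": q1,
        "q2": q2,               # median
        "q3": q3,
        "high": max(inliers),   # end of the upper whisker
        "outliers": outliers,   # each drawn as a cross
    }

print(box_plot_summary([2, 4, 5, 5, 6, 7, 8, 30]))
# {'low': 2, 'q1': 4.75, 'q2': 5.5, 'q3': 7.25, 'high': 8, 'outliers': [30]}
```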


What should be done when comparing two box plots?

Use the same scale
Compare the medians, IQRs and extreme values
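The numerical side of the comparison can be sketched in Python. Drawing both plots on the same scale is a graphical step that this code does not cover. The data sets A and B are hypothetical, and quartiles use the "inclusive" convention.

```python
import statistics

def summary(data):
    """Median, interquartile range and range of a data set."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"median": q2, "iqr": q3 - q1, "range": max(data) - min(data)}

a = [2, 4, 5, 5, 6, 7, 8, 30]
b = [3, 5, 6, 7, 8, 9, 10, 12]
for name, data in (("A", a), ("B", b)):
    print(name, summary(data))
```

B has the higher median (7.5 against 5.5) and the larger IQR (3.5 against 2.5), so its middle half is higher and more spread out. A has the far larger range (28 against 9), but only because of its single extreme value of 30. This shows why the IQR is usually a better measure of spread when there are outliers.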


What is bivariate data?

Data which has pairs of values for two variables


What is usually plotted on each axis of a scatter diagram?

x-axis - independent variable
y-axis - dependent variable


What is a causal relationship?

When a change in one variable causes a change in the other
Correlation alone does not show causation


What is the regression line?

The straight line that minimises the sum of the squares of the vertical distances of the data points from the line


What is the equation of the regression line?

y = a + bx
where a is the y-intercept and b is the gradient
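The coefficients can be calculated with the standard least-squares formulas b = Sxy / Sxx and a = ȳ - b x̄, where Sxy = Σ(x - x̄)(y - ȳ) and Sxx = Σ(x - x̄)². A minimal Python sketch, with a hypothetical function name and data:

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the least-squares regression line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy and S_xx: sums of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # y-intercept; the line passes through (x_bar, y_bar)
    return a, b

a, b = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.2 + 0.6x
```

Here x̄ = 3, ȳ = 4, Sxy = 6 and Sxx = 10, giving b = 0.6 and a = 4 - 0.6 × 3 = 2.2.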


When should you use a regression line to make predictions?

Only when the values are within the range of the given data (interpolation); predictions outside this range (extrapolation) are unreliable
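One way to enforce this rule in code is to refuse any x value outside the observed range. The helper predict below is hypothetical, and raising an error is a design choice; a warning would work just as well. It uses an illustrative line y = 2.2 + 0.6x fitted to data with x values from 1 to 5.

```python
def predict(a, b, x, xs):
    """Predict y = a + b*x, refusing to extrapolate beyond the observed x range."""
    if not min(xs) <= x <= max(xs):
        raise ValueError(f"x = {x} is outside the data range; extrapolation is unreliable")
    return a + b * x

xs = [1, 2, 3, 4, 5]
print(predict(2.2, 0.6, 2.5, xs))  # interpolation: allowed
try:
    predict(2.2, 0.6, 20, xs)      # extrapolation: rejected
except ValueError as err:
    print(err)
```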