What is multivariate data
Data in which each observation is described by more than one variable, i.e. the data is multidimensional
How is a multivariate observation represented
Using a column vector, x, with one entry per variable
If there are multiple samples for each variable, how would the column vectors be combined and represented
In a matrix with N samples (rows) and L features (columns)
Each row contains all the variables for one sample (the transposed observation vector)
Each column contains every sample of one specific variable
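As a small illustration of this layout, the sketch below stacks three observations into an N x L matrix with NumPy. The variable meanings and values are made up for the example.

```python
import numpy as np

# Three hypothetical observations, each with L = 2 variables
# (say height in cm and weight in kg; the values are invented).
x1 = np.array([170.0, 65.0])
x2 = np.array([180.0, 80.0])
x3 = np.array([160.0, 55.0])

# Stack the observations as rows: X has N = 3 rows and L = 2 columns.
X = np.vstack([x1, x2, x3])

print(X.shape)   # (N, L)
print(X[0])      # first row: all variables for sample 1
print(X[:, 1])   # second column: every sample of variable 2
```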
What is covariance and give the formula
Extension of the variance to multiple dimensions
Measure of the linear association between two variables
This describes how the two variables relate or change together
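The question asks for the formula, so the standard definition is written out here. The symbols (mu_X and mu_Y for the population means, x bar and y bar for the sample means, N for the number of samples) are notation chosen for this sketch. The first line is the population covariance. The second is the sample version with divisor N, which matches the biased sample covariance S discussed later in these notes.

```latex
\[
\operatorname{cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big]
\]
\[
s_{xy} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})
\]
```

Setting Y = X recovers the variance, which is why covariance is described as its extension.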
If two variables increase/decrease together what will the covariance be
positive
If one variable increases and the other decreases, what will the covariance be
negative
If two variables are independent, what will the covariance be
Exactly 0 for the population covariance; a covariance computed from a sample will be close to 0
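The three cases above can be checked numerically. This is a minimal sketch using NumPy's `np.cov`; the data and the random seed are arbitrary choices for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# y rises with x, so the covariance is positive.
cov_up = np.cov(x, 2.0 * x)[0, 1]

# y falls as x rises, so the covariance is negative.
cov_down = np.cov(x, -x)[0, 1]

# Two independently generated variables: the sample covariance is small
# but generally not exactly zero.
rng = np.random.default_rng(0)  # seed chosen arbitrarily
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)
cov_indep = np.cov(a, b)[0, 1]

print(cov_up, cov_down, cov_indep)
```

`np.cov(x, y)` returns a 2 x 2 matrix, so the `[0, 1]` index picks out the covariance between the two inputs.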
Explain the covariance matrix
The diagonal elements are the variances of the individual features.
The off-diagonal elements are the covariances between pairs of features.
Element ij is the covariance between feature i and feature j, so the matrix is symmetric.
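The structure described above can be verified on a small data matrix. The sketch below uses made-up values and NumPy's `np.cov`, which by default divides by N-1.

```python
import numpy as np

# Hypothetical data: N = 5 samples (rows) of L = 3 features (columns).
X = np.array([
    [2.0, 8.0, 1.0],
    [4.0, 6.0, 3.0],
    [6.0, 5.0, 2.0],
    [8.0, 3.0, 5.0],
    [10.0, 1.0, 4.0],
])

# rowvar=False tells NumPy that the columns, not the rows, are the variables.
C = np.cov(X, rowvar=False)

# The result is L x L.
print(C.shape)

# Diagonal elements are the per-feature variances (same N-1 divisor).
diag_matches = np.allclose(np.diag(C), X.var(axis=0, ddof=1))

# Element ij equals element ji, so the matrix is symmetric.
is_symmetric = np.allclose(C, C.T)

print(diag_matches, is_symmetric)
```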
Difference between Population and sample statistics
Population refers to the entire group of data points e.g. adults in the UK. Population parameters are true but usually unknown values that describe the population.
A sample is a subset of the population selected for analysis. Sample statistics are the values calculated from the sample, used to estimate population parameters.
Let X1, X2, … , XN be random samples drawn from a distribution with the population mean vector u and the population covariance matrix sigma.
Describe the sample mean
The sample mean x bar (the average of the samples) is an unbiased estimate of the population mean u
Let X1, X2, … , XN be random samples drawn from a distribution with the population mean vector u and the population covariance matrix sigma.
Describe the sample covariance
The sample covariance S (computed with divisor N) is a biased estimate of the population covariance sigma: on average it underestimates sigma by a factor of (N-1)/N
Let X1, X2, … , XN be random samples drawn from a distribution with the population mean vector u and the population covariance matrix sigma.
Describe the unbiased estimate of the population covariance
The unbiased estimate of the population covariance is (N/(N-1)) S, which is the same as dividing by N-1 instead of N
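The relationship between the two estimates can be sketched with NumPy. Here S is computed by hand with divisor N, and the rescaled version is compared against `np.cov`, whose default divisor is N-1. The data are random and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)      # seed chosen arbitrarily
X = rng.standard_normal((10, 2))    # N = 10 samples, L = 2 features
N = X.shape[0]

# Centre each feature on its sample mean.
Xc = X - X.mean(axis=0)

# Sample covariance with divisor N (the biased estimate S).
S = Xc.T @ Xc / N

# Rescale by N / (N - 1) to get the unbiased estimate.
S_unbiased = N / (N - 1) * S

# np.cov with bias=True divides by N; the default divides by N - 1.
matches_biased = np.allclose(S, np.cov(X, rowvar=False, bias=True))
matches_unbiased = np.allclose(S_unbiased, np.cov(X, rowvar=False))

print(matches_biased, matches_unbiased)
```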
What is meant if an estimator is unbiased
An estimator is unbiased if its expected value equals the true population value, i.e. averaged over many repeated samples it neither overestimates nor underestimates the parameter.
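Averaging over many samples can be simulated directly. The sketch below is an illustration only: it assumes a normal distribution with a true variance of 4, a sample size of 5, and 100,000 repeated samples, all chosen arbitrarily. It compares the average of the divisor-N and divisor-(N-1) variance estimates with the true value.

```python
import numpy as np

rng = np.random.default_rng(2)   # seed chosen arbitrarily
true_var = 4.0                   # assumed population variance
N = 5                            # size of each sample
trials = 100_000                 # number of repeated samples

# Each row is one independent sample of size N.
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, N))

# Variance estimate for every sample, with both divisors.
biased = samples.var(axis=1, ddof=0)     # divides by N
unbiased = samples.var(axis=1, ddof=1)   # divides by N - 1

# Average each estimator over all the repeated samples.
mean_biased = biased.mean()
mean_unbiased = unbiased.mean()

print(mean_biased, mean_unbiased, true_var)
```

Under these assumptions the divisor-N average should settle near (N-1)/N times the true variance, while the divisor-(N-1) average should settle near the true variance itself.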
How do we visualise 2D data
With a scatter plot, plotting one variable on each axis