Processing multivariate data Flashcards

(15 cards)

1
Q

What is multivariate data?

A

Data in which each observation is described by more than one variable, i.e. the data is multidimensional.

2
Q

How is a multivariate observation represented?

A

As a column vector, x, with one entry per variable.

3
Q

If there are multiple samples for each variable, how are the observation vectors combined and represented?

A

In a data matrix with N samples (rows) and L features (columns).

Each row holds all L variables of one sample.
Each column holds all N sample values of one variable.
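The row and column layout can be sketched in plain Python. The feature names (height, weight) and the values are made up for illustration and do not come from the deck.

```python
# Three hypothetical observations, each with L = 2 features
# (height in cm, weight in kg). The values are invented for illustration.
X = [
    [170.0, 65.0],
    [160.0, 55.0],
    [180.0, 80.0],
]

N = len(X)        # number of samples (rows)
L = len(X[0])     # number of features (columns)

first_observation = X[0]          # one row: all features of sample 0
heights = [row[0] for row in X]   # one column: feature 0 across all samples
```

Reading along a row gives one multivariate observation; reading down a column gives every sample of a single variable.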

4
Q

What is covariance? Give the formula.

A

An extension of variance to multiple dimensions.

A measure of the association between two variables (correlation is its normalised form).

It describes how the two variables change together.
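The card asks for a formula, but none survives in the answer. The standard textbook definitions are given here as a likely candidate, not as the deck's own wording. The symbols are assumptions: \mu_X and \mu_Y are the population means, \bar{x} and \bar{y} the sample means, and the 1/N normalisation is chosen to match the later card that calls S biased and N/(N-1) S unbiased.

```latex
% Population covariance of two random variables X and Y
\operatorname{cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big]

% Sample covariance from N paired observations (1/N normalisation)
s_{xy} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})
```

Setting Y = X in either expression recovers the variance, which is the sense in which covariance extends variance.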

5
Q

If two variables increase or decrease together, what will the covariance be?

A

Positive

6
Q

If one variable increases while the other decreases, what will the covariance be?

A

Negative

7
Q

If two variables are independent, what will the covariance be?

A

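Close to 0. The population covariance is exactly 0; a covariance computed from a finite sample will be near 0.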
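The three sign cases (positive, negative, zero) can be checked on small hand-picked series. This is a minimal sketch: the `covariance` helper and the data are illustrative assumptions, and the helper uses a 1/N normalisation. The third series is chosen so that the covariance is exactly zero; with real independent data the sample value would only be close to zero.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (1/N normalisation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

x = [1.0, 2.0, 3.0, 4.0]
rising = [2.0, 4.0, 6.0, 8.0]     # increases as x increases
falling = [8.0, 6.0, 4.0, 2.0]    # decreases as x increases
no_trend = [5.0, 1.0, 1.0, 5.0]   # hand-picked: no linear trend with x

cov_pos = covariance(x, rising)     # positive
cov_neg = covariance(x, falling)    # negative
cov_zero = covariance(x, no_trend)  # zero
```

Note that `no_trend` is in fact a deterministic function of `x`, so zero covariance on its own does not establish independence; the implication only runs from independence to zero covariance.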

8
Q

Explain the covariance matrix

A

The diagonal elements are the variances of the individual features.

The off-diagonal elements are the covariances.

Element ij is the covariance between feature i and feature j, so the matrix is symmetric.
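The structure described above can be sketched by computing the matrix directly from an N x L data matrix. The `covariance_matrix` helper and the data are assumptions for illustration, and the 1/N normalisation is one possible convention.

```python
def covariance_matrix(X):
    """L x L covariance matrix of an N x L data matrix (1/N normalisation)."""
    n = len(X)
    num_features = len(X[0])
    # Mean of each feature (column).
    means = [sum(row[j] for row in X) / n for j in range(num_features)]
    # Element [i][j] averages the product of deviations of features i and j.
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / n
            for j in range(num_features)
        ]
        for i in range(num_features)
    ]

# Hypothetical data: 3 samples (rows), 2 features (columns).
X = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 9.0],
]
C = covariance_matrix(X)
```

Here `C[0][0]` and `C[1][1]` are the variances of the two features, while `C[0][1]` and `C[1][0]` hold the same covariance value.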

9
Q

What is the difference between population and sample statistics?

A

A population is the entire group of interest, e.g. all adults in the UK. Population parameters are the true, but usually unknown, values that describe it.

A sample is a subset of the population selected for analysis. Sample statistics are the values calculated from the sample, used to estimate population parameters.

10
Q

Let X1, X2, … , XN be random samples drawn from a distribution with population mean vector μ and population covariance matrix Σ.

Describe the sample mean.

A

The sample mean x̄ is an unbiased estimate of the population mean μ.

11
Q

Let X1, X2, … , XN be random samples drawn from a distribution with population mean vector μ and population covariance matrix Σ.

Describe the sample covariance.

A

The sample covariance S (computed with a 1/N normalisation) is a biased estimate of the population covariance Σ.

12
Q

Let X1, X2, … , XN be random samples drawn from a distribution with population mean vector μ and population covariance matrix Σ.

Describe the unbiased estimate of the population covariance.

A

The unbiased estimate of the population covariance is (N / (N - 1)) S, which is the same as dividing by N - 1 instead of N.
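A minimal sketch of the rescaling, assuming S is computed with a 1/N normalisation as the factor N/(N-1) implies. The function names and data are illustrative, and a single pair of variables stands in for one entry of the covariance matrix.

```python
def cov_biased(xs, ys):
    """Covariance with 1/N normalisation (the S of the card)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def cov_direct_unbiased(xs, ys):
    """Covariance with 1/(N - 1) normalisation, computed directly."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
N = len(x)

S = cov_biased(x, y)                 # 1/N estimate
S_unbiased = N / (N - 1) * S         # rescaled as on the card
S_direct = cov_direct_unbiased(x, y) # same value via 1/(N - 1)
```

The rescaled value and the directly computed 1/(N - 1) value agree, and the correction factor approaches 1 as N grows, so the difference matters mainly for small samples.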

13
Q

What does it mean for an estimator to be unbiased?

A

An estimator is unbiased if its expected value, i.e. its average over all possible samples, equals the true population parameter.
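"On average over samples" can be made concrete by enumerating every possible sample from a tiny population. This is an illustrative sketch under stated assumptions: the population {1, 2, 3}, the sample size N = 2, and sampling with replacement (so draws are independent) are all invented here, and a single variable is used so that the covariance reduces to the variance.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, each value equally likely
N = 2                    # sample size

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((v - pop_mean) ** 2 for v in population) / len(population)

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

def biased_var(sample):
    """Variance with 1/N normalisation."""
    m = sample_mean(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=N))

avg_mean = sum(sample_mean(s) for s in samples) / len(samples)
avg_biased = sum(biased_var(s) for s in samples) / len(samples)
avg_unbiased = sum(Fraction(N, N - 1) * biased_var(s) for s in samples) / len(samples)
```

Averaged over all samples, the sample mean matches the population mean and the rescaled variance matches the population variance, while the 1/N variance comes out too small by the factor (N - 1)/N.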

14
Q

How do we visualise 2D data?

A

With a scatter plot: one variable on each axis and one point per observation.

15
Q
A