Last Minute Information Flashcards

(29 cards)

1
Q

What are the three types of missing values?

A
  1. MAR
  2. MNAR
  3. MCAR
2
Q

What does MAR stand for?

A

Missing at Random

3
Q

What does MNAR stand for?

A

Missing Not At Random

4
Q

What does MCAR stand for?

A

Missing Completely at Random

5
Q

What is Missing at Random Data?

A

Data whose chance of being missing depends on other observed attributes, so values are more likely to be missing for some data objects than for others.

6
Q

What is Missing Completely at Random Data?

A

Data that is missing purely by chance: every data object has the same probability of having a missing value.

7
Q

What is Missing Not At Random Data?

A

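Data whose chance of being missing depends on the missing value itself or on other unobserved factors, so the missingness is systematic and cannot be explained by the observed data alone.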

8
Q

What are 4 approaches to dealing with Missing Values?

A
  1. Keep as-is
  2. Remove Columns
  3. Remove Rows
  4. Impute Values
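
The last three approaches can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the toy table, the use of `None` as the missing-value marker, the function names, and the choice of mean imputation are all hypothetical and not taken from the cards.

```python
from statistics import mean

# Hypothetical toy table: each dict is a data object; None marks a missing value.
rows = [
    {"id": 1, "age": 25, "income": 30000},
    {"id": 2, "age": None, "income": 42000},
    {"id": 3, "age": 40, "income": None},
    {"id": 4, "age": 31, "income": 38000},
]

def remove_rows(rows):
    """Drop every data object that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def remove_columns(rows):
    """Drop every attribute that is missing for at least one data object."""
    keep = [k for k in rows[0] if all(r[k] is not None for r in rows)]
    return [{k: r[k] for k in keep} for r in rows]

def impute_mean(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    means = {k: mean(r[k] for r in rows if r[k] is not None) for k in rows[0]}
    return [{k: (means[k] if r[k] is None else r[k]) for k in r} for r in rows]
```

On this toy table, removing rows keeps two data objects, removing columns keeps only `id`, and imputation fills the missing age with the mean of the three observed ages.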
9
Q

What are three types of outliers?

A
  1. Data errors
  2. Legitimate Values
  3. Fraudulent Entries
10
Q

What are 4 ways to deal with outliers?

A
  1. Do Nothing
  2. Replace with upper or lower cap
  3. Log Transformation
  4. Remove data objects with outliers
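
As a rough sketch, options 2 to 4 might look like this in plain Python. The sample values, the cap limits, and the use of `log(1 + v)` as the log transformation are assumptions chosen for illustration.

```python
import math

# Hypothetical data with one extreme value.
values = [12.0, 15.0, 14.0, 13.0, 400.0]

def cap(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest cap."""
    return [min(max(v, lower), upper) for v in values]

def log_transform(values):
    """Compress large values with log(1 + v); assumes every v is greater than -1."""
    return [math.log1p(v) for v in values]

def remove_outliers(values, lower, upper):
    """Keep only the values that fall inside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]
```

Capping keeps every data object but limits its influence, while removal shrinks the dataset. The log transformation keeps the ordering of the values and pulls the extreme one much closer to the rest.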
11
Q

What are the two types of data errors?

A
  1. Random Errors
  2. Systematic Errors
12
Q

What are the two types of data transformation?

A
  1. Normalisation
  2. Standardisation
13
Q

What does standardisation do?

A

Rescale the data to have a mean of 0 and a standard deviation of 1
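
A minimal sketch of this rescaling (the z-score), assuming the population standard deviation is used and is non-zero. The function name is hypothetical.

```python
from statistics import mean, pstdev

def standardise(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; assumed to be > 0
    return [(v - mu) / sigma for v in values]

# Example data with mean 5 and population standard deviation 2.
z = standardise([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```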

14
Q

What does Normalisation do?

A

Rescale the data to have a common scale, often [0, 1]
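
One common way to do this is min-max scaling, sketched below. It assumes the values are not all identical, since that would make the range zero. The function name is hypothetical.

```python
def normalise(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi > lo

scaled = normalise([10.0, 20.0, 30.0])
```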

15
Q

What are support vectors?

A

The training points that lie closest to the SVM's decision boundary. They determine the position of the margin.

16
Q

What does DBSCAN stand for?

A

Density-based Spatial Clustering of Applications with Noise

17
Q

What does PCA stand for?

A

Principal Component Analysis

18
Q

How does DBSCAN cluster?

A

It finds areas of high density and expands the clusters from there.

19
Q

What does t-SNE stand for?

A

t-distributed stochastic neighbour embedding

20
Q

What does UMAP stand for?

A

Uniform Manifold Approximation and Projection

21
Q

What are 4 ways to measure the distance between clusters?

A
  1. Single Linkage
  2. Complete Linkage
  3. Average Linkage
  4. Centroid Linkage
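
The four measures can be sketched as follows, assuming Euclidean distance and clusters given as lists of coordinate tuples. The function names are hypothetical.

```python
import math
from itertools import product

def single_linkage(a, b):
    """Distance between the two closest points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    """Distance between the two farthest points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(a, b))

def average_linkage(a, b):
    """Mean distance over all pairs of points, one from each cluster."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid(points):
    """Coordinate-wise mean of a cluster."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def centroid_linkage(a, b):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))

# Hypothetical clusters for illustration.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
```

For these two clusters the single linkage is the smallest of the four values and the complete linkage is the largest.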
22
Q

What is Ward’s Method?

A

It's a method that merges the pair of clusters whose union gives the smallest increase in the total squared distance of points from their cluster centroids.
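
A small sketch of the quantity Ward's method compares, assuming Euclidean distance and the usual sum-of-squared-distances criterion. The function names and the one-dimensional clusters are hypothetical.

```python
import math

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = tuple(sum(x) / len(points) for x in zip(*points))
    return sum(math.dist(p, c) ** 2 for p in points)

def ward_cost(a, b):
    """Increase in total within-cluster squared distance if a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

# Hypothetical one-dimensional clusters.
a = [(0.0,), (2.0,)]
b = [(10.0,), (12.0,)]
c = [(3.0,), (5.0,)]
```

Here merging `a` with `c` costs far less than merging `a` with `b`, so under this criterion `a` and `c` would be joined first.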

23
Q

How does DBSCAN create clusters?

A

For any point in any cluster, the point density around that point has to exceed a given threshold.

24
Q

Which two hyperparameters define density in DBSCAN?

A
  1. eps (epsilon)
  2. minPoints
25
Q

What is a core point?

A

A point that has at least minPoints points within eps. These points lie in the interior of a cluster.

26
Q

What is a border point?

A

A point that has fewer than minPoints points within eps but lies within the neighbourhood of a core point.

27
Q

What is a noise point?

A

A point that is neither a core point nor a border point.

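
The three point types can be illustrated with a brute-force sketch. It assumes Euclidean distance and the common convention that a point is a core point when at least `min_points` points, counting itself, lie within `eps`. The function name and sample points are hypothetical, and this only labels points; it does not build the clusters.

```python
import math

def classify_points(points, eps, min_points):
    """Label each point as 'core', 'border', or 'noise'."""
    # Neighbourhood of each point: indices of all points within eps (itself included).
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, n in enumerate(neighbours) if len(n) >= min_points}
    labels = []
    for i, n in enumerate(neighbours):
        if i in core:
            labels.append("core")
        elif any(j in core for j in n):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical sample: a small square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (10.0, 10.0)]
labels = classify_points(points, eps=1.0, min_points=3)
```

With these settings the four corners of the square are core points, the point at (2, 1) is a border point, and the distant point is noise.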
28
Q

What are advantages of DBSCAN?

A
  1. Resistant to noise
  2. Can handle clusters of different shapes and sizes
29
Q

How does a Gaussian Mixture Model represent a cluster?

A

Each cluster is represented by a Gaussian probability distribution with its own mean and spread, so a point belongs to each cluster with some probability.
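
A minimal one-dimensional sketch of this idea, assuming the mixture weights, means, and standard deviations are already known (fitting them is not shown). The function names and the two example components are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, components):
    """Probability that x belongs to each cluster.

    components is a list of (weight, mean, std) tuples, one per cluster.
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]

# Two hypothetical clusters with equal weight, centred at 0 and 4.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
midpoint = responsibilities(2.0, components)
near_first = responsibilities(0.0, components)
```

A point halfway between the two centres is shared equally between the clusters, while a point at the first centre is assigned almost entirely to the first cluster.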