Last Minute Information Flashcards
(29 cards)
What are the three types of missing values?
- MAR
- MNAR
- MCAR
What does MAR stand for?
Missing at Random
What does MNAR stand for?
Missing Not At Random
What does MCAR stand for?
Missing Completely at Random
What is Missing at Random Data?
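Data whose chance of being missing depends on other observed variables, so it is more likely to be missing for some data objects than for others, but does not depend on the missing value itself.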
What is Missing Completely at Random Data?
Data that is missing purely by chance, and all data objects have the same chance of it being missing.
What is Missing Not At Random Data?
Data whose chance of being missing depends on the unobserved value itself, for example high earners declining to report their income.
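As a rough illustration of the three mechanisms under their standard statistical definitions, the sketch below gives the probability that an `income` field goes missing. The record fields, thresholds and probabilities are all made-up assumptions chosen only to show what the missingness depends on in each case.

```python
# Hypothetical record: "age" is always observed, "income" may go missing.

def p_missing_mcar(record):
    # MCAR: the same chance for every record, regardless of its values.
    return 0.10

def p_missing_mar(record):
    # MAR: the chance depends only on an observed variable (age).
    return 0.30 if record["age"] < 30 else 0.05

def p_missing_mnar(record):
    # MNAR: the chance depends on the value that would itself be missing.
    return 0.40 if record["income"] > 80000 else 0.05

young_low = {"age": 24, "income": 30000}
old_high = {"age": 55, "income": 120000}
```

Under MCAR both records get the same probability; under MAR only the age matters; under MNAR only the income matters, which is what makes it hard to detect from the observed data alone.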
What are 4 approaches to dealing with Missing Values?
- Keep as-is
- Remove Columns
- Remove Rows
- Impute Values
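The remove-rows, remove-columns and impute options can be sketched on a tiny table in plain Python. The table `rows`, its column names, and the choice of mean imputation are illustrative assumptions; median or model-based imputation are equally valid alternatives.

```python
from statistics import mean

# Hypothetical table: None marks a missing value.
rows = [
    {"id": 1, "age": 25, "income": 40000},
    {"id": 2, "age": None, "income": 52000},
    {"id": 3, "age": 31, "income": 61000},
    {"id": 4, "age": 40, "income": None},
]

# Remove rows: keep only rows with no missing values.
complete_rows = [r for r in rows if None not in r.values()]

# Remove columns: keep only columns with no missing values.
full_columns = [c for c in rows[0] if all(r[c] is not None for r in rows)]
without_columns = [{c: r[c] for c in full_columns} for r in rows]

def impute_mean(table):
    """Impute values: replace each missing entry with its column mean."""
    means = {
        c: mean(r[c] for r in table if r[c] is not None)
        for c in table[0]
    }
    return [{c: (means[c] if r[c] is None else r[c]) for c in r} for r in table]

imputed = impute_mean(rows)
```

Removing rows keeps two of the four records here, while removing columns keeps only `id`, which shows how quickly either option can discard information on a small data set.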
What are three types of outliers?
- Data errors
- Legitimate Values
- Fraudulent Entries
What are 4 ways to deal with outliers?
- Do Nothing
- Replace with upper or lower cap
- Log Transformation
- Remove data objects with outliers
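A minimal sketch of capping, log transformation and removal, using only the standard library. The sample `values` and the 1.5 × IQR rule for choosing the caps are assumptions made for illustration; the cards do not say how the caps should be chosen.

```python
import math
from statistics import quantiles

values = [12, 14, 15, 15, 16, 18, 19, 21, 250]  # 250 is a made-up outlier

# Caps from the 1.5 * IQR rule (one common convention, assumed here).
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
lower_cap, upper_cap = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Replace with upper or lower cap: clip each value into [lower_cap, upper_cap].
capped = [min(max(v, lower_cap), upper_cap) for v in values]

# Log transformation: compresses large values (requires positive data).
logged = [math.log(v) for v in values]

# Remove data objects with outliers: keep only values inside the caps.
removed = [v for v in values if lower_cap <= v <= upper_cap]
```

Capping and the log transform keep all nine data objects, whereas removal drops the outlying one.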
What are the two types of data errors?
- Random Errors
- Systematic Errors
What are the two types of data transformation?
- Normalisation
- Standardisation
What does standardisation do?
Rescale the data to have a mean of 0 and a standard deviation of 1
What does normalisation do?
Rescale the data to have a common scale, often [0, 1]
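Both rescalings can be written in a few lines. This sketch assumes z-score standardisation with the population standard deviation and min-max normalisation to [0, 1]; the function names and sample data are illustrative, and both functions assume the values are not all identical.

```python
from statistics import mean, pstdev

values = [2.0, 4.0, 6.0, 8.0]

def standardise(xs):
    """Z-score: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def normalise(xs):
    """Min-max: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

z_scores = standardise(values)
scaled = normalise(values)
```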
What are support vectors?
The training points that lie closest to the SVM's decision boundary and determine where the margin sits.
What does DBSCAN stand for?
Density-based Spatial Clustering of Applications with Noise
What does PCA stand for?
Principal Component Analysis
How does DBSCAN cluster?
It finds areas of high density and expands the clusters from there.
What does t-SNE stand for?
t-distributed stochastic neighbour embedding
What does UMAP stand for?
Uniform Manifold Approximation and Projection
What are 4 ways to measure the distance between clusters?
- Single Linkage
- Complete Linkage
- Average Linkage
- Centroid Linkage
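The four measures differ only in how they summarise the distances between two clusters. The sketch below computes each one for two small, made-up clusters `a` and `b` using Euclidean distance, which is an assumption; any distance function could be substituted.

```python
from itertools import product
from math import dist

# Two hypothetical clusters of 2-D points.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]

def centroid(cluster):
    """Mean position of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

# All distances between a point in a and a point in b.
pairwise = [dist(p, q) for p, q in product(a, b)]

single = min(pairwise)                          # nearest pair of points
complete = max(pairwise)                        # farthest pair of points
average = sum(pairwise) / len(pairwise)         # mean over all pairs
centroid_link = dist(centroid(a), centroid(b))  # distance between centroids
```

For these two clusters the nearest-pair distance is 3.0, the farthest-pair distance is 6.0, and both the average and centroid distances are 4.5.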
What is Ward’s Method?
It's a method that joins the pair of clusters whose merger gives the smallest increase in the total squared distance of points from their cluster centroids.
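One way to picture Ward's criterion is to compute, for each candidate pair, how much the total squared distance from centroids grows if the pair is merged, then pick the smallest. The clusters and the helper names `sse` and `ward_cost` below are hypothetical and only meant as a sketch of the idea, not a full hierarchical clustering routine.

```python
from itertools import combinations

def centroid(cluster):
    """Mean position of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_cost(a, b):
    """Increase in total squared distance caused by merging a and b."""
    return sse(a + b) - sse(a) - sse(b)

# Three made-up clusters on a line.
clusters = {
    "A": [(0.0, 0.0), (1.0, 0.0)],
    "B": [(4.0, 0.0), (6.0, 0.0)],
    "C": [(2.0, 0.0)],
}

# Ward's method would merge the pair with the smallest cost next.
best_pair = min(
    combinations(clusters, 2),
    key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
)
```

Here merging A with C costs 1.5, against 6.0 for B with C and 20.25 for A with B, so A and C would be joined first.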
How does DBSCAN create clusters?
For any point in any cluster, the point density around that point has to exceed a given threshold.
Which two hyperparameters define density in DBSCAN?
- eps (epsilon)
- minPoints
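To show how the two hyperparameters interact, here is a minimal, unoptimised DBSCAN sketch in plain Python. It assumes Euclidean distance and counts a point as its own neighbour when comparing against `min_points`, a convention that varies between implementations. The function name, the label values and the sample points are illustrative.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_points):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster    # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is also core: keep expanding
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_points=3)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

The two tight groups each form a cluster, and the isolated point at (50, 50) is labelled as noise because its neighbourhood holds fewer than `min_points` points.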