6. Unsupervised Learning Flashcards
What is PCA?
A statistical tool that finds a low-dimensional representation of a dataset that retains as much of the dataset's information (variance) as possible.
Can PCA be used in both a supervised and an unsupervised setting?
Yes. PCA itself is unsupervised, but its components can feed a supervised method: PCR uses them as predictors in a regression.
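Formula for the mth PC
$Z_m = \phi_{1m} X_1 + \phi_{2m} X_2 + \dots + \phi_{pm} X_p$
The principal component loadings are constrained to what?
$\sum_{j=1}^{p} \phi_{jm}^2 = 1$
Mth PC score formula
$z_{im} = \sum_{j=1}^{p} \phi_{jm} x_{ij}$, where the $x_{ij}$ are centred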
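The loading constraint and the score calculation can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a made-up 6 x 3 data matrix; taking the loadings from an SVD of the centred data is one common route, not the only one.

```python
import numpy as np

# Hypothetical toy data: 6 observations (rows) on 3 variables (columns).
X = np.array([
    [2.0, 1.0, 0.5],
    [3.0, 2.5, 1.0],
    [4.0, 2.0, 2.5],
    [5.0, 4.5, 2.0],
    [6.0, 4.0, 3.5],
    [7.0, 6.5, 3.0],
])

# Centre each column so every variable has mean zero.
Xc = X - X.mean(axis=0)

# SVD of the centred data: the rows of Vt are the loading vectors.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T          # column m holds phi_1m, ..., phi_pm

# Scores: z_im = sum over j of phi_jm * x_ij (using centred x).
scores = Xc @ loadings

# Sum of squared loadings for each PC; the constraint says each equals 1.
print(np.sum(loadings**2, axis=0))
```

Each column of `scores` is one PC, and the columns come out ordered from largest to smallest variance.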
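For PCA, do the variables need to be scaled or centred?
Centred (to mean zero). Scaling is optional but usually recommended when the variables are measured in different units.
What is the maximum number of PCs?
min(n - 1, p)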
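A quick way to see the min(n - 1, p) limit is to look at the rank of the centred data matrix, since centring removes one degree of freedom from the rows. This is a small sketch assuming NumPy and randomly generated data with n = 3 and p = 5.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 5                      # fewer observations than variables
X = rng.normal(size=(n, p))      # hypothetical data

# Centring makes the rows sum to zero, so at most n - 1 rows are independent.
Xc = X - X.mean(axis=0)

# The rank is the number of PCs with non-zero variance.
rank = np.linalg.matrix_rank(Xc)
print(rank, min(n - 1, p))
```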
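What is the formula for PVE of the mth PC?
$\mathrm{PVE}_m = \dfrac{\sum_{i=1}^{n} z_{im}^2}{\sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij}^2}$, where the $x_{ij}$ are centred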
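The PVE calculation can be sketched as follows, assuming NumPy and simulated data with p = 4 variables. Each PVE is the sum of squared scores for one PC divided by the total sum of squares of the centred data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 50 observations, 4 variables with different spreads.
X = rng.normal(size=(50, 4)) * np.array([1.0, 2.0, 3.0, 4.0])
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T

# PVE of each PC: its sum of squared scores over the total sum of squares.
pve = np.sum(scores**2, axis=0) / np.sum(Xc**2)
cumulative_pve = np.cumsum(pve)

print(pve)
print(cumulative_pve)
```

With this data the cumulative PVE stays below 1 after three PCs and reaches 1 only once all four are included.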
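What is the model equation for PCR? What happens when k=p?
$Y = \theta_0 + \theta_1 Z_1 + \dots + \theta_k Z_k + \varepsilon$. When $k = p$, PCR is equivalent to ordinary least squares on all $p$ original predictors.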
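The k = p case can be checked directly. The sketch below assumes NumPy and simulated data; `pcr_fitted` and `ols_fitted` are hypothetical helper names, and the predictors are only centred here (not standardized) to keep the comparison with least squares simple.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 3
X = rng.normal(size=(n, p))                      # hypothetical predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

def pcr_fitted(X, y, k):
    """Fitted values from regressing y on the first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                            # first k score vectors
    design = np.column_stack([np.ones(len(y)), Z])
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ theta

def ols_fitted(X, y):
    """Fitted values from least squares on the original predictors."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ beta

fit_k1 = pcr_fitted(X, y, k=1)
fit_full = pcr_fitted(X, y, k=p)
fit_ols = ols_fitted(X, y)

# With all p components, PCR reproduces the least squares fit.
print(np.allclose(fit_full, fit_ols))
```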
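What are the two methods of calculating within-cluster variation?
$W(C_k) = \dfrac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 = 2 \sum_{i \in C_k} \sum_{j=1}^{p} (x_{ij} - \bar{x}_{kj})^2$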
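The two calculations (pairwise squared distances divided by the cluster size, and twice the squared distances to the centroid) give the same number. A minimal sketch in plain Python, using a made-up cluster of four observations:

```python
# Hypothetical cluster of 4 observations on 2 features.
cluster = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (6.0, 7.0)]
n_k = len(cluster)
p = len(cluster[0])

# Method 1: squared distances summed over all ordered pairs, divided by |C_k|.
pairwise = sum(
    (a[j] - b[j]) ** 2
    for a in cluster
    for b in cluster
    for j in range(p)
) / n_k

# Method 2: twice the sum of squared distances to the cluster centroid.
centroid = [sum(obs[j] for obs in cluster) / n_k for j in range(p)]
to_centroid = 2 * sum(
    (obs[j] - centroid[j]) ** 2 for obs in cluster for j in range(p)
)

print(pairwise, to_centroid)
```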
What is the algorithm for k-means clustering?
- Randomly assign each observation to one of k clusters, where k is chosen in advance. These are the initial cluster assignments.
- Calculate the centroid of each cluster.
- For each observation, identify the closest centroid and reassign the observation to that cluster.
- Repeat steps 2 and 3 until the cluster assignments stop changing.
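The four steps above can be sketched in plain Python. This is an illustrative version, not a reference implementation: `k_means` and `squared_distance` are hypothetical names, the data are made up, and re-seeding an empty cluster at a random observation is an assumption added so the sketch always runs.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly assign each observation to one of k clusters.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Step 2: compute each cluster's centroid (mean of its members).
        centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: re-seed an empty cluster at a random observation.
                centroids.append(rng.choice(points))
        # Step 3: reassign each observation to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(pt, centroids[c]))
            for pt in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Two well-separated groups of three observations each.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
          (10.0, 0.0), (10.1, 0.0), (10.2, 0.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Which group receives label 0 and which receives label 1 depends on the random start, so only the grouping is meaningful, not the label values.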
What are 3 drawbacks of k-means clustering?
- Initial cluster assignments affect the final assignments.
- k must be chosen in advance, and the choice is somewhat arbitrary.
- Not robust to outliers or to small changes in the data.
Does k-means need to have its variables standardized?
Not necessarily. It depends heavily on the problem at hand.
Are k-means and hierarchical clustering robust?
No
Is k-means clustering greedy?
Yes
Centroid linkage is subject to _____, and single linkage has a dendrogram that is _____.
Inversions; skewed
True or false: when performing PCR, it is recommended to standardize the predictors prior to generating the principal components.
True. This prevents high-variance variables from monopolizing the principal components.
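The effect of scale on the first loading vector can be seen with two simulated variables, one of which is recorded on a scale roughly 100 times larger. A minimal sketch, assuming NumPy; `first_loading` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
# Hypothetical second variable: related to x1 but on a much larger scale.
x2 = 100.0 * (0.5 * x1 + rng.normal(size=200))
X = np.column_stack([x1, x2])

def first_loading(data):
    """Loading vector of the first PC, from an SVD of the centred data."""
    centred = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return Vt[0]

raw = first_loading(X)
standardized = first_loading((X - X.mean(axis=0)) / X.std(axis=0))

print(np.abs(raw))            # dominated by the large-scale variable
print(np.abs(standardized))   # both variables contribute equally
```

With only two standardized variables the loadings always come out equal in magnitude, so this example shows the contrast in its most extreme form.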
Can PCR reduce overfitting?
Yes. Instead of using all of the original variables, PCR uses only the first k PCs to predict the response, which lowers variance and so reduces overfitting.
Is PCR useful for performing feature selection?
No, because every principal component is a linear combination of all the original variables.
Are all PCA loadings unique?
No. Each PC loading vector is unique only up to a sign flip, because its negative satisfies the same constraints. Two different software packages can therefore find the same loading vectors with opposite signs.
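The sign ambiguity can be checked numerically: flipping the sign of a loading vector keeps its unit length and leaves the variance of the scores unchanged. A short sketch, assuming NumPy and simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))     # hypothetical data
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
phi = Vt[0]          # first loading vector
flipped = -phi       # same direction, opposite sign

# Both versions have unit length and give scores with the same variance.
for v in (phi, flipped):
    print(np.sum(v**2), np.var(Xc @ v))
```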
Together, do all the principal components explain 100% of the variance?
Yes
Which is more restrictive in its clustering nature, k-means or hierarchical?
Hierarchical. K-means is less restrictive, because hierarchical clustering must produce clusters that are nested as the number of clusters changes.
K-means simply uses Euclidean distances, which impose no specific structure on the results.
If only 3 of the 4 principal components are used in a model, will the cumulative PVE ever be 100%?
No. The cumulative PVE reaches 100% only if all PCs are used.
In cluster analysis, could we cluster the observations on the basis of the features or cluster the features based on the observations?
Both