Unsupervised Learning Flashcards

1
Q

Give two reasons why unsupervised learning is often more challenging than supervised learning

A
  • Objectives are fuzzier and more subjective because there is no simple goal such as prediction
  • Since a target variable is absent, methods for assessing model quality on the target variable are not applicable, which makes it difficult to evaluate the results obtained
2
Q

Describe how principal components analysis works

A

PCA transforms a high-dimensional dataset into a smaller, much more manageable set of representative variables (principal components, which are linear combinations of the original features) that capture most of the information in the original dataset. It is especially useful for highly correlated data.
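A minimal sketch of PCA in practice, assuming scikit-learn is available and using made-up data:

```python
# Minimal PCA sketch (made-up data; assumes numpy/scikit-learn are installed)
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 observations, 5 features
X[:, 1] = X[:, 0] + 0.1 * X[:, 1]    # make two features highly correlated

X_std = StandardScaler().fit_transform(X)   # center and scale first
pca = PCA(n_components=2).fit(X_std)        # keep 2 representative variables

scores = pca.transform(X_std)               # PC scores: the lower-dimensional data
print(pca.explained_variance_ratio_)        # share of variance each PC captures
```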

3
Q

Describe how centering and scaling the variables will affect the results of principal components analysis

A
  • Centering: mean-centering does not affect the results of PCA since the variance remains unchanged when the same constant is added to or subtracted from the values
  • Scaling: scaling does affect PCA. PCA on variables in their original scale is based on the sample covariance matrix, while PCA on standardized variables is based on the sample correlation matrix (see the sketch below).
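A small numerical sketch of the scaling effect, using made-up data on very different scales:

```python
# Sketch: unscaled PCA is driven by the covariance matrix,
# standardized PCA by the correlation matrix (made-up data)
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])  # very different scales

# PC variances from unscaled data = eigenvalues of the sample covariance matrix
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# PC variances from standardized data = eigenvalues of the sample correlation matrix
cor_eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

print(cov_eigvals)  # dominated by the large-scale feature
print(cor_eigvals)  # each feature contributes on an equal footing
```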
4
Q

Describe the drawbacks (or limitations) of principal components analysis

A
  • It may not lead to interpretable results (because PCs as composite variables can be hard to interpret)
  • PCA may not be a good tool to use for non-linear relationships (because PCs rely on linear transformations)
  • Although PCA does dimension reduction, PCA is not doing feature selection, so no operational efficiency is achieved (because PCs are constructed from all original features)
  • The target variable is ignored since PC loadings and scores are generated independently of the target variable (because PCA is unsupervised)
5
Q

Explain how K-means clustering works

A

K-means clustering assigns each observation in a dataset to one of K relatively homogeneous clusters, where K is specified upfront.
First, we randomly select K points to be the initial cluster centers. Then we iterate:
1. Assign each observation to the closest cluster center based on Euclidean distance
2. Recalculate the center of each of the K clusters
3. Repeat until the cluster assignments no longer change
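A from-scratch sketch of these steps on made-up 2-D data (empty clusters are not handled):

```python
# From-scratch K-means sketch following the steps above (illustration only)
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # 1. assign each observation to the closest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2. recalculate the center of each cluster
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 3. stop once the centers (and hence assignments) no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.random.default_rng(2).normal(size=(150, 2))  # made-up data
labels, centers = k_means(X, k=3)
```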

6
Q

Explain what the term “K-means” refers to

A

The algorithm involves iteratively calculating the K means/centers of the clusters, hence the name.

7
Q

Explain why it is desirable to run a K-means clustering algorithm multiple times

A

This is because the K-means clustering algorithm is guaranteed to arrive at a local, but not necessarily global, optimum. The initial cluster assignments affect which local optimum is reached, so running the K-means algorithm multiple times (e.g., 20 to 50) with different random initializations increases the chance of identifying the global optimum and obtaining a representative cluster grouping.
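A sketch of multiple random restarts; scikit-learn's KMeans does this via its n_init argument and keeps the run with the lowest total within-cluster variation (the data here is made up):

```python
# Sketch: 25 random restarts, keeping the best run (lowest within-cluster SS)
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)   # made-up data
km = KMeans(n_clusters=4, n_init=25, random_state=0).fit(X)   # 25 random initializations
print(km.inertia_)   # within-cluster sum of squares of the best run
```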

8
Q

Explain how the elbow method can be used to select the value of K

A

Plot the proportion of variance explained (equal to the between-cluster variation divided by the total variation in the data) as new clusters are added. The elbow of this plot is the point where the proportion of variance explained has plateaued; take K at the elbow.
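A sketch of the elbow plot using scikit-learn's KMeans (inertia_ is the within-cluster variation; the data and the range of K are made up):

```python
# Sketch of the elbow method: proportion of variance explained vs. K
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)   # made-up data
total_ss = ((X - X.mean(axis=0)) ** 2).sum()                  # total variation

ks = range(1, 11)
pve = []
for k in ks:
    within_ss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    pve.append((total_ss - within_ss) / total_ss)   # between-cluster / total variation

plt.plot(list(ks), pve, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("Proportion of variance explained")
plt.show()   # pick K where the curve plateaus (the elbow)
```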

9
Q

Explain how hierarchical clustering works

A

Hierarchical clustering consists of a series of fusions of observations in the data. It is a bottom-up clustering method that starts with each individual observation treated as its own cluster, then successively fuses the closest pair of clusters, one pair at a time. The process iterates until all observations are fused into a single cluster.
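A sketch of agglomerative clustering with SciPy on made-up data; the dendrogram records the full sequence of fusions:

```python
# Sketch: bottom-up (agglomerative) clustering with SciPy (made-up data)
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(3).normal(size=(30, 2))

Z = linkage(X, method="complete")   # successively fuse the closest pair of clusters
dendrogram(Z)                       # full fusion history up to one all-inclusive cluster
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree to get 3 clusters
```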

10
Q

Explain the difference between average linkage and centroid linkage

A
  • For average linkage, we first compute all pairwise distances between observations in the two clusters, then take their average
  • For centroid linkage, we first average the feature values within each cluster to get the two centroids, then compute the distance between the two centroids (see the numeric sketch below)
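A tiny numeric sketch of the two inter-cluster distances, with made-up cluster members:

```python
# Sketch: average vs. centroid linkage distance between two small clusters
import numpy as np
from itertools import product

A = np.array([[0.0, 0.0], [2.0, 0.0]])   # cluster A (made-up points)
B = np.array([[5.0, 0.0], [5.0, 2.0]])   # cluster B (made-up points)

# average linkage: average of all pairwise distances between A and B
avg_link = np.mean([np.linalg.norm(a - b) for a, b in product(A, B)])

# centroid linkage: distance between the two cluster centroids
centroid_link = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

print(avg_link, centroid_link)   # the two values generally differ
```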
11
Q

Explain the two differences between K-means clustering and hierarchical clustering

A

K-means
* Randomization is needed to determine initial cluster centers
* Number of clusters is pre-specified
* Clusters are not nested

Hierarchical clustering
* Randomization is not needed
* Number of clusters is not pre-specified
* Clusters are nested

12
Q

Explain how scaling the variables will affect the results of hierarchical clustering

A

Without scaling: variables with large magnitudes dominate the distance calculations and exert a disproportionate impact on cluster assignments
With scaling: equal importance is attached to each feature when performing distance calculations
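A brief sketch, assuming scikit-learn's StandardScaler for the scaling step (the data is made up; the second feature's large scale would otherwise dominate the distances):

```python
# Sketch: standardize features before distance-based (hierarchical) clustering
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
X = np.column_stack([rng.normal(0, 1, 100),       # small-scale feature
                     rng.normal(0, 1000, 100)])   # large-scale feature dominates distances

labels_raw = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")

X_std = StandardScaler().fit_transform(X)          # now each feature gets equal weight
labels_std = fcluster(linkage(X_std, method="average"), t=2, criterion="maxclust")
```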

13
Q

Explain two ways in which clustering can be used to generate features for predicting a target variable

A

Clustering generates features in two ways (see the sketch below):
1. Cluster groups: the group assignments produced by clustering form a factor variable/feature that can be used to predict a target variable
2. Cluster centers: replace the original variables by the cluster centers to serve as numeric features. Two advantages of this approach:
* Interpretation: cluster centers provide a numeric summary of the characteristics of observations in different clusters
* Prediction: cluster centers retain the numeric characteristics of the observations and use the summarized characteristics to help make better predictions
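A sketch of both feature types, using K-means and made-up feature names x1 and x2:

```python
# Sketch: cluster assignments and cluster centers as new features
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, n_features=2, random_state=0)
df = pd.DataFrame(X, columns=["x1", "x2"])   # made-up feature names

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)

# (1) cluster groups: the assignment itself becomes a factor feature
df["cluster"] = pd.Categorical(km.labels_)

# (2) cluster centers: replace x1, x2 by the center of the assigned cluster
df[["x1_center", "x2_center"]] = km.cluster_centers_[km.labels_]
```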

14
Q

What are the properties of principal components?

A
  • Linear combinations of the original features
  • PCs are generated to capture as much information in the data as possible (w.r.t. variance)
  • PCs are mutually uncorrelated (different PCs capture different aspects of data)
  • Amount of variance explained decreases with PC order. For example, PC1 explains the most variance and subsequent PCs explain less and less
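A quick numerical check of these properties, assuming scikit-learn and its built-in iris data purely for illustration:

```python
# Sketch: verify that PC scores are uncorrelated and their variances decrease
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
scores = PCA().fit_transform(X)

print(np.round(np.corrcoef(scores, rowvar=False), 3))   # ~identity: PCs uncorrelated
print(scores.var(axis=0, ddof=1))                       # variances decrease with PC order
```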
15
Q

What are two applications of PCA?

A
  • EDA, including data visualization: the dataset becomes much easier to explore and visualize. Plot the scores of the 1st PC vs. the scores of the 2nd PC to gain a 2D view of the data in a scatterplot
  • Feature generation: replace the original variables by PCs to reduce overfitting and improve prediction performance. Delete the original variables to avoid any duplication of information; co-existence of the two groups would result in perfect collinearity and a rank-deficient model if fitting a GLM
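A sketch of both applications, using the built-in iris data as a stand-in dataset:

```python
# Sketch: PC1-vs-PC2 view of the data, then PC scores as replacement features
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris(as_frame=True).data
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# EDA / visualization: 2D view of the data
plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("PC1 score")
plt.ylabel("PC2 score")
plt.show()

# Feature generation: PCs replace (not sit alongside) the original variables
X_new = pd.DataFrame(scores, columns=["PC1", "PC2"])
```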
16
Q

What is the tradeoff of increasing the number of PCs (M) to use?

A

As M increases:
* cumulative PVE increases
* dimension increases
* (if y exists) model complexity increases

17
Q

How can we choose the number of principal components (M) to use?

A
  • Scree plot: choose the number of PCs such that the cumulative PVE is high enough (see the sketch below)
  • CV: treat M as a hyperparameter to be tuned if y exists
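A sketch of the cumulative-PVE approach; the 90% threshold is a made-up choice and the iris data is only a stand-in:

```python
# Sketch: pick the smallest M whose cumulative PVE clears a chosen threshold
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
cum_pve = np.cumsum(PCA().fit(X).explained_variance_ratio_)

M = int(np.argmax(cum_pve >= 0.90)) + 1   # 0.90 is an arbitrary illustrative cutoff
print(cum_pve, M)
```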
18
Q

Why might we use complete and average linkage over single and centroid?

A
  • Complete and average linkage tend to result in more balanced and visually appealing clusters
  • Single linkage tends to produce extended, trailing clusters with single observations fused one-at-a-time
  • Centroid linkage may lead to inversion (some later fusions occur at a lower height than an earlier fusion)
19
Q

List two ways you can perform feature generation using PCA

A
  1. Take the first PC's scores as a new feature as they are
  2. Keep the PC loadings that are most similar in size to define a new composite feature (the loadings of the other, dissimilar variables can be dropped), then calculate the resulting scores as a new feature (see the sketch below)
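A sketch of both approaches; the 0.3 loading cutoff and the iris data are made-up choices for illustration:

```python
# Sketch: (1) raw PC1 scores as a feature; (2) scores from a simplified PC1
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

# (1) use the first PC's scores directly as a new feature
pc1_scores = pca.transform(X)[:, 0]

# (2) keep only the large, similar PC1 loadings, then recompute the scores
loadings = pca.components_[0]
loadings = np.where(np.abs(loadings) >= 0.3, loadings, 0.0)   # drop dissimilar variables
custom_scores = X @ loadings
```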