All Flashcards

(21 cards)

1
Q
A
2
Q

What is unsupervised learning?

A

A type of machine learning where algorithms find patterns in data without labeled examples.

3
Q

What are the main types of unsupervised learning?

A

Clustering, dimensionality reduction, and association rule learning.

4
Q

How does K-means clustering work?

A

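1) Initialize K centroids (e.g., K randomly chosen data points), 2) Assign each point to its nearest centroid, 3) Move each centroid to the mean of the points assigned to it, 4) Repeat steps 2 and 3 until the assignments stop changing (convergence).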
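A minimal pure-Python sketch of these four steps (Lloyd's algorithm), assuming points are tuples of numbers and that the initial centroids are K distinct data points drawn at random; the function names are illustrative, not from any library:

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: use k distinct data points, chosen at random, as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change (convergence).
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Which cluster gets label 0 depends on the random initialization; only the grouping itself is meaningful.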

5
Q

What is the “elbow method”?

A

A technique to find the optimal number of clusters by plotting K against inertia (within-cluster sum of squares) and looking for the “elbow”: the point where the curve bends and adding more clusters yields only small further decreases in inertia.
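A sketch of the elbow method in plain Python on a small synthetic dataset with three tight groups. To keep the output reproducible, K-means here uses a deterministic farthest-first initialization (an assumption; libraries usually seed randomly or with k-means++), and the bend is located with one simple heuristic, the largest second difference of the inertia curve, standing in for the usual visual judgment:

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, max_iter=100):
    """Run K-means and return its inertia (within-cluster sum of squares)."""
    # Farthest-first seeding: start at the first point, then repeatedly add the
    # point farthest from every centroid chosen so far (deterministic).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centroids[j]))].append(p)
        updated = [tuple(sum(d) / len(g) for d in zip(*g)) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_k(inertias):
    """inertias[i] is the inertia for K = i + 1; return the K with the sharpest bend."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Three well-separated groups of four points each.
offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

inertias = [kmeans_inertia(points, k) for k in range(1, 7)]
for k, value in enumerate(inertias, start=1):
    print(f"K={k}: inertia={value:.2f}")
print("elbow at K =", elbow_k(inertias))
```

Inertia falls steeply up to K = 3 and barely moves afterwards, which is the bend the heuristic picks up.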

6
Q

What does silhouette analysis measure?

A

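How well separated the clusters are: for each point it compares the mean distance to the other points in its own cluster (a) with the mean distance to the points in the nearest other cluster (b), giving s = (b - a) / max(a, b).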
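A small pure-Python sketch of the silhouette computation, assuming Euclidean distance, at least two clusters, and the common convention that a point alone in its cluster scores 0; the function name is illustrative:

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(math.dist(p, q))
        own = by_cluster[labels[i]]
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within own cluster
        # Separation: mean distance to the nearest other cluster.
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # clusters match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # each cluster spans both groups
print(round(good, 3), round(bad, 3))
```

The sensible labeling scores close to +1, while the mismatched one goes negative.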

7
Q

What is the Gap Statistic?

A

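A method to find the optimal K by comparing the within-cluster dispersion of the clustering on the real data with its expected value on random reference data that has no cluster structure; a large gap means the real clusters are much tighter than chance would produce.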
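A compact sketch of the Gap Statistic, assuming the reference (null) data is drawn uniformly over the bounding box of the real data, W_K is the K-means within-cluster sum of squares, Gap(K) = mean(log W*_K over references) - log W_K, and K-means uses a deterministic farthest-first initialization for reproducibility. It applies the standard selection rule: the smallest K with Gap(K) >= Gap(K+1) - s_(K+1), where s is the reference standard deviation scaled by sqrt(1 + 1/B):

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_ss(points, k, max_iter=100):
    """W_K: within-cluster sum of squares after K-means with farthest-first seeding."""
    cents = [points[0]]
    while len(cents) < k:
        cents.append(max(points, key=lambda p: min(sq_dist(p, c) for c in cents)))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, cents[j]))].append(p)
        updated = [tuple(sum(d) / len(g) for d in zip(*g)) if g else cents[j]
                   for j, g in enumerate(groups)]
        if updated == cents:
            break
        cents = updated
    return sum(min(sq_dist(p, c) for c in cents) for p in points)


def gap_statistic(points, k_max=5, n_refs=20, seed=0):
    """Return (chosen K, list of Gap(K) for K = 1..k_max)."""
    rng = random.Random(seed)
    lows = [min(d) for d in zip(*points)]
    highs = [max(d) for d in zip(*points)]
    # Null distribution: uniform data over the same bounding box, no clusters.
    refs = [[tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            for _ in range(n_refs)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = [math.log(within_ss(ref, k)) for ref in refs]
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - math.log(within_ss(points, k)))  # Gap(K)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))              # s_K
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:  # Gap(K) >= Gap(K+1) - s_(K+1)
            return k, gaps
    return k_max, gaps


offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
best_k, gaps = gap_statistic(points)
print("chosen K:", best_k, [round(g, 2) for g in gaps])
```

The same reference datasets are reused for every K, so differences between Gap values reflect K rather than resampling noise.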

8
Q

Why is a null distribution important in the Gap Statistic?

A

It provides a baseline of what clustering would look like in random data with no natural clusters.

9
Q

What is dimensionality reduction?

A

Transforming high-dimensional data into a lower-dimensional representation while preserving important information.

10
Q

What is the “curse of dimensionality”?

A

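As the number of dimensions increases, data becomes sparse, distances between points become nearly uniform (so “near” and “far” lose meaning), and many algorithms, especially distance-based ones, become less effective.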
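A quick numerical illustration of distance concentration, assuming points drawn uniformly from the unit hypercube: as the dimension grows, the spread between the nearest and farthest point shrinks relative to the nearest distance. The function name and sample sizes are arbitrary choices for the demo:

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min distance from a random query to n uniform random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In low dimensions the farthest point is many times farther than the nearest; in 1000 dimensions all points sit at nearly the same distance.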

11
Q

What is Principal Component Analysis (PCA)?

A

A linear technique that reduces dimensions by projecting data onto directions of maximum variance.
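A minimal PCA sketch in plain Python, assuming only the first principal component is needed: it centers the data, builds the sample covariance matrix, finds its leading eigenvector by power iteration (a simple stand-in for the eigendecomposition or SVD a library would use), and projects each point onto it to get a 1-D representation:

```python
import math


def pca_first_component(points, iters=200):
    """Return (unit direction of maximum variance, 1-D projections of the points)."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration; assumes the start vector is not orthogonal to the top eigenvector.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered point onto the component.
    scores = [sum(c * vj for c, vj in zip(row, v)) for row in centered]
    return v, scores


# Points close to the line y = 2x, with a small alternating perturbation.
points = [(float(x), 2.0 * x + (0.1 if x % 2 == 0 else -0.1)) for x in range(10)]
v, scores = pca_first_component(points)
print([round(c, 3) for c in v])
```

The recovered direction is close to (1, 2) / sqrt(5), the direction along which these points vary most.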

12
Q

How does t-SNE differ from PCA?

A

t-SNE is non-linear, focuses on preserving local structure, and is better for visualization but doesn’t preserve global relationships as well.

13
Q

What are autoencoders?

A

Neural networks trained to compress data into a lower-dimensional representation in a middle (bottleneck) layer and then reconstruct the original data from it; the bottleneck serves as the reduced representation.

14
Q

What is the main purpose of UMAP?

A

To reduce dimensionality (often for visualization) while preserving local structure and typically more of the global structure than t-SNE; it is also usually faster on large datasets.

15
Q

How does association rule learning work?

A

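It discovers interesting relationships between variables in large datasets (e.g., “customers who buy X often also buy Y”) by finding item combinations that occur together frequently (support) and keeping rules whose conditional frequency (confidence) exceeds a threshold.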
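A simplified sketch of the support/confidence idea behind algorithms such as Apriori, restricted to rules with one item on each side (X → Y); the baskets, thresholds, and function name are made up for illustration:

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules (x, y, support, confidence): baskets with x often contain y."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(items):
        # Fraction of baskets that contain every item in `items`.
        return sum(1 for b in baskets if items <= b) / n

    # Apriori-style pruning: a pair can only be frequent if each item is frequent.
    frequent = [i for i in sorted(set().union(*baskets)) if support({i}) >= min_support]
    rules = []
    for a, b in combinations(frequent, 2):
        s = support({a, b})
        if s < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s / support({x})  # estimate of P(y in basket | x in basket)
            if confidence >= min_confidence:
                rules.append((x, y, s, confidence))
    return rules


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
for x, y, s, c in pair_rules(transactions):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

Note that rules are directional: "milk -> bread" passes the confidence threshold here while "bread -> milk" does not.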

16
Q

What’s the difference between hierarchical clustering and K-means?

A

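K-means requires choosing K in advance and produces a single flat partition. Hierarchical clustering builds a tree of clusters (a dendrogram), doesn’t require specifying K in advance (you cut the tree at the desired level), and can capture nested cluster structures.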
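A naive sketch of agglomerative (bottom-up) hierarchical clustering with single linkage, suitable only for small data because it recomputes all cluster distances at every step. It records the full merge history once; cutting that history at different levels then yields any number of clusters without re-running. The function names are illustrative:

```python
import math


def agglomerative(points, linkage=min):
    """Merge the two closest clusters until one remains; return the merge history.

    linkage=min gives single linkage; passing max gives complete linkage.
    Each merge is recorded as (cluster_a, cluster_b, distance, new_cluster_id).
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = linkage(math.dist(points[i], points[j])
                                for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges


def cut(merges, n_points, k):
    """Replay merges until k clusters remain (cutting the tree at that level)."""
    clusters = {i: [i] for i in range(n_points)}
    for a, b, _, new_id in merges[: n_points - k]:
        clusters[new_id] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())


points = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
merges = agglomerative(points)
print(cut(merges, len(points), k=2))  # [[0, 1, 2, 3], [4]]
print(cut(merges, len(points), k=3))  # [[0, 1], [2, 3], [4]]
```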

17
Q

What does “inertia” measure in K-means?

A

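The sum of squared distances between data points and their assigned cluster centroids. Lower means tighter clusters, but inertia always decreases as K increases (reaching 0 when every point is its own cluster), so it cannot be minimized on its own to choose K.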
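A short sketch that computes inertia for a given cluster assignment (the function name is illustrative) and shows that putting every point in its own cluster drives it to zero:

```python
def inertia(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members)
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(inertia(points, [0, 0, 0, 0]))  # K = 1 -> 104.0
print(inertia(points, [0, 0, 1, 1]))  # K = 2, the natural grouping -> 4.0
print(inertia(points, [0, 1, 2, 3]))  # K = n, every point is its own centroid -> 0.0
```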

18
Q

How do you interpret silhouette scores?

A

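Scores near +1 indicate well-defined, well-separated clusters; scores near 0 indicate overlapping clusters (points lie close to a cluster boundary); negative scores, approaching -1, suggest points have been assigned to the wrong cluster.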

19
Q

When would you use DBSCAN instead of K-means?

A

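When clusters have irregular (non-convex) shapes, when the data contains noise or outliers that should be flagged rather than forced into a cluster, or when you don’t know the number of clusters in advance. Because DBSCAN uses a single global density threshold, it struggles when clusters have very different densities.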
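A straightforward pure-Python sketch of DBSCAN (quadratic-time neighbor search, fine only for small data), assuming Euclidean distance, a neighborhood radius `eps`, and a `min_samples` count that includes the point itself; these parameter names are chosen for the sketch. The example is a ring around a small central blob plus one far outlier, a non-convex layout that K-means, whose clusters are always convex regions, cannot represent as two clusters:

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes point i itself, matching the min_samples convention above.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # j is also core: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


blob = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
ring = [(5 * math.cos(2 * math.pi * t / 16), 5 * math.sin(2 * math.pi * t / 16))
        for t in range(16)]
outlier = [(20.0, 20.0)]
labels = dbscan(blob + ring + outlier, eps=2.5, min_samples=3)
print(labels)
```

No cluster count is supplied: the blob and the ring emerge as two clusters and the outlier is labeled noise.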

20
Q

What is feature extraction?

A

Creating new features from original features to better represent underlying patterns (a form of dimensionality reduction).

21
Q

What’s the difference between hard and soft clustering?

A

Hard clustering assigns each point to exactly one cluster; soft clustering (like fuzzy c-means) gives points membership degrees to multiple clusters.
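A sketch contrasting the two for a single point with fixed centroids: the hard assignment picks the nearest centroid, while the soft version uses the fuzzy c-means membership update u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)) with fuzzifier m = 2. The full fuzzy c-means algorithm alternates this update with a centroid update, which is omitted here; the function names are illustrative:

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    zeros = dists.count(0.0)
    if zeros:  # point sits exactly on a centroid: share full membership there
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def hard_assignment(point, centroids):
    """Index of the single nearest centroid."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))


centroids = [(0.0, 0.0), (10.0, 0.0)]
p = (3.0, 0.0)
hard = hard_assignment(p, centroids)
soft = fuzzy_memberships(p, centroids)
print(hard, [round(u, 3) for u in soft])
```

The hard version only says "cluster 0"; the soft version says the point belongs mostly, but not entirely, to cluster 0, with memberships summing to 1.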