Unsupervised Learning and Clustering Flashcards

1
Q

What is Unsupervised Learning?

A

Unsupervised learning is a machine learning approach where the model learns patterns, structures, or groupings in the data without labeled outputs.

2
Q

What are the key characteristics of Unsupervised Learning?

A
  • Works with unlabeled data (no predefined categories)
  • Finds hidden structures in data
  • Commonly used for clustering and dimensionality reduction
3
Q

What is Clustering?

A

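Clustering is a method in unsupervised learning that groups similar data points together, so that points in the same cluster are more similar to each other than to points in other clusters.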

4
Q

Why is Clustering used?

A
  • Data Reduction
  • Outlier Detection
  • Data Segmentation
5
Q

What are some real-world applications of Clustering?

A
  • Social Network Analysis
  • Image Segmentation
  • Data Annotation
6
Q

What are the steps in Clustering?

A
  • Define a distance metric to measure similarity
  • Form clusters by grouping similar data points
  • Maximize within-cluster similarity, minimize between-cluster similarity
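
To make the last step concrete, the sketch below compares the average distance inside a cluster with the average distance between two clusters, using Euclidean distance as the metric. The helper names (mean_within_distance, mean_between_distance) and the toy points are assumptions made for illustration only; a good clustering should show small within-cluster distances and large between-cluster distances.

```python
import math
from itertools import combinations, product


def mean_within_distance(cluster):
    """Average Euclidean distance over all pairs of points in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_between_distance(cluster_a, cluster_b):
    """Average Euclidean distance over all pairs with one point from each cluster."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical toy data: two tight, well-separated groups.
cluster_a = [(0, 0), (0, 1), (1, 0)]
cluster_b = [(5, 5), (5, 6), (6, 5)]

within_a = mean_within_distance(cluster_a)
within_b = mean_within_distance(cluster_b)
between_ab = mean_between_distance(cluster_a, cluster_b)

print(f"within A:  {within_a:.3f}")
print(f"within B:  {within_b:.3f}")
print(f"between:   {between_ab:.3f}")
```

Small distances correspond to high similarity, so low within-cluster distance and high between-cluster distance is what the third step asks for.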
7
Q

What is K-Means Clustering?

A

K-Means is a partition-based clustering algorithm that groups data into k clusters.

8
Q

How does K-Means Clustering work?

A
  • Choose the number of clusters (k)
  • Select k random points as initial centroids
  • Assign each data point to the nearest centroid
  • Recalculate centroids by finding the mean of each cluster
  • Repeat the assignment and recalculation steps until the centroids stop changing
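
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name kmeans, the seed and max_iter parameters, and the toy points are assumptions made for the example, and an empty cluster simply keeps its previous centroid.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster a list of coordinate tuples into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its cluster.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # assumption: keep the old centroid
        # Step 5: stop once the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical toy data: two well-separated groups, so k = 2 (step 1).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Changing the seed changes the starting centroids; on less cleanly separated data that can change the final clusters as well.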
9
Q

What is the Elbow Method in K-Means?

A

The Elbow Method involves plotting the Within-Cluster Sum of Squares (WCSS) against the number of clusters k and looking for the ‘elbow’ point where adding more clusters stops improving the fit significantly.
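
As a rough sketch of the idea, the code below runs a small K-Means for several values of k and prints the WCSS for each, which is the curve one would plot. Everything here is an assumption made for illustration: the helper names kmeans and wcss, the choice of five random restarts per k, and the toy data with three visible groups.

```python
import math
import random


def kmeans(points, k, seed, max_iter=100):
    """Basic K-Means on coordinate tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def wcss(centroids, clusters):
    """Sum of squared distances from every point to its own cluster centroid."""
    return sum(
        math.dist(p, c) ** 2
        for c, members in zip(centroids, clusters)
        for p in members
    )


# Hypothetical toy data with three visible groups.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 0), (10, 1), (11, 0),
    (5, 9), (5, 10), (6, 9),
]

wcss_by_k = {}
for k in range(1, 7):
    # Keep the best of a few random starts, since results depend on initialization.
    wcss_by_k[k] = min(wcss(*kmeans(points, k, seed)) for seed in range(5))

for k, value in wcss_by_k.items():
    print(f"k={k}: WCSS={value:.2f}")
```

On data like this, the printed values should drop sharply up to the number of real groups and flatten afterwards; the bend in that curve is the elbow.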

10
Q

What are the strengths of K-Means Clustering?

A
  • Simple and efficient
  • Scales well to large datasets (each iteration is linear in the number of data points)
11
Q

What are the weaknesses of K-Means Clustering?

A
  • Requires k to be chosen in advance
  • Sensitive to initialization (different starting centroids can produce different clusters)
  • Struggles with non-globular (non-spherical) clusters
12
Q

What is Hierarchical Clustering?

A

Hierarchical Clustering builds a tree-like hierarchy of nested clusters (visualized as a dendrogram) instead of a single predefined partition.

13
Q

How does Hierarchical Clustering work?

A
  • Start with each data point as its own cluster (the agglomerative, bottom-up approach)
  • Merge the two closest clusters based on a chosen distance metric
  • Repeat until one large cluster remains
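
A minimal sketch of this merge loop is shown below. It assumes Euclidean distance with single linkage (closest pair of points) as the cluster distance; the function names and toy points are hypothetical, and the brute-force search is meant for readability, not speed.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, linkage=single_linkage):
    """Return the list of merges as (cluster_a, cluster_b, distance) tuples."""
    # Step 1: every point starts as its own cluster.
    clusters = [[p] for p in points]
    merges = []
    # Step 3: keep merging until one cluster remains.
    while len(clusters) > 1:
        # Step 2: find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Hypothetical toy data: two close pairs and one distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerative(points)
for left, right, distance in merges:
    print(f"merge {left} + {right} at distance {distance:.2f}")
```

The recorded merge order and distances are the information a dendrogram displays; cutting the tree at a chosen distance yields a flat set of clusters.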
14
Q

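What are common linkage criteria (ways of measuring the distance between two clusters) used in Hierarchical Clustering?

A
  • Single Linkage (distance between the closest pair of points, one from each cluster)
  • Complete Linkage (distance between the farthest pair of points, one from each cluster)
  • Centroid Distance (distance between the two cluster centroids)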
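
The three measures can be written as small Python functions, as sketched below with Euclidean distance between points. The function names and the two example clusters are assumptions made for illustration.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Smallest distance between any point in A and any point in B."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def complete_linkage(cluster_a, cluster_b):
    """Largest distance between any point in A and any point in B."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def centroid_distance(cluster_a, cluster_b):
    """Distance between the centroids of A and B."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical clusters laid out on a line so the values are easy to check by hand.
cluster_a = [(0, 0), (2, 0)]
cluster_b = [(5, 0), (9, 0)]

print(single_linkage(cluster_a, cluster_b))     # closest pair: (2, 0) and (5, 0)
print(complete_linkage(cluster_a, cluster_b))   # farthest pair: (0, 0) and (9, 0)
print(centroid_distance(cluster_a, cluster_b))  # centroids: (1, 0) and (7, 0)
```

The same pair of clusters gets a different distance under each measure, which is why the choice affects which clusters are merged first.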
15
Q

What are the strengths of Hierarchical Clustering?

A
  • No need to specify k beforehand
  • Can capture arbitrarily shaped clusters, depending on the linkage used (single linkage in particular)
16
Q

What are the weaknesses of Hierarchical Clustering?

A

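Hierarchical Clustering is computationally expensive for large datasets: standard agglomerative algorithms work from the pairwise distances between all points, so memory use grows roughly quadratically with the number of points and runtime grows at least that fast.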

17
Q

What are the key takeaways from the study of Unsupervised Learning and Clustering?

A
  • Unsupervised learning finds patterns in unlabeled data
  • Clustering groups similar data points together
  • K-Means is a fast, efficient partition-based method
  • Hierarchical clustering builds a tree-like structure
  • Choosing the right k is crucial for effective K-Means clustering; the Elbow Method is one way to pick it