Unsupervised Learning and Clustering Flashcards
What is Unsupervised Learning?
Unsupervised learning is a machine learning approach where the model learns patterns, structures, or groupings in the data without labeled outputs.
What are the key characteristics of Unsupervised Learning?
- Works with unlabeled data (no predefined categories)
- Finds hidden structures in data
- Used for clustering & dimensionality reduction
What is Clustering?
Clustering is a method in unsupervised learning that groups similar data points together.
Why is Clustering used?
- Data Reduction
- Outlier Detection
- Data Segmentation
What are some real-world applications of Clustering?
- Social Network Analysis
- Image Segmentation
- Data Annotation
What are the steps in Clustering?
- Define a distance metric to measure similarity
- Form clusters by grouping similar data points
- Maximize within-cluster similarity, minimize between-cluster similarity
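As a concrete sketch of the first step above, Euclidean distance is a common choice of metric (a pure-Python illustration; the sample points are arbitrary):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Points in the same cluster should be closer to each other
# than to points in other clusters.
print(euclidean((0, 0), (3, 4)))  # 5.0
```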
What is K-Means Clustering?
K-Means is a partition-based clustering algorithm that divides the data into k clusters by iteratively minimizing the within-cluster sum of squares.
How does K-Means Clustering work?
- Choose the number of clusters (k)
- Select k random points as initial centroids
- Assign each data point to the nearest centroid
- Recalculate centroids by finding the mean of each cluster
- Repeat until centroids stop changing
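The steps above can be sketched as a minimal pure-Python implementation (the sample points, k=2, and the random seed are illustrative assumptions):

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means: random init, assign, update, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # step 2: k random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # step 3: assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # step 4: recompute each centroid as the mean of its cluster
        new = [tuple(sum(v) / len(v) for v in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:                   # step 5: stop when centroids stabilize
            break
        centroids = new
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (1, 0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```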
What is the Elbow Method in K-Means?
The Elbow Method involves plotting the Within-Cluster Sum of Squares (WCSS) and looking for the ‘elbow’ point where adding more clusters stops improving the fit significantly.
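A rough sketch of the method, assuming a toy dataset and a compact k-means (both illustrative): WCSS is computed for several values of k and compared.

```python
import math
import random

def wcss(clusters, centroids):
    """Within-Cluster Sum of Squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, c) ** 2
               for cl, c in zip(clusters, centroids) for p in cl)

def kmeans(points, k, iters=50, seed=0):
    """Compact k-means: random init, then repeated assign/update."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: math.dist(p, cents[i]))].append(p)
        cents = [tuple(sum(v) / len(v) for v in zip(*cl)) if cl else cents[i]
                 for i, cl in enumerate(clusters)]
    return cents, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
results = {}
for k in (1, 2, 3):
    cents, clusters = kmeans(points, k)
    results[k] = wcss(clusters, cents)
    print(k, round(results[k], 2))
# WCSS drops sharply from k=1 to k=2 (the 'elbow'), then only slightly after.
```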
What are the strengths of K-Means Clustering?
- Simple and efficient
- Works well for large datasets
What are the weaknesses of K-Means Clustering?
- Requires predefined k
- Sensitive to initialization
- Struggles with non-globular clusters
What is Hierarchical Clustering?
Hierarchical Clustering builds a tree-like structure (dendrogram) instead of predefined partitions.
How does Hierarchical Clustering work?
- Start with each data point as its own cluster (the agglomerative, bottom-up approach)
- Merge the closest clusters based on a chosen distance metric
- Repeat until one large cluster remains
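The merge loop above can be sketched in pure Python (single linkage is assumed as the inter-cluster distance; the points are illustrative):

```python
import math

def single_linkage(c1, c2):
    """Distance between clusters = closest pair of points (single linkage assumed)."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerative(points):
    """Merge the two closest clusters until one cluster remains; record the merges."""
    clusters = [[p] for p in points]           # step 1: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # step 2: find the closest pair of clusters
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]  # step 3: merge them
        del clusters[j]
    return merges                               # the merge order defines the dendrogram

merges = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)])
print(len(merges))  # 3 merges reduce 4 clusters to 1
```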
What are common linkage criteria (inter-cluster distances) used in Hierarchical Clustering?
- Single Linkage (minimum distance between points of the two clusters)
- Complete Linkage (maximum distance between points of the two clusters)
- Centroid Distance (distance between the cluster centroids)
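A toy comparison of these criteria on two hypothetical clusters (pure-Python sketch; the points are arbitrary):

```python
import math

# Two small clusters; the criteria differ in which pairwise distance they use.
c1 = [(0, 0), (0, 2)]
c2 = [(3, 0), (6, 0)]

pairs = [math.dist(a, b) for a in c1 for b in c2]
single = min(pairs)        # single linkage: closest pair of points
complete = max(pairs)      # complete linkage: farthest pair of points

def centroid(c):
    """Mean of the cluster's points, per dimension."""
    return tuple(sum(v) / len(v) for v in zip(*c))

cent = math.dist(centroid(c1), centroid(c2))  # centroid distance: between cluster means

print(single, complete, cent)
```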
What are the strengths of Hierarchical Clustering?
- No need to specify k beforehand
- Can capture arbitrarily shaped clusters (especially with single linkage)
What are the weaknesses of Hierarchical Clustering?
- Computationally expensive for large datasets (the naive algorithm stores the full pairwise distance matrix)
What are the key takeaways from the study of Unsupervised Learning and Clustering?
- Unsupervised learning finds patterns in unlabeled data
- Clustering groups similar data points together
- K-Means is a fast, efficient partition-based method
- Hierarchical clustering builds a tree-like structure
- Choosing the right k is crucial for effective clustering