Unsupervised Learning and Clustering Flashcards
What is Unsupervised Learning?
Unsupervised learning is a machine learning approach where the model learns patterns, structures, or groupings in the data without labeled outputs.
What are the key characteristics of Unsupervised Learning?
- Works with unlabeled data (no predefined categories)
- Finds hidden structures in data
- Used for clustering & dimensionality reduction
What is Clustering?
Clustering is a method in unsupervised learning that groups similar data points together.
Why is Clustering used?
- Data Reduction
- Outlier Detection
- Data Segmentation
What are some real-world applications of Clustering?
- Social Network Analysis
- Image Segmentation
- Data Annotation
What are the steps in Clustering?
- Define a distance metric to measure similarity
- Form clusters by grouping similar data points
- Maximize within-cluster similarity, minimize between-cluster similarity
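As a concrete sketch of the first step above, Euclidean distance is a common choice of metric (a pure-Python illustration; the sample points are arbitrary):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Points in the same cluster should be closer to each other
# than to points in other clusters.
print(euclidean((0, 0), (3, 4)))  # 5.0
```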
What is K-Means Clustering?
K-Means is a partition-based clustering algorithm that divides the data into k clusters by iteratively minimizing the within-cluster sum of squares.
How does K-Means Clustering work?
- Choose the number of clusters (k)
- Select k random points as initial centroids
- Assign each data point to the nearest centroid
- Recalculate centroids by finding the mean of each cluster
- Repeat until centroids stop changing
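The steps above can be sketched as a minimal pure-Python implementation (the sample points, k=2, and the random seed are illustrative assumptions):

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means: random init, assign, update, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # step 2: k random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # step 3: assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # step 4: recompute each centroid as the mean of its cluster
        new = [tuple(sum(v) / len(v) for v in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:                   # step 5: stop when centroids stabilize
            break
        centroids = new
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (1, 0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```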
What is the Elbow Method in K-Means?
The Elbow Method involves plotting the Within-Cluster Sum of Squares (WCSS) and looking for the ‘elbow’ point where adding more clusters stops improving the fit significantly.
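A rough sketch of the method, assuming a toy dataset and a compact k-means (both illustrative): WCSS is computed for several values of k and compared.

```python
import math
import random

def wcss(clusters, centroids):
    """Within-Cluster Sum of Squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, c) ** 2
               for cl, c in zip(clusters, centroids) for p in cl)

def kmeans(points, k, iters=50, seed=0):
    """Compact k-means: random init, then repeated assign/update."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: math.dist(p, cents[i]))].append(p)
        cents = [tuple(sum(v) / len(v) for v in zip(*cl)) if cl else cents[i]
                 for i, cl in enumerate(clusters)]
    return cents, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
results = {}
for k in (1, 2, 3):
    cents, clusters = kmeans(points, k)
    results[k] = wcss(clusters, cents)
    print(k, round(results[k], 2))
# WCSS drops sharply from k=1 to k=2 (the 'elbow'), then only slightly after.
```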
What are the strengths of K-Means Clustering?
- Simple and efficient
- Works well for large datasets
What are the weaknesses of K-Means Clustering?
- Requires predefined k
- Sensitive to initialization
- Struggles with non-globular clusters
What is Hierarchical Clustering?
Hierarchical Clustering builds a tree-like structure (dendrogram) instead of predefined partitions.
How does Hierarchical Clustering work?
- Start with each data point as its own cluster (the agglomerative, bottom-up approach)
- Merge the closest clusters based on a chosen distance metric
- Repeat until one large cluster remains
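The merge loop above can be sketched in pure Python (single linkage is assumed as the inter-cluster distance; the points are illustrative):

```python
import math

def single_linkage(c1, c2):
    """Distance between clusters = closest pair of points (single linkage assumed)."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerative(points):
    """Merge the two closest clusters until one cluster remains; record the merges."""
    clusters = [[p] for p in points]           # step 1: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # step 2: find the closest pair of clusters
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]  # step 3: merge them
        del clusters[j]
    return merges                               # the merge order defines the dendrogram

merges = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)])
print(len(merges))  # 3 merges reduce 4 clusters to 1
```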
What are common linkage criteria (inter-cluster distances) used in Hierarchical Clustering?
- Single Linkage (minimum distance between points of the two clusters)
- Complete Linkage (maximum distance between points of the two clusters)
- Centroid Distance (distance between the cluster centroids)
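A toy comparison of these criteria on two hypothetical clusters (pure-Python sketch; the points are arbitrary):

```python
import math

# Two small clusters; the criteria differ in which pairwise distance they use.
c1 = [(0, 0), (0, 2)]
c2 = [(3, 0), (6, 0)]

pairs = [math.dist(a, b) for a in c1 for b in c2]
single = min(pairs)        # single linkage: closest pair of points
complete = max(pairs)      # complete linkage: farthest pair of points

def centroid(c):
    """Mean of the cluster's points, per dimension."""
    return tuple(sum(v) / len(v) for v in zip(*c))

cent = math.dist(centroid(c1), centroid(c2))  # centroid distance: between cluster means

print(single, complete, cent)
```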
What are the strengths of Hierarchical Clustering?
- No need to specify k beforehand
- Can capture arbitrarily shaped clusters (especially with single linkage)
What are the weaknesses of Hierarchical Clustering?
- Computationally expensive for large datasets (the naive algorithm stores the full pairwise distance matrix)
What are the key takeaways from the study of Unsupervised Learning and Clustering?
- Unsupervised learning finds patterns in unlabeled data
- Clustering groups similar data points together
- K-Means is a fast, efficient partition-based method
- Hierarchical clustering builds a tree-like structure
- Choosing the right k is crucial for effective clustering