K-Means Clustering Flashcards
- What is unsupervised learning and how does it differ from supervised learning?
Unsupervised learning involves analyzing data (xn) without given labels (tn). Unlike supervised learning, which uses known labels to learn mappings, unsupervised learning focuses on discovering inherent structure in data, such as clustering customers based on their purchase histories.
- In the context of customer purchase data, how can clustering be used?
Each customer can be represented as a binary vector indicating which products they bought. Clustering groups customers who exhibit similar purchasing patterns, helping to identify customer segments or products often bought together.
- What is K-means clustering and what is its main objective?
K-means clustering is an unsupervised algorithm that partitions data into K clusters. Its main objective is to minimize the total within-cluster variance, or the sum of squared distances between each point and its assigned cluster centroid.
- How is each cluster defined in K-means clustering?
Each cluster is represented by a centroid µk (a point in the input space, e.g., µk = [µk1, µk2]^T). Data points are assigned to the cluster with the closest centroid, typically measured by Euclidean distance.
- Describe the iterative algorithm for K-means clustering in detail.
The K-means algorithm proceeds as follows: (1) Initialize K cluster centers (µ1, µ2, …, µK) randomly; (2) For each data point xn, assign it to the closest centroid by setting znk = 1 if xn is in cluster k (and 0 otherwise); (3) Update each centroid µk to be the average of all points assigned to that cluster, i.e., µk = (Σn znk xn)/(Σn znk); (4) Repeat steps 2 and 3 until the assignments do not change.
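A minimal NumPy sketch of these four steps (the function name `kmeans` and all variable names are illustrative, not from the source):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Basic K-means on an (N, D) data matrix X with K clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centroids by picking K distinct data points at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Step 4: stop once the assignments no longer change.
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its assigned points.
        for k in range(K):
            members = X[assignments == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids, assignments
```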
- What is the cost function that K-means minimizes and what are the constraints on the assignment variables?
K-means minimizes the cost function J = Σₙ Σₖ znk ||xn - µk||² subject to znk ∈ {0,1} and Σₖ znk = 1 for each n, meaning each data point is assigned to exactly one cluster.
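Written out in code, the cost and its constraint look like this (a hypothetical NumPy helper; X is the (N, D) data matrix, Z the one-hot (N, K) assignment matrix, mu the (K, D) centroid matrix):

```python
import numpy as np

def kmeans_cost(X, Z, mu):
    """J = sum_n sum_k z_nk ||x_n - mu_k||^2, with Z a one-hot (N, K) matrix."""
    assert np.all(Z.sum(axis=1) == 1), "each point must belong to exactly one cluster"
    # Squared distance from every point to every centroid: shape (N, K).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((Z * sq_dists).sum())
```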
- Why is there no analytical solution for the centroids µk in K-means?
Because the optimal centroids depend on the discrete assignments znk, and the optimal assignments depend on the centroids, the joint problem is non-convex and has no closed-form solution. Fixing one set of variables makes the other easy to optimize, which is why the iterative algorithm alternates between the two steps and converges to a local (not necessarily global) minimum.
- What are two major issues that affect the performance of K-means clustering?
The two main issues are: (1) Choosing the number of clusters K; (2) Initializing the cluster centers, as poor initialization can lead to suboptimal solutions or convergence to a bad local minimum.
- How does the K-means++ algorithm improve cluster initialization?
K-means++ improves initialization by selecting the first center uniformly at random, then choosing each subsequent center with probability proportional to its squared distance from the nearest already chosen center. This spreads out the initial centers and yields an expected cost within an O(log K) factor of the optimal clustering.
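A NumPy sketch of this seeding procedure (illustrative code, not from the source):

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    """Choose K initial centers: the first uniformly at random, the rest with
    probability proportional to the squared distance to the nearest chosen center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # Squared distance from each point to its nearest existing center.
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```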
- What is the Elbow heuristic and how is it used to choose the number of clusters K?
The Elbow heuristic involves plotting the total within-cluster sum of squares as a function of K and looking for a point (the ‘elbow’) where the rate of decrease sharply changes. This point indicates that adding more clusters beyond it yields diminishing returns.
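In practice the curve can be produced with scikit-learn's KMeans, whose inertia_ attribute is exactly the within-cluster sum of squares (the helper below is a sketch; the data matrix X is assumed):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, k_max=10):
    """Plot total within-cluster sum of squares (inertia) against K."""
    ks = range(1, k_max + 1)
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
                for k in ks]
    plt.plot(ks, inertias, marker="o")
    plt.xlabel("number of clusters K")
    plt.ylabel("within-cluster sum of squares")
    plt.show()
```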
- What is the Sum of Norms (SON) formulation in clustering and what extremes does it represent?
The SON (sum-of-norms, or convex clustering) formulation gives each data point xn its own centroid µn and minimizes Σₙ ||xn - µn||² + λ Σₚ<q ||µp - µq||, where the unsquared norm in the penalty encourages centroids to fuse exactly. With λ = 0 every centroid equals its own data point, so every point is its own cluster (K = N); as λ grows very large, all centroids collapse to a single point (K = 1). Varying λ steers the solution between these extremes.
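Because this objective is convex in the centroids, it can be handed to a generic convex solver; the sketch below uses cvxpy (the choice of cvxpy and all names are assumptions for illustration, not from the source):

```python
import numpy as np
import cvxpy as cp

def son_clustering(X, lam):
    """Sum-of-norms (convex) clustering: one centroid row per data point;
    the pairwise-norm penalty fuses centroids as lam grows."""
    N, D = X.shape
    U = cp.Variable((N, D))  # centroid mu_n for each point x_n
    fit = cp.sum_squares(X - U)
    fuse = sum(cp.norm(U[p] - U[q], 2) for p in range(N) for q in range(p + 1, N))
    cp.Problem(cp.Minimize(fit + lam * fuse)).solve()
    # lam = 0 recovers U = X (every point its own cluster);
    # large lam drives all rows of U toward a single point (one cluster).
    return U.value
```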
- When might K-means fail even if the data have clear cluster structure?
K-means may fail when clusters are non-spherical, elongated, or when one cluster cannot be adequately represented by a single point. In such cases, the assumption of a single centroid per cluster does not capture the true structure of the data.
- What is kernel K-means and why might one use it?
Kernel K-means is an extension of the standard K-means algorithm that uses kernel functions to implicitly map data into a high-dimensional feature space. This allows it to capture non-linear cluster structures by replacing the Euclidean distance with a kernel-induced distance.
- How is the squared distance between a point and a cluster center expressed in kernel K-means?
The squared distance is written as k(xn, xn) - (2/Nk) Σₘ zmk k(xn, xm) + (1/Nk²) Σₘ Σₗ zmk zlk k(xm, xl), where k(x, x') is the kernel function and Nk is the number of points in cluster k.
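The same expression translated to NumPy, given a precomputed (N, N) kernel matrix (a hypothetical helper; the names are illustrative):

```python
import numpy as np

def kernel_dist_sq(Kmat, z_k):
    """Squared feature-space distance from every point to the (implicit) center
    of cluster k. Kmat is the (N, N) kernel matrix, z_k a boolean membership mask."""
    Nk = z_k.sum()
    self_term = np.diag(Kmat)                           # k(x_n, x_n)
    cross_term = Kmat[:, z_k].sum(axis=1) / Nk          # (1/Nk) sum_m z_mk k(x_n, x_m)
    within_term = Kmat[np.ix_(z_k, z_k)].sum() / Nk**2  # cluster self-similarity
    return self_term - 2.0 * cross_term + within_term
```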
- Why is it beneficial to use the kernel trick in clustering?
The kernel trick allows the algorithm to compute distances in an implicit high-dimensional space without explicitly mapping the data. This enables the capture of complex, non-linear structures in the data while keeping computations efficient.
- How do mixture models differ from K-means clustering in terms of their approach to modeling data?
Mixture models assume that data points are generated by a mixture of underlying probability distributions (e.g., Gaussian mixtures) and assign soft, probabilistic cluster memberships. In contrast, K-means makes hard assignments based solely on minimizing Euclidean distances.
- What is one advantage of mixture models over K-means clustering?
Mixture models provide soft assignments, meaning they estimate the probability that each data point belongs to each cluster. This probabilistic framework can capture uncertainty and overlapping clusters better than the hard assignments of K-means.
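For contrast with K-means' hard labels, scikit-learn's GaussianMixture exposes these soft memberships via predict_proba (the toy data below is invented for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping 2-D blobs as toy data.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gmm.predict_proba(X)   # (100, 2): soft membership probabilities per point
hard = resp.argmax(axis=1)    # taking the argmax recovers hard, K-means-style labels
```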
- In unsupervised learning, why is clustering considered a fundamental task?
Clustering is fundamental because it helps reveal the underlying structure and natural groupings within data without the need for labels. This insight can be used for data exploration, segmentation, and as a preprocessing step for other machine learning tasks.
- How can K-means clustering be applied to group products that are frequently bought together?
Each product can be represented by features such as purchase frequency or co-occurrence with other products. K-means can then cluster products based on these features, grouping those that are commonly purchased together.
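A toy sketch with scikit-learn, clustering products by their rows in a customer-purchase matrix (the matrix and product names are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows = products, columns = customers; 1 means the customer bought the product.
purchases = np.array([
    [1, 1, 0, 0, 1],   # bread
    [1, 1, 0, 0, 1],   # butter
    [0, 0, 1, 1, 0],   # phone case
    [0, 0, 1, 1, 0],   # charger
], dtype=float)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(purchases)
print(labels)  # e.g. [0 0 1 1]: bread/butter vs. phone-case/charger groups
```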
- Summarize the key steps of the K-means clustering algorithm.
The algorithm involves: 1) Randomly initializing K centroids; 2) Assigning each data point to the nearest centroid; 3) Updating each centroid as the mean of the points assigned to it; 4) Repeating steps 2 and 3 until the cluster assignments no longer change.
- What is the objective function of K-means clustering and why is it minimized?
The objective function is J = Σₙ Σₖ znk ||xn - µk||², the total within-cluster sum of squared distances. Minimizing it makes each cluster as compact as possible around its centroid; note that it rewards compactness directly, while separation between clusters is only an indirect consequence.
- How does K-means ensure that each data point is assigned to exactly one cluster?
The assignment variable znk is defined such that znk ∈ {0, 1} and for each data point n, Σₖ znk = 1. This constraint ensures that every point is assigned to one and only one cluster.
- What challenges might arise when choosing the initial cluster centers in K-means, and how does this affect the algorithm?
Poor initialization can cause K-means to converge to a suboptimal local minimum, leading to ineffective clustering. The outcome can be highly sensitive to the initial centers, making proper initialization (e.g., via K-means++) critical for good performance.
- Explain the concept of parameter sharing in the context of clustering algorithms. Is this concept directly applicable to K-means?
Parameter sharing, as used in neural networks, means reusing the same parameters across different parts of the input. K-means does not share parameters in that sense, but it does apply the same distance metric and update rule to every cluster, which standardizes the clustering procedure across the dataset.