Untitled Deck Flashcards
(22 cards)
What is the goal of clustering?
To segment observations into groups of similar observations based on the observed variables.
What type of machine learning is clustering classified as?
Unsupervised machine learning.
Why is clustering called unsupervised machine learning?
Because there is no outcome (dependent) variable to predict, so no labeled examples guide the learning.
What can clustering be used for in data preparation?
To identify variables or observations that can be aggregated or removed from consideration.
What is bottom-up hierarchical clustering?
A method that starts with each observation in its own cluster and merges the most similar clusters to create nested clusters.
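A minimal sketch of the bottom-up merging described above. It assumes single linkage (closest pair of members) as the rule for "most similar" and uses hypothetical one-variable observations; the function names are illustrative only.

```python
import math

def single_linkage(c1, c2):
    """Distance between two clusters = distance of their closest pair (assumed rule)."""
    return min(math.dist(a, b) for a in c1 for b in c2)  # math.dist needs Python 3.8+

def agglomerate(points):
    """Start with one cluster per observation, then repeatedly merge the two
    most similar clusters. Returns the list of merged clusters, in order."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        history.append(merged)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return history

# Hypothetical observations with one variable each
points = [(1.0,), (2.0,), (6.0,), (7.5,), (15.0,)]
history = agglomerate(points)
for step, merged in enumerate(history, start=1):
    print(step, merged)
```

Each entry in the history contains the earlier entries it was built from, which is what makes the clusters nested.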
What is K-means clustering?
A method that assigns each observation to one of k clusters to maximize similarity within clusters.
What does the value of k represent in K-means clustering?
The number of clusters, which the analyst must choose in advance (a subjective choice).
What is the iterative process in K-means clustering?
Observations are assigned to clusters and centroids are updated until clusters stabilize or a predefined number of iterations is reached.
What do both clustering methods depend on?
The level of similarity/dissimilarity between two observations.
What is Euclidean Distance?
The straight-line distance between two observations, computed as the square root of the sum of squared differences across their variables.
What is Manhattan Distance?
The sum of the absolute differences between two observations across their variables, an alternative to Euclidean distance.
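A short sketch of the two distance measures, using the standard definitions (square root of summed squared differences for Euclidean, sum of absolute differences for Manhattan). The observations `u` and `v` are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical observations with two variables each
u = (1.0, 2.0)
v = (4.0, 6.0)
print(euclidean(u, v))  # 5.0
print(manhattan(u, v))  # 7.0
```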
What is the matching coefficient?
A similarity measure for categorical variables encoded as 0-1: the proportion of variables on which two observations have the same value, counting both matching 1 entries and matching 0 entries.
What does Jaccard’s Coefficient not count?
Matching zero entries.
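A small sketch contrasting the two coefficients on 0-1 data. The vectors are hypothetical, and returning 0.0 when both observations are all zeros is an assumption made here to avoid dividing by zero.

```python
def matching_coefficient(a, b):
    """Share of variables on which the two 0-1 vectors agree (1-1 or 0-0)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def jaccard_coefficient(a, b):
    """Share of 1-1 matches among variables that are not 0-0 in both vectors."""
    both_one = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    not_both_zero = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: define the coefficient as 0.0 when every entry is 0-0
    return both_one / not_both_zero if not_both_zero else 0.0

# Hypothetical observations with five 0-1 variables each
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```

The three shared zeros raise the matching coefficient but are ignored by Jaccard's coefficient, which is why the two values differ.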
What are the alternatives for measuring dissimilarity between clusters in hierarchical clustering?
- Single linkage
- Complete linkage
- Group average linkage
- Median linkage
- Centroid linkage
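A sketch of four of the linkage options above, assuming Euclidean distance between observations. Median linkage is left out because its definition varies between textbooks and software. The clusters `c1` and `c2` are hypothetical.

```python
import math
from itertools import product

def pairwise(c1, c2):
    """All Euclidean distances between a member of c1 and a member of c2."""
    return [math.dist(a, b) for a, b in product(c1, c2)]  # Python 3.8+

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))   # closest pair

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))   # farthest pair

def group_average_linkage(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)         # mean over all pairs

def centroid(cluster):
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def centroid_linkage(c1, c2):
    return math.dist(centroid(c1), centroid(c2))

# Hypothetical clusters of two observations each
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(c1, c2))  # 3.0
print(complete_linkage(c1, c2))
print(group_average_linkage(c1, c2))
print(centroid_linkage(c1, c2))
```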
What is the initial step in K-means clustering?
Randomly assigning each observation to one of k clusters.
What is calculated during the update step of K-means clustering?
The arithmetic means of all coordinates to find the new cluster centroids.
What is the assignment step in K-means clustering?
Reassigning observations to the cluster with the closest centroid based on squared Euclidean distance.
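The initial, update, and assignment steps can be sketched as follows. This is a minimal illustration, not a production implementation: the data are hypothetical age and income (in thousands) values, and reseeding an empty cluster with a random observation is an assumption the cards do not cover.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Arithmetic mean of each coordinate."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def k_means(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Initial step: randomly assign each observation to one of k clusters
    labels = [rng.randrange(k) for _ in data]
    centroids = [None] * k
    for _ in range(max_iter):
        # Update step: each centroid becomes the mean of its cluster's members
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # Assumption: an empty cluster is reseeded with a random observation
            centroids[c] = mean_point(members) if members else rng.choice(data)
        # Assignment step: move each observation to the closest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        if new_labels == labels:  # clusters have stabilized
            break
        labels = new_labels
    return labels, centroids

# Hypothetical (age, income in thousands) observations
data = [(25, 30), (27, 32), (26, 31), (60, 90), (62, 95), (61, 92)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

The loop stops either when no observation changes cluster or after `max_iter` passes, matching the two stopping conditions named in the deck.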
What is required for K-means clustering in terms of resources?
Substantial computer processing power on large data sets, since every observation is compared with every centroid in each iteration.
What variables were used in the K-means clustering example?
Age & Income.
What is a key difference between hierarchical clustering and K-means clustering?
Hierarchical clustering builds a tree of clusters while K-means partitions data into k clusters.
What are association rules used for?
Finding items that tend to occur together in transactions, such as shopping cart (market basket) data.
What is involved in evaluating association rules?
Constraints and metrics, such as support, confidence, and lift, that assess the strength and usefulness of the rules.
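A sketch of three metrics commonly used to evaluate a rule "if antecedent, then consequent": support, confidence, and lift. Choosing these three is an assumption made here for illustration. Support is expressed as a fraction of transactions (some texts use a count instead), and the transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the share that also has the consequent.
    Assumes the antecedent appears in at least one transaction."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency.
    Values above 1 suggest the items co-occur more often than chance would imply."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical shopping cart transactions
transactions = [
    {"bread", "milk"},
    {"bread", "jelly"},
    {"bread", "milk", "jelly"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))  # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))
print(lift(transactions, {"bread"}, {"milk"}))
```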