Clustering FINAL Flashcards
(123 cards)
Cluster analysis divides data into
groups (clusters) that are meaningful, useful, or both
Objects in a cluster
share the same characteristics
What fields is clustering used in?
A variety of fields, including health and medicine, business, and computer networking
How is clustering used in health and medicine?
Cluster patients according to symptoms presented upon evaluation
How is clustering used in business?
Cluster stores according to sales/customer satisfaction/…
How is clustering used in computer networking?
Cluster traffic according to application type
What does it mean that clusters are meaningful?
Clusters should capture the natural structure of the data
What does it mean that clusters are useful?
Using the cluster prototype, clusters can serve as a starting point for data summarization, compression (vector quantization), and classification (nearest neighbor)
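The idea of using a cluster prototype can be sketched in a few lines. The example below is only an illustration: it assumes the prototype is the centroid (mean) of each cluster, uses Euclidean distance, and works on made-up 2D points. The names `centroid` and `nearest_prototype` are hypothetical helpers, not part of any library.

```python
from math import dist


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def nearest_prototype(point, prototypes):
    """Index of the prototype closest to `point` (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: dist(point, prototypes[i]))


# Hypothetical toy data: two clusters that have already been formed.
clusters = [
    [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)],
    [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)],
]

# Summarization: each cluster is described by a single prototype.
prototypes = [centroid(c) for c in clusters]

# Compression (vector quantization): store a prototype index per point.
codes = [nearest_prototype(p, prototypes) for c in clusters for p in c]

# Classification (nearest neighbor): a new point takes the label of
# the closest prototype.
label = nearest_prototype((7.5, 8.2), prototypes)
```

Here eight points are reduced to two prototypes plus one small index per point, which is the sense in which the prototype supports both summarization and compression.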
Clustering groups data objects based on
information found in the data that describes the objects and their relationships
What type of learning is clustering?
Unsupervised learning
Clustering goal
Objects within a cluster should be similar to one another and different from objects in other clusters
Is the notion of a cluster well defined?
No
Does clustering have to know exactly what it is sorting?
No; for example, it can sort pennies, nickels, and dimes into groups without knowing how much each coin is worth
Partitional clustering
Divide objects into non-overlapping clusters such that each object belongs to exactly one cluster
Hierarchical clustering
Clusters can have subclusters
Exclusive Clustering
1:1 relationship between object and cluster
Why does clustering need a hyperparameter?
It tells the algorithm how many clusters we expect to find in the dataset
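To show where such a hyperparameter enters, here is a minimal sketch of k-means, one common partitional algorithm (chosen for illustration; the cards above do not name a specific algorithm). The number of clusters `k` is supplied by the user, not learned. The initialization (first `k` points) and the fixed iteration count are simplifying assumptions.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Very small k-means sketch; `k` is the hyperparameter."""
    # Assumption: naive deterministic init using the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to exactly one cluster.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data with two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
```

Because every point receives exactly one label, the result is also an example of a partitional, exclusive clustering.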
Overlapping clustering
1:n relationship between object and cluster; an object can belong to > 1 cluster
Fuzzy clustering
n:n relationship; every object belongs to every cluster with a certain probability (or membership weight) between 0 and 1
In fuzzy clustering, each object's probabilities of belonging to the clusters, taken over all clusters, should sum to
1.0
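The sum-to-one constraint can be illustrated with a small sketch. It assumes membership weights proportional to inverse distance from each cluster center, which is a simple stand-in chosen for illustration and not the exact fuzzy c-means formula. `membership_weights` and `eps` are hypothetical names.

```python
from math import dist


def membership_weights(point, centers, eps=1e-12):
    """Weights in [0, 1], one per cluster, normalized to sum to 1.0."""
    # Closer centers get larger raw scores; eps avoids division by zero.
    raw = [1.0 / (dist(point, c) + eps) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical cluster centers and a point nearer to the first one.
centers = [(0.0, 0.0), (4.0, 0.0)]
weights = membership_weights((1.0, 0.0), centers)
```

The point belongs to both clusters at once, with a larger weight for the nearer center, and the normalization step is what enforces the sum of 1.0.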
Complete clustering
Assign every point to at least one cluster
Partial clustering
Some objects may not be assigned to any cluster
What might some objects not assigned to a cluster represent?
Noise or outlier
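A partial clustering can be sketched by refusing to assign points that are too far from every cluster. The distance threshold `max_distance`, the `NOISE` label of -1, and the helper name `assign_or_noise` are assumptions made for this illustration.

```python
from math import dist

NOISE = -1  # Assumed label for objects left out of every cluster.


def assign_or_noise(point, centers, max_distance):
    """Nearest cluster index, or NOISE if no center is close enough."""
    j = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return j if dist(point, centers[j]) <= max_distance else NOISE


# Hypothetical centers and points; the last point sits between clusters.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.5), (9.0, 10.0), (5.0, 5.0)]
labels = [assign_or_noise(p, centers, max_distance=2.0) for p in points]
```

The third point is more than 2.0 away from both centers, so it stays unassigned and would be treated as noise or an outlier.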
Well-separated clusters
Each point is closer to all of the points in its cluster than to any point in another cluster
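The well-separated definition translates directly into a check: for every point, the farthest point in its own cluster must still be closer than the nearest point in any other cluster. The sketch below assumes Euclidean distance and a hypothetical helper name, `is_well_separated`.

```python
from math import dist


def is_well_separated(points, labels):
    """True if every point is closer to all same-cluster points
    than to any point in a different cluster."""
    for i, p in enumerate(points):
        same = [
            dist(p, q)
            for j, q in enumerate(points)
            if j != i and labels[j] == labels[i]
        ]
        other = [dist(p, q) for j, q in enumerate(points) if labels[j] != labels[i]]
        # Violated when some same-cluster point is at least as far away
        # as the nearest point of another cluster.
        if same and other and max(same) >= min(other):
            return False
    return True


# Hypothetical 2D points lying on a line.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
good = is_well_separated(points, [0, 0, 1, 1])
bad = is_well_separated(points, [0, 1, 0, 1])
```

The first labeling groups neighbors together and passes; the second pairs distant points and fails the check.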