Cluster Analysis Flashcards

1
Q

What is Cluster Analysis?

A

A multivariate statistical technique that groups observations based on some of the features or variables that describe them.

The observations in a dataset can be divided into different groups.

Examples: clustering by geographic proximity, by language, or for market segmentation.

2
Q

What is the goal of Cluster Analysis?

A

To maximize the similarity of observations within a cluster and maximize the dissimilarity between clusters

3
Q

When is clustering most often used?

A

It is often used as a preliminary step in many types of analysis.

It is a useful technique for exploring the data and identifying patterns in it.

Data scientists often turn to it when they have no idea where to start or what to expect.

4
Q

What is a key distinguishing trait of supervised learning?

A

We are dealing with labeled data

5
Q

What is the Euclidean distance?

A
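
The Euclidean distance is the straight-line distance between two points: the square root of the sum of the squared differences between their coordinates. A minimal Python sketch, where the function name `euclidean_distance` is only illustrative:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

Python 3.8 and later also ship `math.dist`, which computes the same quantity.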
6
Q

What is a Centroid?

A

The mean position of a group of points.

Also known as the center of mass.
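
As a small illustration, the centroid can be computed by averaging each coordinate separately. The helper name `centroid` and the sample points are assumptions made for this sketch:

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    # zip(*points) regroups the data by coordinate: all x values, all y values, ...
    return tuple(sum(coords) / n for coords in zip(*points))

# The four corners of a square average out to its middle.
print(centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```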

7
Q

What does K in K-means clustering stand for?

A

The number of clusters

8
Q

What is the proper way of selecting the number of clusters?

A

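The elbow method, a common heuristic: plot the within-cluster sum of squares (WCSS) against the number of clusters and choose the point where the curve bends and extra clusters bring little further improvement.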
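
One way to see the elbow method in action is to run K-means for several values of K and record the within-cluster sum of squares each time. The sketch below uses a deliberately simple K-means written with the standard library only. The function names, the toy data, and the choice of 10 restarts are all assumptions for illustration; in practice a library implementation would normally be used.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans_wcss(points, k, iters=50, seed=0):
    """Run a basic K-means and return the final within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def elbow_curve(points, max_k, restarts=10):
    """Map each k in 1..max_k to the lowest WCSS found over several restarts."""
    return {
        k: min(kmeans_wcss(points, k, seed=s) for s in range(restarts))
        for k in range(1, max_k + 1)
    }

# Hypothetical toy data: three loose groups of 2-D points.
data = [(1, 1), (1, 2), (2, 1), (2, 2),
        (8, 8), (8, 9), (9, 8), (9, 9),
        (1, 8), (1, 9), (2, 8), (2, 9)]

for k, value in elbow_curve(data, 6).items():
    print(k, round(value, 2))
```

Plotting the printed values against K and looking for the bend gives the suggested number of clusters. Taking the best of several restarts is a design choice that softens the effect of a poor random start.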

9
Q

What is Clustering about?

A
  1. Minimizing the distance between points in a cluster
  2. Maximizing the distance between clusters
10
Q

What does WCSS stand for?

A

Within-cluster sum of squares

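For a fixed number of clusters, a lower WCSS means tighter clusters. WCSS falls to 0 when every observation is its own cluster, so minimizing it alone does not give the best solution; it has to be balanced against the number of clusters.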
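
To make the definition concrete, WCSS adds up, for every cluster, the squared distance from each point to that cluster's centroid. A minimal sketch follows; the function name `wcss` and the two example groupings are assumptions for illustration:

```python
def wcss(clusters):
    """Sum over clusters of squared distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        n = len(points)
        center = [sum(c) / n for c in zip(*points)]  # centroid of this cluster
        total += sum(
            (x - m) ** 2 for p in points for x, m in zip(p, center)
        )
    return total

# The same four points grouped in two different ways.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
print(wcss(tight))  # 4.0
print(wcss(loose))  # 100.0
```

The grouping that keeps nearby points together has the much smaller WCSS.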

11
Q

What are pros of K-Means Clustering?

A
  1. Simple to understand
  2. Fast to cluster
  3. Widely available
  4. Easy to implement
12
Q

What are some cons of K-means Clustering?

A
  1. We need to pick K
  2. Sensitive to initialization
  3. Sensitive to outliers
  4. Tends to produce spherical clusters
  5. Sensitive to feature scale, so results depend on whether the data is standardized
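
On the standardization point: K-means relies on distances, so a feature measured on a large scale can dominate the result unless the features are rescaled first. A common remedy is z-score standardization, sketched below. The function name `standardize` and the sample rows are hypothetical, and the sketch assumes no column is constant (a zero standard deviation would cause a division by zero).

```python
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical data: height in cm and income in currency units.
# Before rescaling, distances between rows are driven almost entirely by income.
rows = [(150, 50000), (160, 52000), (170, 90000), (180, 92000)]
for row in standardize(rows):
    print(tuple(round(v, 2) for v in row))
```

After rescaling, both features contribute on a comparable scale to any distance computed between rows.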
13
Q

What are the 3 Types of Analysis?

A
  1. Exploratory
  2. Confirmatory
  3. Explanatory
14
Q

What are characteristics of Exploratory Analysis?

A
  • Getting acquainted with the data
  • Search for patterns
  • Plan - determining what methods may be useful to investigate further
    e.g. data visualization, descriptive statistics (df.describe() on a pandas DataFrame), clustering
15
Q

What are characteristics of Confirmatory and Explanatory Analysis?

A
  • Explain a phenomenon
  • Confirm a hypothesis
  • Validate previous research

Typically done using hypothesis testing and regression analysis.

16
Q

What are the two broad types of clustering?

A
  1. Flat (e.g. K-Means)
  2. Hierarchical
17
Q

What are the two types of Hierarchical Clustering?

A
  1. Agglomerative (bottom-up)
  2. Divisive (top-down)