Untitled Deck Flashcards

(22 cards)

1
Q

What is the goal of clustering?

A

To segment observations into groups of similar observations, based on the values of their observed variables.

2
Q

What type of machine learning is clustering classified as?

A

Unsupervised machine learning.

3
Q

Why is clustering called unsupervised machine learning?

A

Because there is no dependent (target) variable to predict: the groups are learned from the observations alone, with no labeled outcome to supervise the fit.

4
Q

What can clustering be used for in data preparation?

A

To identify variables or observations that can be aggregated or removed from consideration.

5
Q

What is bottom-up hierarchical clustering?

A

A method that starts with each observation in its own cluster and merges the most similar clusters to create nested clusters.
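
The merge loop described in this card can be sketched in a few lines of Python. This is a minimal, unoptimized illustration, assuming single linkage (closest pair of members) and Euclidean distance; the function name `agglomerate` and the sample points are hypothetical and not part of the deck.

```python
from math import dist  # Euclidean distance, Python 3.8+

def agglomerate(points):
    """Bottom-up clustering sketch: returns the merge history as
    (cluster_a, cluster_b, distance) tuples of observation indices."""
    # Start with each observation in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two most similar clusters (assumption: single linkage).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        # Merge cluster j into cluster i; the nesting is recorded in `merges`.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# Hypothetical observations
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerate(points):
    print(a, "+", b, "at distance", round(d, 3))
```

Reading the merge history from top to bottom gives the nested clusters: each later merge contains the clusters formed by earlier ones.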

6
Q

What is K-means clustering?

A

A method that assigns each observation to one of k clusters so that observations within a cluster are as similar as possible, that is, as close as possible to their cluster centroid.

7
Q

What does the value of k represent in K-means clustering?

A

The number of clusters, chosen (subjectively) by the analyst before the algorithm runs.

8
Q

What is the iterative process in K-means clustering?

A

Observations are assigned to clusters and centroids are updated until clusters stabilize or a predefined number of iterations is reached.

9
Q

What do both clustering methods depend on?

A

The level of similarity/dissimilarity between two observations.

10
Q

What is Euclidean Distance?

A

The straight-line distance between two observations: the square root of the sum of squared differences across their variables.

11
Q

What is Manhattan Distance?

A

The city-block distance between two observations: the sum of the absolute differences across their variables.
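
To compare the Euclidean distance from the previous card with the Manhattan distance, here is a small sketch using only the standard library. The function names and the two sample observations are illustrative assumptions, not taken from the deck.

```python
import math

def euclidean(u, v):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Two hypothetical observations measured on two variables
p = (1, 2)
q = (4, 6)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of observations, because it adds the differences along each variable instead of cutting across them.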

12
Q

What is the matching coefficient?

A

A similarity measure for categorical variables encoded as 0-1: the number of variables on which two observations match (both 1 or both 0) divided by the total number of variables.

13
Q

What does Jaccard’s Coefficient not count?

A

Matching zero entries (variables where both observations are 0).
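
The difference between the matching coefficient and Jaccard's coefficient is easiest to see on a pair of 0-1 encoded observations. The sketch below is illustrative only: the function names and sample data are assumptions, and returning 0.0 when every entry is a shared zero is a convention chosen here, since the ratio is undefined in that case.

```python
def matching_coefficient(u, v):
    """Share of variables on which u and v agree (shared 1s and shared 0s)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of agreeing 1s among the variables that are not shared 0s."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    considered = len(u) - both_zero
    # Assumption: report 0.0 when there is nothing but shared zeros.
    return both_one / considered if considered else 0.0

# Hypothetical 0-1 encoded observations
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```

Three of the four matches are shared zeros, so dropping them lowers the similarity from 0.8 to 0.5.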

14
Q

What are the alternatives for comparing observations in hierarchical clustering?

A
  • Single linkage
  • Complete linkage
  • Group average linkage
  • Median linkage
  • Centroid linkage
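
As a rough illustration of how these linkage choices differ, the sketch below measures the distance between two small clusters four ways. It assumes Euclidean distance between observations; the function names and sample clusters are hypothetical, and median linkage is left out to keep the example short.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+
from statistics import mean

def pairwise(c1, c2):
    """All distances between a member of c1 and a member of c2."""
    return [dist(a, b) for a, b in product(c1, c2)]

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))   # closest pair of members

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))   # farthest pair of members

def group_average_linkage(c1, c2):
    return mean(pairwise(c1, c2))  # average over all pairs

def centroid(cluster):
    """Coordinate-wise mean of the cluster's members."""
    return tuple(mean(col) for col in zip(*cluster))

def centroid_linkage(c1, c2):
    return dist(centroid(c1), centroid(c2))

# Two hypothetical clusters
a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 2)]
print(single_linkage(a, b))         # 3.0
print(complete_linkage(a, b))       # sqrt(13), about 3.606
print(group_average_linkage(a, b))  # between the two values above
print(centroid_linkage(a, b))       # 3.0
```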
15
Q

What is the initial step in K-means clustering?

A

Randomly assigning each observation to one of k clusters.

16
Q

What is calculated during the update step of K-means clustering?

A

The arithmetic mean of each coordinate across the observations in a cluster, which gives that cluster's new centroid.

17
Q

What is the assignment step in K-means clustering?

A

Reassigning observations to the cluster with the closest centroid based on squared Euclidean distance.
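
The initial, update, and assignment steps from the last three cards can be put together as follows. This is a minimal sketch, not a production implementation: the function names, the `seed` and `max_iter` parameters, the sample (age, income) values, and the handling of empty clusters are all assumptions made for illustration.

```python
import random

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean_point(points):
    """Arithmetic mean of each coordinate across the given points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def k_means(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Initial step: randomly assign each observation to one of k clusters.
    labels = [rng.randrange(k) for _ in data]
    centroids = []
    for _ in range(max_iter):
        # Update step: each centroid is the mean of its cluster's members.
        centroids = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            # Assumption: an empty cluster is reseeded with a random observation.
            centroids.append(mean_point(members) if members else rng.choice(data))
        # Assignment step: move each observation to the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(x, centroids[c]))
            for x in data
        ]
        if new_labels == labels:  # clusters have stabilized
            break
        labels = new_labels
    return labels, centroids

# Hypothetical (age, income in $1000s) observations
data = [(25, 30), (27, 32), (24, 28), (55, 90), (58, 95), (60, 88)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

In practice the variables would usually be standardized first so that income, with its larger scale, does not dominate the distance; that step is omitted here for brevity.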

18
Q

What is required for K-means clustering in terms of resources?

A

Substantial computer processing power.

19
Q

What variables were used in the K-means clustering example?

A

Age & Income.

20
Q

What is a key difference between hierarchical clustering and K-means clustering?

A

Hierarchical clustering builds a tree of clusters while K-means partitions data into k clusters.

21
Q

What are association rules used for?

A

Finding items that tend to occur together, most commonly in shopping cart transactions (market basket analysis).

22
Q

What is involved in evaluating association rules?

A

Constraints and metrics to assess the strength and usefulness of the rules.
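
The card does not name specific metrics. Three that are commonly used to evaluate association rules are support, confidence, and lift, and the sketch below computes them for one rule. Treat the choice of metrics, the function names, and the sample transactions as assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent.
    Assumes the antecedent appears in at least one transaction."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to how often the consequent occurs anyway."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

# Hypothetical shopping cart transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: if bread, then milk
print(support(transactions, {"bread", "milk"}))                 # 0.5
print(round(confidence(transactions, {"bread"}, {"milk"}), 3))  # 0.667
print(round(lift(transactions, {"bread"}, {"milk"}), 3))        # 0.889
```

A lift below 1, as here, suggests that buying bread does not make milk more likely than it already is across all transactions, so this rule would not be considered useful.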