Untitled Deck Flashcards

Question 1

Q

What is the goal of clustering?

Answer

A

To segment observations into groups whose members are similar to one another, based on their observed variables.

Question 2

Q

What type of machine learning is clustering classified as?

Answer

A

Unsupervised machine learning.

Question 3

Q

Why is clustering called unsupervised machine learning?

Answer

A

Because there is no outcome (dependent) variable to predict; the method looks for structure in the observations without labeled examples.

Question 4

Q

What can clustering be used for in data preparation?

Answer

A

To identify variables or observations that can be aggregated or removed from consideration.

Question 5

Q

What is bottom-up hierarchical clustering?

Answer

A

A method that starts with each observation in its own cluster and merges the most similar clusters to create nested clusters.
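A minimal Python sketch of the bottom-up procedure: it starts with one cluster per observation and repeatedly merges the two closest clusters. The function name `agglomerate`, the sample points, the stopping rule (a target number of clusters) and the use of single linkage with Euclidean distance are assumptions made for illustration; the card does not specify them.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def agglomerate(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    # Start with every observation in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # ordered record of merges, i.e. the nested structure
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumed single linkage: distance of the closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical observations: two tight pairs and one isolated point.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]
clusters, merges = agglomerate(points, 3)
for c in clusters:
    print(c)
```

Keeping the merge record is what makes the result nested: cutting the sequence of merges at any point gives a clustering with a different number of groups.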

Question 6

Q

What is K-means clustering?

Answer

A

A method that assigns each observation to one of k clusters to maximize similarity within clusters.

Question 7

Q

What does the value of k represent in K-means clustering?

Answer

A

The number of clusters, which the analyst chooses in advance (a subjective choice).

Question 8

Q

What is the iterative process in K-means clustering?

Answer

A

Observations are assigned to clusters and centroids are updated until clusters stabilize or a predefined number of iterations is reached.
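The loop described in this answer can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function name `kmeans`, the seeded random initial assignment, the `max_iter` default and the handling of an empty cluster (re-seeding it with a random observation) are all assumptions.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate update and assignment steps until labels stop changing."""
    rng = random.Random(seed)
    # Initial step: randomly assign each observation to one of k clusters.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Update step: each centroid is the coordinate-wise mean of its members.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Assumption: an empty cluster is re-seeded with a random observation.
                members = [rng.choice(points)]
            centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # clusters have stabilized
            break
        labels = new_labels
    return labels, centroids


# Hypothetical data: two well-separated groups of three observations.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The two stopping conditions in the answer map to the `break` (labels unchanged) and to `max_iter` (iteration cap).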

Question 9

Q

What do both clustering methods depend on?

Answer

A

The level of similarity/dissimilarity between two observations.

Question 10

Q

What is Euclidean Distance?

Answer

A

The straight-line distance between two observations: the square root of the sum of squared differences across their variables.
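For illustration, Euclidean distance is the square root of the sum of squared differences across variables. A minimal Python sketch, where the function name `euclidean` and the sample observations are assumptions:

```python
from math import sqrt


def euclidean(u, v):
    """Straight-line distance between two observations of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical observations differing by 3 on one variable and 4 on the other.
d = euclidean((0, 0), (3, 4))
print(d)  # 5.0
```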

Question 11

Q

What is Manhattan Distance?

Answer

A

The distance between two observations measured as the sum of the absolute differences across their variables.
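For illustration, Manhattan distance adds up the absolute differences across variables. A minimal Python sketch, where the function name `manhattan` and the sample observations are assumptions:

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences between two observations."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical observations differing by 3 on one variable and 4 on the other.
d = manhattan((0, 0), (3, 4))
print(d)  # 7
```

For the same pair of observations the Euclidean distance is 5, so the two measures can rank pairs of observations differently.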

Question 12

Q

What is the matching coefficient?

Answer

A

A similarity measure for categorical variables encoded as 0-1: the proportion of variables on which two observations have the same value, counting matching 0 entries as well as matching 1 entries.
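As a small worked example, the matching coefficient is the number of variables on which two 0-1 encoded observations agree (both 1 or both 0), divided by the total number of variables. The function name `matching_coefficient` and the sample vectors are assumptions for illustration.

```python
def matching_coefficient(u, v):
    """Share of variables on which two 0-1 vectors have the same value."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


# Hypothetical observations: they agree on positions 0 (1-1), 1 (0-0) and 4 (0-0).
u = (1, 0, 0, 1, 0)
v = (1, 0, 1, 0, 0)
m = matching_coefficient(u, v)
print(m)  # 0.6
```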

Question 13

Q

What does Jaccard’s Coefficient not count?

Answer

A

Matching zero entries.
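As a small worked example, Jaccard's coefficient divides the number of matching 1 entries by the number of variables that are not matching zeros. The function name `jaccard`, the sample vectors and the choice to return 0.0 when every entry is a matching zero are assumptions for illustration.

```python
def jaccard(u, v):
    """Matching 1 entries divided by variables that are not 0 in both vectors."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    denominator = len(u) - both_zero
    # Assumption: two all-zero observations are given similarity 0.0.
    return both_one / denominator if denominator else 0.0


# Hypothetical observations: one shared 1 (position 0), two shared 0s (positions 1 and 4).
u = (1, 0, 0, 1, 0)
v = (1, 0, 1, 0, 0)
j = jaccard(u, v)
print(j)  # 1 / 3
```

These two observations agree on 3 of 5 variables, but only one of those agreements is a shared 1, which is why ignoring matching zeros lowers the similarity.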

Question 14

Q

What are the alternatives for comparing observations in hierarchical clustering?

Answer

A

Single linkage
Complete linkage
Group average linkage
Median linkage
Centroid linkage
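To make the difference between these options concrete, here is a minimal Python sketch of four of them, using Euclidean distance between observations (an assumption). Median linkage is left out because it depends on how clusters were merged earlier. The function names and the sample clusters are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, available in Python 3.8+


def single_linkage(a, b):
    """Distance between the closest pair of members, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the farthest pair of members, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


def group_average_linkage(a, b):
    """Average distance over all pairs of members, one from each cluster."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid(cluster):
    """Coordinate-wise mean of a cluster's members."""
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))


def centroid_linkage(a, b):
    """Distance between the two cluster centroids."""
    return dist(centroid(a), centroid(b))


# Hypothetical clusters lying on a line, so the distances are easy to check by hand.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(left, right))         # 3.0
print(complete_linkage(left, right))       # 6.0
print(group_average_linkage(left, right))  # 4.5
print(centroid_linkage(left, right))       # 4.5
```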

Question 15

Q

What is the initial step in K-means clustering?

Answer

A

Randomly assigning each observation to one of k clusters.

Question 16

Q

What is calculated during the update step of K-means clustering?

Answer

A

The arithmetic mean of each coordinate across the observations in a cluster, which gives the new cluster centroid.
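A small worked example of the update step in Python. The function name `centroid` and the sample values (age, income in thousands) are hypothetical and chosen only so the means are easy to verify.

```python
def centroid(members):
    """New cluster centroid: the mean of each coordinate over the members."""
    return tuple(sum(values) / len(members) for values in zip(*members))


# Hypothetical cluster of three observations: (age, income in thousands).
members = [(25, 40), (35, 60), (30, 50)]
c = centroid(members)
print(c)  # (30.0, 50.0)
```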

Question 17

Q

What is the assignment step in K-means clustering?

Answer

A

Reassigning observations to the cluster with the closest centroid based on squared Euclidean distance.
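A small worked example of the assignment step in Python. The function names `squared_distance` and `assign`, the sample observations and the centroid values are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign(points, centroids):
    """Index of the closest centroid for each observation."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


# Hypothetical observations and current centroids: (age, income in thousands).
points = [(25, 40), (60, 90), (30, 45)]
centroids = [(28, 42), (58, 95)]
labels = assign(points, centroids)
print(labels)  # [0, 1, 0]
```

Squaring does not change which centroid is closest, so the square root can be skipped when only the ranking of distances matters.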

Question 18

Q

What is required for K-means clustering in terms of resources?

Answer

A

Substantial computer processing power.

Question 19

Q

What variables were used in the K-means clustering example?

Answer

A

Age & Income.

Question 20

Q

What is a key difference between hierarchical clustering and K-means clustering?

Answer

A

Hierarchical clustering builds a tree of clusters while K-means partitions data into k clusters.

Question 21

Q

What are association rules used for?

Answer

A

Finding items that tend to occur together, for example when analyzing shopping cart transactions.

Question 22

Q

What is involved in evaluating association rules?

Answer

A

Constraints and metrics to assess the strength and usefulness of the rules.
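The card does not name the metrics. As an assumption, the Python sketch below computes three that are commonly used to evaluate a rule "antecedent -> consequent": support, confidence and lift. The function names and the sample transactions are hypothetical.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's overall support; 1.0 means no association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical shopping carts.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(support(transactions, {"bread", "milk"}))       # 3 of 5 carts
print(confidence(transactions, {"bread"}, {"milk"}))  # 3 of the 4 bread carts
print(lift(transactions, {"bread"}, {"milk"}))        # below 1 in this sample
```

A typical constraint is a minimum support threshold, so that rules based on only a handful of transactions are discarded before confidence or lift is examined.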

(22 cards)