06 Data Mining / Metric Clustering Flashcards

1
Q

Explain briefly the following three characterizations of clustering approaches / methods:
  • Exclusive vs. non-exclusive
  • Crisp vs. fuzzy
  • Hierarchical vs. non-hierarchical

A
  • Exclusive: non-overlapping clusters
  • Non-exclusive: overlapping clusters
  • Crisp: conventional characteristic function α_k: X -> {0, 1} for each cluster C_k
    (an element is either in the cluster or not)
  • Fuzzy: fuzzy membership function α_k: X -> [0, 1] for each cluster C_k (an element
    can belong to multiple clusters with different weights)
  • Hierarchical: imposes a tree structure (dendrogram) on the clusters C_k
  • Non-hierarchical: imposes no tree structure on the clusters C_k
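As a small illustration of the crisp vs. fuzzy distinction, here is a minimal numpy sketch; the centers, the sample point, and the inverse-distance membership rule are assumptions made up for illustration, not from the lecture:

```python
import numpy as np

# Assumed cluster centers and one sample point (illustration only)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = np.array([1.0, 1.0])

dists = np.linalg.norm(centers - x, axis=1)  # distance of x to each center

# Crisp: alpha_k(x) in {0, 1} -- x belongs to exactly one cluster
crisp = (dists == dists.min()).astype(int)

# Fuzzy: alpha_k(x) in [0, 1] -- x belongs to every cluster with some weight
# (inverse-distance weights are just one possible choice)
fuzzy = (1.0 / dists) / (1.0 / dists).sum()

print(crisp)  # [1 0 0] -- member of the nearest cluster only
print(fuzzy)  # approx. [0.53 0.24 0.24] -- weights summing to 1
```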
2
Q

The local cost functions for Single Linkage (min) and Complete Linkage (max)
are defined as (see picture), where d(u, v) denotes the
length of the shortest path between nodes u and v. State the analogous local cost
functions if nodes correspond to pattern vectors x_u and x_v in ℝ^n!

A
  • Single Linkage: d(C_i, C_j) = min over x_u ∈ C_i, x_v ∈ C_j of ||x_u − x_v||
  • Complete Linkage: d(C_i, C_j) = max over x_u ∈ C_i, x_v ∈ C_j of ||x_u − x_v||
  • In words: the shortest-path distance d(u, v) is replaced by the Euclidean distance
    between the pattern vectors.
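A minimal numpy sketch of these two local cost functions; the two point sets are made up for illustration:

```python
import numpy as np

def single_linkage(A, B):
    """min ||x_u - x_v|| over all pairs x_u in A, x_v in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # all pairwise distances
    return d.min()

def complete_linkage(A, B):
    """max ||x_u - x_v|| over all pairs x_u in A, x_v in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.max()

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])
print(single_linkage(A, B))    # 2.0 -- closest pair across the clusters
print(complete_linkage(A, B))  # 5.0 -- farthest pair across the clusters
```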
3
Q

State and explain two advantages of DBSCAN (compared to K-Means)! (1 sentence each)

A
  • DBSCAN does not need the number of clusters K in advance, while K-means does,
    which makes DBSCAN easier to apply when K is unknown
  • DBSCAN finds arbitrarily shaped clusters instead of the spherical clusters
    K-means favors (improved clustering quality)
  • DBSCAN has a notion of noise/outliers, K-means doesn't (improved clustering quality)
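A hedged scikit-learn sketch of the shape advantage on the two-moons toy data; the hyperparameters eps=0.3 and min_samples=5 are assumptions chosen for this data set, not values from the lecture:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaved half-moons: clearly non-spherical clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# DBSCAN recovers the two moons; K-means cuts them with a straight boundary.
# DBSCAN also labels low-density points as noise (label -1).
print(np.unique(kmeans_labels), np.unique(dbscan_labels))
```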
4
Q

Fuzzy C-Means:
a) What is the range of values of the entries of the membership matrix r_nk compared to K-Means?
b) Fuzziness parameter m in the objective function: informally, what happens in the limits
m -> 1 and m -> ∞? (No full mathematical derivation necessary!)
c) Explain the pragmatics behind the (following) additional conditions!

A
  • a) The range of values is the interval [0, 1] instead of just {0, 1}
    (K-means is crisp, Fuzzy C-Means is fuzzy)
  • b) m models the degree of fuzziness:
    m -> 1: approaches K-means (crisp case)
    m -> ∞: r_nk -> 1/K (where K is the number of clusters; totally fuzzy)
  • c) The first condition (the memberships of each element sum to 1 over all clusters)
    lets the memberships be read as percentages and therefore as probabilities. The
    second condition ensures that every cluster C_k receives positive total membership,
    i.e. at least some element has a "chance" to be in C_k (in other words, no cluster
    is useless)
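A small numpy sketch checking the two conditions on an example membership matrix; the matrix entries are made up for illustration:

```python
import numpy as np

# Membership matrix r_nk: N = 3 elements, K = 2 clusters, entries in [0, 1]
r = np.array([[0.8, 0.2],
              [0.5, 0.5],
              [0.1, 0.9]])

# Condition 1: the memberships of each element sum to 1 (rows act like probabilities)
assert np.allclose(r.sum(axis=1), 1.0)

# Condition 2: every cluster receives positive total membership (no useless cluster)
assert np.all(r.sum(axis=0) > 0)
```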
5
Q

What are advantages of Gaussian Mixture Models compared to Fuzzy C-Means?
Informally explain the nature and geometrical interpretation of the GMM quantities μ_k
and Σ_k!

A
  • GMM does not favor spherical clusters like Fuzzy C-Means does, since every
    component has its own covariance -> better clustering quality
  • μ_k: mean vector of component k; geometrically the center of the k-th Gaussian
  • Σ_k: covariance matrix of component k; geometrically it describes the size, shape,
    and orientation of the ellipsoidal density contours around μ_k
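A hedged scikit-learn sketch that fits a GMM to simulated, non-spherical data and reads off μ_k and Σ_k:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated (non-spherical) blobs
X = np.vstack([
    rng.normal([0.0, 0.0], [2.0, 0.3], size=(200, 2)),
    rng.normal([5.0, 5.0], [0.3, 2.0], size=(200, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
print(gmm.means_)        # mu_k: one mean vector (cluster center) per component
print(gmm.covariances_)  # Sigma_k: one covariance matrix (shape/orientation) per component
```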
6
Q

What is the paradigm of the "Maximum Likelihood concept"?

A
  • It is a method for estimating the parameters of a probability distribution by
    maximizing a likelihood function, i.e. the probability of the observed data viewed
    as a function of the parameters
  • Paradigm: choose the parameters that best explain the observed data
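A minimal numpy sketch of the idea for a 1-D Gaussian, where the ML estimates happen to have a closed form (sample mean and standard deviation); the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.5, size=10_000)  # true mu = 3.0, sigma = 1.5

# ML estimates: the parameter values under which the observed data are most probable
mu_ml = data.mean()
sigma_ml = data.std()  # the ML variance divides by N, as np.std does by default

print(mu_ml, sigma_ml)  # close to the true parameters 3.0 and 1.5
```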
7
Q

What does "iid" mean? What are the consequences here? Can you point them out in an
expression on these slides?

A
  • "iid" means "independent and identically distributed": the data points are drawn
    independently from the same distribution
  • Consequence: the joint likelihood factorizes into a product over the data points,
    p(X | θ) = ∏_n p(x_n | θ); this product is exactly what appears in the likelihood
    expressions on the slides
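A small scipy sketch of the factorization consequence on simulated iid data:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=5)  # iid draws from N(0, 1)

# Independence lets the joint likelihood factorize:
# p(X | theta) = prod_n p(x_n | theta)
joint_likelihood = np.prod(norm.pdf(x, loc=0.0, scale=1.0))
print(joint_likelihood)
```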
8
Q

Why do we take the logarithm of the likelihood? How do we choose the base of the
logarithm? Why?

A
  • The logarithm is strictly monotonically increasing, so the log-likelihood has its
    maximum at the same position as the likelihood (set the first derivative to zero).
    The log turns the product over data points into a sum, which is much easier to
    differentiate
  • The base is e (natural logarithm), because the likelihood contains exponential
    functions (Gaussians), so ln cancels the exp and simplifies the derivative
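A numpy sketch of why the logarithm also matters numerically; the raw product of many small densities underflows, while the log-sum stays finite and has the same argmax (simulated data):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=2_000)

densities = norm.pdf(x)     # the individual likelihood factors
print(np.prod(densities))   # 0.0 -- the product of 2000 small numbers underflows

log_likelihood = np.sum(norm.logpdf(x))  # the log turns the product into a sum
print(log_likelihood)       # finite; maximizing it maximizes the likelihood too
```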
9
Q

Case ONE Gaussian: Are the resulting equations directly solvable?

A
  • Yes: for a single Gaussian the ML equations have a closed-form solution. μ_ML is
    the sample mean (it does not depend on Σ), and Σ_ML is the sample covariance
    computed with μ_ML, so the equations can be solved directly one after the other
  • A mutual dependence between the equations (an "infinite loop" that needs
    iteration) only appears in the mixture case, where the responsibilities depend on
    the parameters and vice versa; that is what motivates the EM algorithm
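A numpy sketch of the closed-form solution for one Gaussian on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=5_000)

# Solve directly, one equation after the other: first mu, then Sigma using mu
mu_ml = X.mean(axis=0)
Sigma_ml = (X - mu_ml).T @ (X - mu_ml) / len(X)  # the ML estimate divides by N

print(mu_ml)     # close to true_mu
print(Sigma_ml)  # close to true_cov
```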
10
Q

What is a 1-of-K representation?

A

It is a binary vector with exactly one entry x_i = 1 and all other entries x_j = 0
(j ≠ i). (There are K different such vectors, one for each of the K possible states.)
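A one-liner numpy sketch of constructing such vectors; K = 4 is an assumption for illustration:

```python
import numpy as np

K = 4
z = np.eye(K, dtype=int)[2]  # 1-of-K vector selecting state 2 (0-indexed)
print(z)                     # [0 0 1 0] -- exactly one entry is 1
```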

11
Q

Can you explain the formula for the responsibilities from an intuitive point of view?

A
  • How much is cluster k "responsible" for explaining the element x: the numerator
    weighs the prior probability of cluster k by how well its Gaussian fits x, and the
    denominator normalizes over all clusters
  • It is the posterior probability that element x belongs to cluster k
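A numpy/scipy sketch of the standard GMM responsibility formula, γ_nk = π_k N(x_n | μ_k, σ_k) / Σ_j π_j N(x_n | μ_j, σ_j), for a 1-D mixture with made-up parameters:

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.5, 0.5])      # mixing coefficients pi_k
mu = np.array([0.0, 4.0])      # component means mu_k
sigma = np.array([1.0, 1.0])   # component standard deviations

x = np.array([0.5, 2.0, 3.8])  # three data points

# Numerator: prior weight of cluster k times how well it explains x_n
weighted = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)  # shape (N, K)
gamma = weighted / weighted.sum(axis=1, keepdims=True)     # normalize over clusters

print(gamma)  # rows sum to 1; x = 2.0 sits halfway, so it is split 50/50
```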
12
Q

Why don’t we have the full joint probability distribution p(X, Z | θ)?

A
  • Because the latent variables Z (the cluster assignments) are never observed: we
    only see the data X, so the complete data set {X, Z} is unavailable and we have to
    work with the marginal likelihood p(X | θ) = Σ_Z p(X, Z | θ) instead
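A tiny scipy sketch of this marginalization over the unobserved Z for a single data point; the 1-D mixture parameters are made up:

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.3, 0.7])    # p(z_k = 1)
mu = np.array([0.0, 4.0])
sigma = np.array([1.0, 1.0])
x = 1.0                      # observed; its assignment z is NOT observed

# Complete-data likelihood p(x, z_k = 1 | theta) for each possible value of z
joint = pi * norm.pdf(x, loc=mu, scale=sigma)

# Since z is hidden, only the marginal p(x | theta) = sum_z p(x, z | theta) is available
marginal = joint.sum()
print(joint, marginal)
```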