05 SNA Graph Clustering Flashcards

1
Q

State the central paradigm for formulating a quality measure for clustering (target
function for optimization)! (in words)

A

Objective function A (g) -> R that formalizes the clustering paradigm in a special
way

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Coverage is defined as:

State a main disadvantage of using f(C) = γ(C) in a quality measure for clustering

A
  • Only accumulated intra cluster density is measured
  • Monotonic behaviour of coverage -> coverage is not a good sole quality index
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Explain the pragmatics of the following two conductance-based quality measures! (1-2
sentences each)

A
  • First measure: If the measure is small, at least one of the clusters (more precisely: the induced subgraph) contains at least one bottleneck -> this cluster is too coarse -> use minimum conductance cut to cut this cluster in “halves”
  • Second measure: If the measure is small, at least one of the clusters (more precisely: the induced subgraph) has many connections to outside -> the clustering is too fine -> merge clusters
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Explain the pragmatics of the quality measure Performance!

A
  • Count „correctly classified pairs of nodes”.
  • Correctly classified: s r and connected by edge (f counts the number of
    edges within clusters) OR not same cluster and not connected by edge (g counts the number of non-existent edges between clusters)
  • Calculating maximum of f+g is NP-hard (would be optimal clustering)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

In case a density measure π on graphs is available, a quality function can be constructed
by pessimistically setting
What are analogous expressions for average case and best case (optimistic view)?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

In case a density measure π on graphs is available, a quality function can be constructed
by pessimistically setting
Suggest a simple density measure π on graphs!

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Informally explain the difference between Linkage-based (Agglomerative) and Splittingbased (Divisive) hierarchical clustering! (1-2 sentences)

A

‡ Linkage (Agglomeration): Iteratively coarsens a given clustering by merging two
clusters until 1-clustering is reached (“bottum up”)
‡ Splitting (Division): Iteratively refines a given clustering by splitting one cluster
until singleton clustering is reached(“top down”)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Explain the difference between Complete Linkage and Single Linkage by stating the local
cost functions for both cases and giving 1 sentence of explanation!

A

Single Linkage defines the local merging cost by looking at the distance between
the closest elements in between the respective clusters, complete link defines the
local merging cost by looking at the distance between the farthest elements in
between clusters

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the benefit of a cut-function when using Splitting-based (Divisive) hierarchical
clustering? (1-2 sentences)

A

Cut function avoids having to test all possible split

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Newman ± Girvan method: Explain Modularity! (2-3 sentences)

A

Tre is the fraction of edges within communities, e is the fraction of edges between
communities. The ultimate goal is to have many edges within communities, but
few between them. Modularity tells us how well we achieved that goal

How well did you know this?
1
Not at all
2
3
4
5
Perfectly