K-means clustering Flashcards

1
Q

Goal

A
It's an un-supervised learning problem.
Goal is to partition the input points 
x1, x2,..., xN 
into K sets 
S1, S2, … , Sk
and select centers (centroids) 
μ1, μ2, …, μk
for each cluster.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Performance index

A
  • Quality of cluster j:
    Ej = sum(xn app Sj) || xn - μj ||^2
  • Error function
    Ein(S1,S2,…,Sk,μ1,…,μk) = sum(j=1,k) Ej = sum(n=1,N) || xn - μ(xn) ||^2

Minimizing the error function Ein in an NP-hard problem, so Lloyd’s algorithm is used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Lloyd’s algorithm

A
  • if the partition is fixed, the optimal centroids are easy to obtain
  • if the centroids are fixed, the optimal partition is easy to obtain

Algorithm
Given k

1) Initialize μj
2) Construct Sj as all points closest to μj
3) Update each μj to equal the centroid of Sj
4) Repeat 2 and 3 until Ein stops decreasing

Ein cannot increase with 2 and 3, so the alorithm eventually stops.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How to initialize μj

A
  • start taking an arbitrary point as center
  • as a second center, take the point furthest from the first center
  • to get each new center, keep adding the point furthest from the current set of centers
  • stop when k centers are selected
How well did you know this?
1
Not at all
2
3
4
5
Perfectly