Lesson 07: K-Means Flashcards
1
Q
What is machine learning?
A
A set of methods that instruct the computer to “learn” from a set of data.
2
Q
What is clustering?
A
A method of organizing data into groups.
3
Q
Why is clustering so useful?
A
A lot of data is unlabeled, and clustering can provide insight into the structure of the data that distributions alone don’t.
4
Q
What does clustering require?
A
- Numeric data (binary is OK)
- All features on the same scale
- A modest number of features (it doesn’t work well on a large number of features)
- Attention to the choice of distance metric
5
Q
What are the most commonly used distance metrics with k-means?
A
- Euclidean
- Manhattan
- Supremum (also called Chebyshev)
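The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function names and the sample points `p` and `q` are made up for this card.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of the absolute differences per feature.
    return sum(abs(x - y) for x, y in zip(a, b))

def sup_distance(a, b):
    # Supremum distance: the single largest absolute difference across features.
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical sample points, chosen only for illustration.
p, q = (0.0, 0.0), (3.0, 4.0)

print(euclidean(p, q))     # 5.0
print(manhattan(p, q))     # 7.0
print(sup_distance(p, q))  # 4.0
```

For the same pair of points the three metrics give different values, which is one reason the choice of metric can change which cluster a point lands in.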
6
Q
Name the most common variants of k-means.
A
- fuzzy c-means
- k-means++
- k-medians
7
Q
Why is normalizing important before applying k-means?
A
Un-normalized variables are difficult to compare: k-means relies on distances, so features with larger numeric ranges dominate the result.
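A small sketch of why this matters, assuming Euclidean distance and z-score standardization (one possible way to normalize). The records, the feature meanings (age, income), and the helper names are hypothetical.

```python
import math
import statistics

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    # Z-score each column: subtract the column mean, divide by its standard deviation.
    # Assumes no column is constant (a zero standard deviation would divide by zero).
    cols = list(zip(*rows))
    means = [statistics.mean(col) for col in cols]
    stds = [statistics.pstdev(col) for col in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

# Hypothetical records: (age in years, income in dollars).
a, b, c, d = (25, 50_000), (60, 50_500), (27, 53_000), (65, 90_000)

# Raw distances: income has the larger range, so it drives the comparison.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Distances after standardizing both features.
sa, sb, sc, sd = standardize([a, b, c, d])
scaled_ab = euclidean(sa, sb)
scaled_ac = euclidean(sa, sc)

print(raw_ab < raw_ac)        # True: on raw data, a looks closer to b
print(scaled_ac < scaled_ab)  # True: after scaling, a is closer to c
```

On the raw data the 35-year age gap between `a` and `b` barely registers next to a few thousand dollars of income; after standardizing, the nearest neighbor of `a` flips from `b` to `c`.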
8
Q
What are the properties of using a linear transformation for normalization?
A
- The shape of the data’s distribution is preserved.
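As an illustration of this property, here is a minimal sketch using min-max scaling, which is one common linear transformation. The function name `min_max` and the sample values are assumptions made for this card.

```python
import math

def min_max(values):
    # Linear map x -> (x - min) / (max - min), sending the data onto [0, 1].
    # Assumes the values are not all identical (otherwise max - min is zero).
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sample values.
values = [2.0, 4.0, 4.0, 10.0]
scaled = min_max(values)
print(scaled)  # [0.0, 0.25, 0.25, 1.0]

# The ordering and the relative spacing between points are unchanged.
gap_ratio_raw = (values[1] - values[0]) / (values[3] - values[1])
gap_ratio_scaled = (scaled[1] - scaled[0]) / (scaled[3] - scaled[1])
print(math.isclose(gap_ratio_raw, gap_ratio_scaled))  # True
```

Only the location and the scale of the values change; ties stay ties, the order is kept, and the gaps keep the same proportions.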