Lesson 07: K-Means Flashcards

1
Q

What is machine learning?

A

A set of methods that instruct the computer to “learn” from a set of data.

2
Q

What is clustering?

A

A method of organizing data into groups.

3
Q

Why is clustering so useful?

A

A lot of data is unlabeled, and clustering can provide insight into the structure of the data that distributions alone don’t.

4
Q

What does clustering require?

A
  • Clustering requires numeric data (binary is OK)
  • All features need to be on the same scale
  • It doesn’t work well with a large number of features
  • Pay attention to the choice of distance metric
5
Q

What are the most commonly used distance metrics with k-means?

A
  • Euclidean
  • Manhattan
  • Sup (supremum, also called Chebyshev or L∞)
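
To make the three metrics concrete, here is a minimal Python sketch. The function names and the sample points p and q are illustrative assumptions, not part of the lesson.

```python
# Three distance metrics between two equal-length numeric vectors.

def euclidean(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """L1 distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sup(a, b):
    """Sup (L-infinity) distance: largest absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical sample points chosen so the arithmetic is easy to check.
p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q), sup(p, q))  # prints: 5.0 7 4
```

The same pair of points is 5, 7, or 4 units apart depending on the metric, which is why the choice of metric can change which cluster a point lands in.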
6
Q

Name the most common variants of k-means

A
  • fuzzy c-means
  • k-means++
  • k-medians
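
As a rough illustration of how k-medians differs from plain k-means, the sketch below compares the two center-update rules on one cluster: k-means takes the per-coordinate mean, while k-medians takes the per-coordinate median. The helper names and the sample cluster (with one deliberate outlier) are assumptions made for this example.

```python
from statistics import mean, median

def center(points, reducer):
    """Apply a reducer (mean or median) to each coordinate of a cluster."""
    return tuple(reducer(coord) for coord in zip(*points))

# Hypothetical cluster: three nearby points plus one outlier.
cluster = [(1, 1), (2, 2), (3, 3), (100, 100)]

mean_center = center(cluster, mean)      # k-means style update
median_center = center(cluster, median)  # k-medians style update

print(mean_center)    # prints: (26.5, 26.5)
print(median_center)  # prints: (2.5, 2.5)
```

The outlier drags the mean far from the bulk of the points, while the median stays close to them.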
7
Q

Why is normalizing important before applying k-means?

A

Un-normalized variables are difficult to compare: a feature measured on a larger scale dominates the distance calculation.
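
A small Python sketch of the effect, assuming min-max scaling as the normalization and Euclidean distance as the metric. The ages and incomes are made-up values for illustration only.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical people: (age in years, income in dollars).
ages = [25, 60, 27, 65]
incomes = [50_000, 52_000, 60_000, 100_000]

raw = list(zip(ages, incomes))
scaled = list(zip(min_max_scale(ages), min_max_scale(incomes)))

# Raw data: income dominates, so person 1 looks closest to person 0
# even though their ages differ by 35 years.
print(euclidean(raw[0], raw[1]) < euclidean(raw[0], raw[2]))  # prints: True

# Scaled data: both features contribute, so person 2 is now closest.
print(euclidean(scaled[0], scaled[2]) < euclidean(scaled[0], scaled[1]))  # prints: True
```

The nearest neighbor of person 0 changes once both features share a scale, so cluster assignments made by k-means can change as well.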

8
Q

What are the properties of using a linear transformation for normalizing?

A
  • The shape of the data’s distribution is preserved; only its location and scale change.
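
One way to check this property, sketched in Python under the assumption that min-max scaling is the linear transformation in question: the gaps between consecutive values, taken as fractions of the total range, are the same before and after scaling. The data values and helper names are illustrative.

```python
import math

def min_max_scale(values):
    """Linear transformation onto [0, 1]: subtract the min, divide by the range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def relative_gaps(values):
    """Gaps between consecutive sorted values, as fractions of the range."""
    ordered = sorted(values)
    span = ordered[-1] - ordered[0]
    return [(b - a) / span for a, b in zip(ordered, ordered[1:])]

data = [2, 4, 10, 20]  # hypothetical sample
scaled = min_max_scale(data)

# Compare the relative spacing before and after the transformation.
same_shape = all(
    math.isclose(r, s)
    for r, s in zip(relative_gaps(data), relative_gaps(scaled))
)
print(same_shape)  # prints: True
```

A nonlinear transformation such as a logarithm would fail this check, because it compresses large values more than small ones.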