Data Mining Flashcards

1
Q

What 4 major issues do you need to address when clustering?

A

1) Similarity / dissimilarity measure
2) Standardization/normalization of numeric variables
3) Re-coding categorical variables
4) Number of clusters
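
A minimal sketch of issues 1 to 3, assuming z-score standardization, one-hot re-coding, and Euclidean distance as the dissimilarity measure. These are common choices, not the only ones. The helper names and the toy data are hypothetical, and choosing the number of clusters is not covered here.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(values):
    """Z-score a numeric column: subtract the mean, divide by the std dev."""
    mu, sigma = mean(values), pstdev(values)
    # A constant column has zero spread; map it to all zeros.
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def one_hot(labels):
    """Re-code a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(labels))
    return [[1.0 if lab == c else 0.0 for c in categories] for lab in labels]


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: two numeric columns on very different scales
# and one categorical column.
ages = [20, 30, 40]
incomes = [20000, 50000, 80000]
colors = ["red", "blue", "red"]

# One feature vector per record: standardized numerics + indicator columns.
rows = [
    [a, i] + c
    for a, i, c in zip(standardize(ages), standardize(incomes), one_hot(colors))
]

# Pairwise dissimilarity between the first two records.
d01 = euclidean(rows[0], rows[1])
```

Without the standardization step, income would dominate the distance simply because its raw values are much larger than age.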

2
Q

What are some ways you might improve your model?

A

1) Divide and conquer: build separate models for different segments of the data
2) Derive new variables; delete or merge existing ones
3) Impute missing values
4) Normalize/standardize numeric variables; remove outliers
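
Items 3 and 4 can be sketched as follows, assuming mean imputation and a z-score cutoff for outliers. Both are illustrative choices rather than the only options, and the helper names, the `z_max` parameter, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, z_max=3.0):
    """Keep only values whose absolute z-score is at most z_max."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)  # no spread, so nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma <= z_max]


# Hypothetical column with one missing value and one extreme value.
raw = [10, 11, 9, None, 12, 10, 11, 9, 10, 100]

filled = impute_mean(raw)
cleaned = drop_outliers(filled, z_max=2.0)
```

Imputing before filtering matters here: the extreme value inflates the mean used as the fill value, so in practice a median fill or an outlier check first may be preferable.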
