Data Mining Chapter 2 Flashcards
(4 cards)
What are the four types of attributes (list them in order of increasing permitted operations)?
- Nominal (distinctness)
- Ordinal (order)
- Interval (addition and subtraction)
- Ratio (multiplication and division)
The attribute ‘Age’ would be classified as having the type “Ratio”, but when is Age classified as an ordinal attribute?
Age normally has a true zero value (say 0-99), so ratios are meaningful: an age of 20 is twice an age of 10.
But if we record ages as ranges instead of exact values, these arithmetic operations are no longer possible.
E.g. with the ranges 0-10, 11-25, 26-75, 76-99 we can still speak of order, since the ages 11-25 are lower than the ages 26-75, so Age is then an ordinal attribute.
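A minimal sketch of the binning described above, using the four ranges from this card. The names `BIN_UPPER`, `LABELS` and `age_bin` are made up for illustration.

```python
import bisect

# Upper bound (inclusive) of each age range from the card.
BIN_UPPER = [10, 25, 75, 99]
LABELS = ["0-10", "11-25", "26-75", "76-99"]

def age_bin(age):
    """Return the rank (0..3) of the range that contains the given age."""
    if not 0 <= age <= 99:
        raise ValueError("age must be between 0 and 99")
    # bisect_left finds the first upper bound that is >= age.
    return bisect.bisect_left(BIN_UPPER, age)

# Order is preserved: a person in "11-25" ranks below a person in "26-75".
younger = age_bin(20)
older = age_bin(40)
print(LABELS[younger], "<", LABELS[older], ":", younger < older)
```

The ranks can be compared, but adding or multiplying them says nothing about the underlying ages, which is why the binned attribute is only ordinal.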
Give a formula for the relationship between the SMC and the Euclidean distance for binary vectors consisting of only zeros and ones.
SMC = 1 - Euclid^2 / n, where n is the number of attributes in the vector. This holds because for binary vectors Euclid^2 equals the number of positions where the two vectors differ.
Equivalently, Euclid = \sqrt{n - n \cdot SMC}.
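A quick numerical check of this relationship, as a sketch. The two example vectors and the helper names `smc` and `euclid` are chosen here for illustration only.

```python
import math

def smc(x, y):
    """Simple matching coefficient: fraction of positions where x and y agree."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

def euclid(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = [1, 0, 0, 1, 1, 0]
y = [1, 1, 0, 0, 1, 0]
n = len(x)

smc_xy = smc(x, y)   # 4 matching positions out of 6
d = euclid(x, y)     # 2 mismatches, so d^2 = 2

# Both directions of the formula agree up to floating-point error.
print(math.isclose(smc_xy, 1 - d ** 2 / n))
print(math.isclose(d, math.sqrt(n - n * smc_xy)))
```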
Give an example of a type of data mining problem for which the Jaccard coefficient is better suited than the SMC for computing the similarity between objects.
Explain why the Jaccard is better here.
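An example would be analyzing the market baskets of customers. A store sells far more items than any one customer buys, so most attributes are 0 for both customers. The SMC counts these 0-0 matches as agreement and rates almost every pair of customers as highly similar. The Jaccard coefficient ignores 0-0 matches, so it only looks at the items the customers have in common, not those that neither of them has bought.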
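A small sketch of the effect on made-up data: a hypothetical store with 1000 items and two customers who each bought two items, with no item in common. The helper names and the convention of returning 0.0 when neither customer bought anything are assumptions for this example.

```python
def smc(x, y):
    """Simple matching coefficient: matches (1-1 and 0-0) over all attributes."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

def jaccard(x, y):
    """Jaccard coefficient: 1-1 matches over positions where at least one is 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumption: treat two empty baskets as having similarity 0.0.
    return both / either if either else 0.0

n_items = 1000

# Customer A bought items 0 and 1; customer B bought items 2 and 3.
basket_a = [0] * n_items
basket_a[0] = basket_a[1] = 1
basket_b = [0] * n_items
basket_b[2] = basket_b[3] = 1

smc_ab = smc(basket_a, basket_b)          # dominated by 996 shared zeros
jaccard_ab = jaccard(basket_a, basket_b)  # no shared purchases

print("SMC:", smc_ab)
print("Jaccard:", jaccard_ab)
```

The SMC comes out close to 1 even though the baskets share nothing, while the Jaccard coefficient is 0.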