Final Exam: Association Rule Mining Flashcards
Examples of Metadata
- Timestamp: a time variable indicating when this transaction happens (Data type: Time)
- Your name: a string variable indicating the name of the buyer (Data type: Nominal; Possible values: Ryan Rosevear, Jitin Janardhanan Nambiar)
- Which section: a string variable indicating the buyer's course section, a piece of demographic information (Data type: Nominal; Possible values: MIS 441-004, BAN 841)
Why do we transform the raw data table into a number-encoded table?
To minimize human errors and memory usage.
Accuracy check: randomly select a few transactions and verify that each one was encoded correctly.
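A minimal Python sketch of number encoding plus the random accuracy check. It assumes a small, hypothetical list of raw baskets: each transaction becomes a row of 0/1 flags (one column per item), and the hypothetical helper `spot_check` decodes a random sample of rows back to item names and compares them with the raw data.

```python
import random

# Hypothetical raw transactions; item names are illustrative only.
raw_transactions = [
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    ["coffee"],
]

# Fixed column order so every encoded row uses the same layout.
items = sorted({item for basket in raw_transactions for item in basket})

def encode(basket, items):
    """Return a 0/1 row: 1 if the item appears in the basket, else 0."""
    return [1 if item in basket else 0 for item in items]

encoded = [encode(basket, items) for basket in raw_transactions]

def spot_check(raw, encoded, items, k=2, seed=0):
    """Randomly pick k transactions and confirm each encoded row decodes back to its raw basket."""
    rng = random.Random(seed)
    for i in rng.sample(range(len(raw)), k):
        decoded = {item for item, flag in zip(items, encoded[i]) if flag == 1}
        if decoded != set(raw[i]):
            return False
    return True

print(items)        # ['bread', 'butter', 'coffee', 'milk']
print(encoded[2])   # [1, 1, 0, 1]
print(spot_check(raw_transactions, encoded, items))  # True
```

Storing small integers instead of repeated item strings is what saves memory, and a fixed column order keeps the encoding consistent across rows.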
Why do we set minimum support or confidence when running the algorithm?
- To filter out unimportant rules
- To reduce computational load
- To improve result quality
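A small Python sketch of the filtering effect of minimum thresholds. The candidate rules and the threshold values are hypothetical. Note that this only shows the post-filter. The computational savings come from applying the thresholds *during* the search, so that infrequent itemsets are never expanded.

```python
# Hypothetical candidate rules: (antecedent, consequent, support, confidence).
rules = [
    ("milk", "bread", 0.40, 1.00),
    ("bread", "butter", 0.30, 0.50),
    ("coffee", "milk", 0.05, 0.90),
]

MIN_SUPPORT = 0.10      # assumed threshold
MIN_CONFIDENCE = 0.60   # assumed threshold

def filter_rules(rules, min_support, min_confidence):
    """Keep only rules that meet both minimum thresholds."""
    return [r for r in rules if r[2] >= min_support and r[3] >= min_confidence]

for a, b, s, c in filter_rules(rules, MIN_SUPPORT, MIN_CONFIDENCE):
    print(f"{a} => {b}: support={s:.2f}, confidence={c:.2f}")
# milk => bread: support=0.40, confidence=1.00
```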
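How do you compute the number of sales for a particular item when given support, confidence, coverage, and lift?
Sales for item A = support(A) * total number of transactions
Example: if Chocolate Chip Muffin has a support of 0.1 across 69 transactions, its sales = 0.1 * 69 = 6.9, or about 7 (supports are usually reported rounded, so round to the nearest whole transaction).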
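The formula above as a one-line Python helper (the name `sales_from_support` is hypothetical). It rounds because a reported support such as 0.1 is usually itself rounded.

```python
def sales_from_support(support, total_transactions):
    """Estimate the number of transactions containing an item: support * N, rounded."""
    return round(support * total_transactions)

print(sales_from_support(0.1, 69))  # 7
```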
What are some applications of association rule mining?
- Plan shelf space
- Selective marketing
- Supply chain management
- Recommendations
What is a Market Basket Analysis (MBA)?
A process that examines a long list of transactions to determine which items are frequently purchased together. The name reflects the idea of mining frequent patterns in a shopping cart (market basket).
What is an association rule?
A rule of the form “If a customer buys A, then they also buy B” (A ⇒ B), reported with a support of …% and a confidence of …%.
Interpretation: higher support and confidence imply a stronger association between A and B.
What is support?
The proportion of all transactions that contain both A and B, i.e., the probability that a customer buys A and B together.
What is confidence?
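The proportion of transactions containing A that also contain B, i.e., the probability that a customer buys B given that they buy A: confidence(A ⇒ B) = support(A and B) / support(A).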
How do we interpret a support of 2/5 and a confidence of 2/2 for the rule milk ⇒ bread?
40% of all transactions contain both milk and bread, and 100% of the customers who bought milk also bought bread.
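A Python sketch that reproduces these numbers from a hypothetical 5-transaction dataset, chosen so that milk appears in exactly 2 transactions and both also contain bread. `Fraction` keeps the results as exact ratios like 2/5.

```python
from fractions import Fraction

# Hypothetical dataset: milk appears in 2 of 5 transactions, both with bread.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"eggs"},
    {"bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(A and B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"milk", "bread"}, transactions))        # 2/5
print(confidence({"milk"}, {"bread"}, transactions))   # 1
```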
Why is mining association rules from data nontrivial?
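Because the number of possible itemsets grows exponentially with the number of items: n items yield 2^n - 1 non-empty itemsets, so checking every candidate {A, B, …} directly is infeasible for realistic catalogs.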
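A quick Python check of the 2^n - 1 count: brute-force enumeration agrees with the formula for 4 items, and the formula shows the scale for a hypothetical 100-item store.

```python
from itertools import combinations

def count_nonempty_itemsets(n_items):
    """Number of non-empty subsets of n items: 2**n - 1."""
    return 2 ** n_items - 1

def count_by_enumeration(items):
    """Brute-force count; only feasible for very small item lists."""
    return sum(1 for k in range(1, len(items) + 1) for _ in combinations(items, k))

print(count_by_enumeration(["A", "B", "C", "D"]))   # 15
print(count_nonempty_itemsets(4))                   # 15
print(f"{count_nonempty_itemsets(100):.2e}")        # 1.27e+30
```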
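What is the Apriori algorithm?
A data mining algorithm for association rule mining that finds frequent itemsets in a dataset and then generates association rules from them. It works level by level: it first finds frequent single items, then combines them into candidate 2-itemsets, 3-itemsets, and so on, pruning any candidate that has an infrequent subset (the Apriori property: every subset of a frequent itemset must also be frequent).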
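A minimal, unoptimized Python sketch of the frequent-itemset phase of Apriori. The function name `apriori` and the toy baskets are hypothetical, and the rule-generation phase is omitted. Each level joins frequent (k-1)-itemsets into k-item candidates, prunes candidates with an infrequent subset, and keeps those that meet `min_support`.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent

baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"eggs"},
    {"bread", "eggs"},
]
result = apriori(baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here {milk, bread, eggs} is pruned without ever being counted, because its subset {milk, eggs} is infrequent. That pruning is what keeps the search manageable.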
Why could a confidence of 67% for the rule “computer game ⇒ video” be misleading or uninteresting?
Because the baseline probability of buying a video is 75%. A confidence of 67% means that customers who buy a computer game buy a video only 67% of the time, which is lower than the 75% baseline. Buying a game actually makes a video purchase less likely.
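A Python sketch of the comparison, using hypothetical transaction counts chosen only to match the 75% and 67% figures. Dividing the confidence by the baseline P(video) gives the lift, which lands below 1.

```python
# Hypothetical counts chosen to match the percentages above.
N = 10_000        # total transactions
games = 6_000     # transactions containing a computer game
videos = 7_500    # transactions containing a video
both = 4_000      # transactions containing both

confidence = both / games          # P(video | game)
baseline = videos / N              # P(video)
lift = confidence / baseline       # < 1 means buying a game lowers P(video)

print(f"confidence = {confidence:.0%}")   # confidence = 67%
print(f"P(video)   = {baseline:.0%}")     # P(video)   = 75%
print(f"lift       = {lift:.2f}")         # lift       = 0.89
```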
What happens if lift is less than 1?
A and B are negatively correlated: the occurrence of one makes the other less likely than it would be by chance.
What happens if lift is greater than 1?
A and B are positively correlated: the occurrence of one makes the other more likely than it would be by chance.
What happens if lift equals 1?
A and B are independent: the occurrence of one has no effect on the probability of the other.
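A Python sketch of the lift formula, lift(A ⇒ B) = support(A and B) / (support(A) * support(B)), which equals confidence(A ⇒ B) / support(B), together with the three cases above. The helper names and the example support values are hypothetical. A small tolerance is used because an exact 1.0 rarely survives floating-point arithmetic.

```python
def lift(support_ab, support_a, support_b):
    """lift(A => B) = support(A and B) / (support(A) * support(B))."""
    return support_ab / (support_a * support_b)

def interpret(lift_value, tol=1e-9):
    """Map a lift value to independent / positively / negatively correlated."""
    if abs(lift_value - 1) <= tol:
        return "independent"
    return "positively correlated" if lift_value > 1 else "negatively correlated"

print(interpret(lift(0.40, 0.40, 0.80)))  # positively correlated (lift = 1.25)
print(interpret(lift(0.25, 0.50, 0.50)))  # independent (lift = 1.0)
print(interpret(lift(0.40, 0.60, 0.75)))  # negatively correlated (lift ≈ 0.89)
```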
Please provide an interpretation of a lift lower than 1.
Although support and confidence suggest a strong association rule, lift shows that buying computer games is associated with not buying videos. Therefore, do not recommend videos to customers who buy computer games.
What are some common data types found in data mining?
Real, Integer, Nominal, and Date
What are some common general roles found in data mining?
Label and ID
What are some common general data types?
Numerical, Categorical, and Time