Final Exam: Association Rule Mining Flashcards

1
Q

Examples of Meta Data

A
  1. Timestamp: a time variable indicating when the transaction happened (Data type: Time)
  2. Your name: a string variable indicating the name of the buyer (Data type: Nominal; Possible values: Ryan Rosevear, Jitin Janardhanan Nambiar)
  3. Which section: a string variable indicating the demographic info of the buyer (Data type: Nominal; Possible values: MIS 441-004, BAN 841)
2
Q

Why do we transform the raw data table to number encoded?

A

To minimize human errors and memory usage.
Accuracy check: randomly select a few transactions and confirm that each encoded transaction matches its raw record.
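
A minimal sketch of number encoding with a random spot-check, assuming a tiny made-up transaction list (the item names and the variables `raw_transactions`, `item_codes`, and `encoded` are illustrative, not from the course data):

```python
import random

# Hypothetical raw transactions; item names are illustrative only.
raw_transactions = [
    ["milk", "bread"],
    ["bread", "muffin"],
    ["milk", "bread", "muffin"],
]

# Give every distinct item a stable integer code (sorted for reproducibility).
distinct_items = sorted({item for t in raw_transactions for item in t})
item_codes = {item: code for code, item in enumerate(distinct_items)}

# Number-encoded table: each transaction becomes a sorted list of codes.
encoded = [sorted(item_codes[item] for item in t) for t in raw_transactions]

# Accuracy check: decode one randomly chosen transaction and compare it
# with the raw record.
code_items = {code: item for item, code in item_codes.items()}
idx = random.randrange(len(raw_transactions))
decoded = {code_items[code] for code in encoded[idx]}
assert decoded == set(raw_transactions[idx])
```

Integer codes are shorter to type and cheaper to store than repeated strings, which is the motivation the answer gives.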

3
Q

Why do we set minimum support or confidence when running the algorithm?

A
  1. To filter out unimportant rules
  2. To reduce computational load
  3. To improve result quality
4
Q

How to compute the number of sales for a particular item when given support, confidence, coverage, and lift?

A

Sales for Item A = support(A) * total number of transactions
Example: 0.1 * 69 ≈ 7 Chocolate Chip Muffin sales (6.9 before rounding; the reported support is presumably rounded)
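
The arithmetic above can be sketched as follows, using the numbers from the example. The variable names are illustrative, and rounding to a whole number assumes the reported support was itself rounded:

```python
total_transactions = 69
support = 0.1  # support of the item as reported (assumed to be rounded)

# Support is count / total, so count = support * total.
# Sales are whole numbers, so round the product.
estimated_sales = round(support * total_transactions)
print(estimated_sales)
```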

5
Q

What are some example applications of association rule mining?

A
  1. Plan shelf space
  2. Selective marketing
  3. Supply Chain Management
  4. Recommendations
6
Q

What is a Market Basket Analysis (MBA)?

A

The process that examines a long list of transactions in order to determine which items are frequently purchased together. The name reflects the idea of mining frequent patterns in a shopping cart/market basket.

7
Q

What is the association rule?

A

A rule of the form “If a customer buys A, then they buy B”, reported with a support of … % and a confidence of … %
Interpretation: Higher support and confidence imply a stronger association between A and B

8
Q

What is support?

A

The metric that measures the probability that a customer buys A and B together, i.e., the fraction of all transactions that contain both A and B.

9
Q

What is confidence?

A

The metric that measures the probability that a customer buys B given they buy A, i.e., the fraction of transactions containing A that also contain B.

10
Q

How do we interpret a support that equals 2/5 and a confidence that equals 2/2?

A

40% of all transactions include both milk and bread. 100% of the customers who bought milk also bought bread.
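
A minimal sketch of how these two numbers could arise, assuming a made-up list of five transactions (the data and the helper names `count`, `support`, and `confidence` are illustrative, not from the course material):

```python
# Hypothetical five-transaction dataset chosen so that the rule
# milk -> bread has support 2/5 and confidence 2/2.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"eggs"},
    {"bread"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)

def support(antecedent, consequent):
    """Fraction of all transactions containing both sides of the rule."""
    return count(set(antecedent) | set(consequent)) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions with the antecedent, the fraction that also have the consequent."""
    return count(set(antecedent) | set(consequent)) / count(antecedent)

print(support({"milk"}, {"bread"}))     # 2 of 5 transactions
print(confidence({"milk"}, {"bread"}))  # 2 of the 2 milk transactions
```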

11
Q

Why is mining association rules from data nontrivial?

A

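Because the number of possible itemsets is huge: with n distinct items there are 2^n - 1 non-empty itemsets, so checking every one quickly becomes infeasible.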
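
To give a feel for the growth, here is a small sketch that counts the non-empty itemsets for n items and confirms the count by brute-force enumeration on a four-item example (the function name `count_itemsets` and the item labels are illustrative):

```python
from itertools import combinations

def count_itemsets(n_items):
    """Number of non-empty subsets of n_items distinct items."""
    return 2 ** n_items - 1

# Brute-force check on a small, made-up catalog.
items = ["A", "B", "C", "D"]
enumerated = [
    combo
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
]
assert len(enumerated) == count_itemsets(len(items))

# The count roughly doubles with every extra item.
for n in (4, 10, 20, 30):
    print(n, count_itemsets(n))
```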

12
Q

What is the Apriori algorithm?

A

A data mining algorithm for association rule mining. It first finds the frequent itemsets level by level (single items, then pairs, then triples, and so on), relying on the fact that every subset of a frequent itemset must also be frequent, and then generates association rules from those itemsets.
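
The frequent-itemset phase can be sketched roughly as below. This is a simplified teaching version, not an optimized or library implementation; the function name `apriori_frequent_itemsets` and the sample transactions are assumptions made for illustration. Rules would then be derived from the returned itemsets by filtering on confidence.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    all_items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in all_items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical data, for illustration only.
sample = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"eggs"},
    {"bread"},
]
result = apriori_frequent_itemsets(sample, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is where the minimum support threshold reduces the computational load: candidates with any infrequent subset are discarded without being counted against the data.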

13
Q

Why could a confidence of 67% be misleading or uninteresting?

A

Because the overall probability of buying a video is 75%. Among customers who buy a game, the probability of buying a video drops to 67%, so buying a game makes a video purchase less likely, not more likely.
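
The comparison can be made explicit with lift, using the standard definition lift(A -> B) = confidence(A -> B) / P(B). A short sketch with the two percentages from this card (variable names are illustrative):

```python
confidence_game_to_video = 0.67  # P(video | game), from the card
p_video = 0.75                   # overall P(video), from the card

# Lift compares the conditional probability with the baseline.
lift = confidence_game_to_video / p_video
print(round(lift, 2))
```

A value below 1 means the rule does worse than simply assuming every customer buys a video at the baseline rate.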

14
Q

What happens if lift is less than 1?

A

A and B are negatively correlated: the occurrence of one makes the occurrence of the other less likely.

15
Q

What happens if lift is greater than 1?

A

A and B are positively correlated: the occurrence of one makes the occurrence of the other more likely.

16
Q

What happens if lift equals 1?

A

A and B are independent. The occurrence of one has no relation with the occurrence of the other one.
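
The three lift cases can be summarized in a small helper. This is only a sketch: the function name `interpret_lift` and the tolerance used to treat a value as "equal to 1" are assumptions, since real data rarely produces a lift of exactly 1.

```python
def interpret_lift(lift, tol=1e-9):
    """Map a lift value to the interpretation used on these cards."""
    if abs(lift - 1.0) <= tol:
        return "independent"
    if lift > 1.0:
        return "positively correlated"
    return "negatively correlated"

for value in (0.5, 1.0, 2.0):
    print(value, interpret_lift(value))
```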

17
Q

Please provide an interpretation of a lift lower than 1.

A

Although support and confidence suggest a strong association rule, lift shows that buying computer games makes buying videos less likely. Therefore, when people buy computer games, videos should not be recommended to them.

18
Q

What are some common data types found in data mining?

A

Real, Integer, Nominal, and Date

19
Q

What are some common general roles found in data mining?

A

Label and ID

20
Q

What are some common general data types?

A

Numerical, Categorical, and Time