05. Association Rules Flashcards

1
Q

What is Association Rules (Market Basket Analysis)

A

Association Rules is an unsupervised, descriptive method for discovering interesting relationships hidden in large datasets. The discovered relationships can be represented as rules or as frequent itemsets. It is commonly used for mining transaction databases.

2
Q

The Association Rules approach (“if X is observed, then Y has a high probability of being observed”) can be applied to which kinds of questions

A

Which products tend to be purchased together?
Of those customers who are similar to this person, what products do they tend to buy?
Of those customers who have purchased this product, what other similar products do they tend to view or purchase?

3
Q

In the rule “when itemset X is observed, then itemset Y is also observed”, what are X and Y called

A

X is called the antecedent or left-hand side (LHS)

Y is called the consequent or right-hand side (RHS)

4
Q

What is the notation and meaning of a k-itemset

A

In a k-itemset, k refers to the total number of items in that itemset: {item 1, item 2, …, item k}

5
Q

What is the underpinning idea of the Apriori algorithm

A

It is a method of “pruning” the otherwise exponential number of candidate itemsets by exploiting the “downward closure property”: if an itemset is frequent, then every subset of that itemset must also be frequent. Equivalently, if an itemset is infrequent, no superset of it can be frequent, so those supersets never need to be generated or counted.

6
Q

What is a frequent itemset

A

A frequent itemset has items that appear together often enough. The term “often enough” is formally defined with a minimum support criterion. If the minimum support is set at 0.5, any itemset can be considered a frequent itemset if at least 50% of the transactions contain this itemset. In other words, the support of a frequent itemset should be greater than or equal to the minimum support.
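
A quick worked check of that 0.5 criterion, using a hypothetical itemset that appears in 2 of 4 transactions:

support_itemset <- 2 / 4      # the itemset appears in 2 out of 4 transactions (hypothetical counts)
support_itemset >= 0.5        # TRUE, so it meets the minimum support and counts as a frequent itemset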

7
Q

What is the Apriori algorithm method

A

The Apriori algorithm takes a bottom-up, iterative approach to uncovering the frequent itemsets. It first determines all the possible individual items (the 1-itemsets, for example {bread}, {eggs}, {milk}, …) and identifies which among them are frequent. Assuming the minimum support threshold (the minimum support criterion) is set at 0.5, the algorithm identifies and retains those itemsets that appear in at least 50% of all transactions and discards the itemsets that have a support less than 0.5 (appear in fewer than 50% of the transactions). It then repeats this process with the 2-itemsets built from the surviving items, then the 3-itemsets, and so on, until no new frequent itemsets are found.
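
A minimal from-scratch sketch of the first two passes in R; the toy transactions and the 0.5 threshold below are invented for illustration (a real analysis would use the arules package shown in the later cards):

# Toy transactions (hypothetical)
transactions <- list(
  c("bread", "milk"),
  c("bread", "eggs", "milk"),
  c("milk", "eggs"),
  c("bread", "milk", "butter")
)
min_support <- 0.5

# Support of an itemset = fraction of transactions that contain all of its items
support <- function(itemset) {
  mean(sapply(transactions, function(t) all(itemset %in% t)))
}

# Pass 1: find the frequent 1-itemsets
items <- unique(unlist(transactions))
freq1 <- items[sapply(items, support) >= min_support]

# Pass 2: build candidate 2-itemsets only from the surviving items, then filter again
candidates2 <- combn(freq1, 2, simplify = FALSE)
freq2 <- Filter(function(s) support(s) >= min_support, candidates2)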

8
Q

In Association Rules what is Support

A

Support (X => Y) = (number of transactions containing both X and Y) / (total number of transactions)

Support is an indication of how frequently the itemset appears in the dataset - this is just the probability of that combination appearing!
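
A worked example with hypothetical counts:

n_total    <- 1000                  # total number of transactions (hypothetical)
n_x_and_y  <- 150                   # transactions containing both X and Y (hypothetical)
support_xy <- n_x_and_y / n_total   # 0.15, i.e. the pair appears in 15% of baskets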

9
Q

In Association Rules what is Confidence

A

Confidence (X => Y) = (number of transactions containing both X and Y) / (number of transactions containing X)

Confidence is an indication of how often the rule has been found to be true
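
A worked example, reusing the same hypothetical counts as in the Support card:

n_x           <- 300                # transactions containing X (hypothetical)
n_x_and_y     <- 150                # transactions containing both X and Y (hypothetical)
confidence_xy <- n_x_and_y / n_x    # 0.5: half of the baskets containing X also contain Y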

10
Q

In Association Rules what is Lift

A

Lift (X => Y) = Support (X and Y) / (Support(X) * Support(Y))

Lift (X => Y) = P(X,Y) / (P(X) * P(Y))

Lift indicates how much more likely itemsets X and Y are to be purchased together than would be expected if they were picked independently, expressed as a ratio

It is a multiplier of the normal chance

(Support of X) * (Support of Y) is the probability of seeing X and Y together if the two were entirely independent, i.e. P(X) * P(Y), just as with independent dice rolls
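
A worked example with hypothetical counts:

n_total <- 1000; n_x <- 300; n_y <- 250; n_x_and_y <- 150   # hypothetical counts
lift_xy <- (n_x_and_y / n_total) / ((n_x / n_total) * (n_y / n_total))
# 0.15 / (0.3 * 0.25) = 2: the pair occurs twice as often as independence would predict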

11
Q

In Association Rules what is Leverage

A

Leverage (X => Y) = Support (X and Y) - (Support(X) * Support(Y))

Leverage (X => Y) = P(X,Y) - (P(X) * P(Y))

Leverage indicates how much more likely itemsets X and Y are to be purchased together than would be expected if they were picked independently, expressed as a difference

(Support of X) * (Support of Y) is the probability of seeing X and Y together if the two were entirely independent, i.e. P(X) * P(Y), just as with independent dice rolls
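
A worked example with the same hypothetical counts as in the Lift card:

n_total <- 1000; n_x <- 300; n_y <- 250; n_x_and_y <- 150   # hypothetical counts
leverage_xy <- (n_x_and_y / n_total) - ((n_x / n_total) * (n_y / n_total))
# 0.15 - 0.075 = 0.075: 7.5 percentage points more co-occurrence than independence predicts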

12
Q

Explain the benefit of knowing Lift

A

If X occurs independently of Y, then Lift = 1. When two events are independent of each other, no rule can be drawn involving those two events.

If Lift > 1 (greater than 1), that tells us the degree to which the two occurrences are dependent on one another, and makes those rules potentially useful for predicting the consequent in future data sets.

If Lift < 1 (less than 1), that suggests a negative association: purchasing one item reduces the probability of buying the other.

Note that if the lift is zero, the items are mutually exclusive: buying one means not buying the other.

13
Q

The first iteration of the Apriori algorithm does what

A

It looks at the support of the itemsets that contain only one item. Since support (X => Y) = (transactions containing both X and Y) / (all transactions), and we are looking at a single item X in isolation, the first support calculation is simply (transactions containing X) / (all transactions), i.e. the percentage of transactions in which that item appears. So if the minimum support is set at 2%, only items that appear in at least 2% of transactions are carried forward to the next level. The rest are “pruned”.

14
Q

What is the syntax in R for applying the Apriori association algorithm

A

itemsets <- apriori(Groceries, parameter = list(minlen = 1, maxlen = 1, support = 0.02, target = "frequent itemsets"))
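
A fuller usage sketch around that call, assuming the arules package and its bundled Groceries transaction data set:

library(arules)
data("Groceries")   # example transaction data shipped with arules
itemsets <- apriori(Groceries,
                    parameter = list(minlen = 1, maxlen = 1,
                                     support = 0.02,
                                     target = "frequent itemsets"))
summary(itemsets)                                   # how many 1-itemsets cleared the threshold
inspect(head(sort(itemsets, by = "support"), 10))   # the ten most frequent single items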

15
Q

What happens in the lead-in to step two of applying the Apriori association algorithm in R

A

All of the items that survived the first round are now joined into combinations, e.g. if items 1, 3 and 7 were considered frequent (had high enough support), the 2-itemsets {1,3}, {1,7} and {3,7} will now be assessed.
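
In arules this second pass can be run explicitly, assuming the same Groceries data and 0.02 support threshold as in the previous card:

itemsets2 <- apriori(Groceries,
                     parameter = list(minlen = 2, maxlen = 2,
                                      support = 0.02,
                                      target = "frequent itemsets"))
inspect(sort(itemsets2, by = "support"))   # the 2-itemsets that cleared the threshold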

16
Q

What is the syntax in R for applying the Apriori association algorithm including confidence

A

rules <- apriori(Groceries, parameter = list(support = 0.02, confidence = 0.6, target = "rules"))
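
The resulting rules can then be examined, for example sorted by lift (a usage sketch assuming the call above has been run; with thresholds this strict it may return few or no rules, so the support threshold is often lowered in practice):

summary(rules)                               # rule count and distributions of support/confidence/lift
inspect(head(sort(rules, by = "lift"), 5))   # the five rules with the highest lift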

17
Q

List the five approaches which could be used to improve the Apriori algorithm's efficiency

A
Partitioning
Sampling
Transaction Reduction
Hash-Based itemset counting
Dynamic itemset counting
18
Q

What does a Lift of 1 mean

A

If X occurs independently of Y, then Lift = 1.

When two events are independent of each other, no rule can be drawn involving those two events.