Part 2: BI association rules Flashcards

1
Q

Objective of association

A

The objective is to find interesting associations (relationships) between attributes in a data set. There is no class variable to predict, hence no classification; that is why association is unsupervised learning.

2
Q

Rules of association

A

Antecedent -> consequent

LHS -> RHS

3
Q

#antecedent / #LHS

A

The number of records (transactions) in the database that match the antecedent.

4
Q

Indices for rules

A
  • Support (coverage) = #(LHS and RHS) / #DB, or
    #(antecedent and consequent) / #DB
  • Accuracy (confidence) = #(LHS and RHS) / #LHS, or
    #(antecedent and consequent) / #antecedent
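For example, a minimal sketch in Python of both indices, assuming a hypothetical four-record transaction database (item names invented for illustration):

```python
# Toy transaction database: each record is a set of items (invented data).
db = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

lhs, rhs = {"bread"}, {"milk"}                 # rule: bread -> milk

n_lhs = sum(lhs <= t for t in db)              # #LHS: records matching the antecedent (3)
n_both = sum((lhs | rhs) <= t for t in db)     # #(LHS and RHS) (2)

support = n_both / len(db)                     # 2 / 4 = 0.5
confidence = n_both / n_lhs                    # 2 / 3 ≈ 0.67
```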
5
Q

Frequent itemsets

A
  • Item: one attribute-value pair.
  • Itemset: all items occurring in a transaction or record.
  • Frequent itemset: an itemset with minimal support k, i.e., support at or above a threshold k predefined by the user.
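A small illustration of this vocabulary, using an invented weather-style record:

```python
# One record as attribute-value pairs (invented data).
record = {"outlook": "sunny", "windy": False, "play": "no"}

item = ("outlook", "sunny")      # item: one attribute-value pair
itemset = set(record.items())    # itemset: all items occurring in this record

# The itemset is frequent if its support in the database meets the
# user-defined threshold k.
```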
6
Q

Itemsets association rules

A
  • Association rule: IF-THEN format.
    + LHS and RHS: one item (attribute-value pair) or a conjunction of items.
  • One itemset -> many association rules.
7
Q

Apriori property

A

e.g. itemset (A, B, C):
Support(A, B, C) ≥ k -> support(A, B) ≥ k, support(B, C) ≥ k, etc. for all subsets.
Note: the opposite may not be true.
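A toy check of the property on an invented database; the assertion holds for every subset:

```python
from itertools import combinations

# Invented database of four transactions.
db = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "C"}, {"B", "C"}]

def support(itemset):
    """Number of transactions containing all items of the itemset."""
    return sum(set(itemset) <= t for t in db)

abc = ("A", "B", "C")
for r in range(1, len(abc)):
    for subset in combinations(abc, r):
        # every subset of (A, B, C) is at least as frequent as (A, B, C)
        assert support(subset) >= support(abc)
```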

8
Q

N-itemsets with minimal support

A
  1. Find all 1-itemsets with minimum support.
  2. Store them in file1.
  3. Compute all candidate 2-itemsets by combining the 1-itemsets from file1.
  4. Store the 2-itemsets with minimum support in file2.
  5. Compute all candidate 3-itemsets by combining the 2-itemsets from file2.
  6. Store the 3-itemsets with minimum support in file3.
  7. etc. (see the sketch below)
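A compact level-wise sketch of this procedure, with file1, file2, ... modeled as in-memory lists (database and threshold invented):

```python
# Invented database and minimum-support threshold.
db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
min_support = 2                       # minimum number of matching records

def support(itemset):
    return sum(itemset <= t for t in db)

# Steps 1-2: frequent 1-itemsets ("file1").
items = sorted({i for t in db for i in t})
level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
frequent = list(level)

# Steps 3+: combine k-itemsets into candidate (k+1)-itemsets, keep the
# candidates with minimum support ("file2", "file3", ...), and repeat.
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)

print([sorted(s) for s in frequent])
```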
9
Q

Finding association rules

A
  • A typical question: “find all association rules with support ≥ s and confidence ≥ c.”
    Note: the “support” of an association rule is the support of the set of items it mentions.
  • Hard part: finding the high-support (frequent) itemsets.
    + Checking the confidence of association rules involving those sets is relatively easy.
10
Q

Apriori algorithm

A
  • Definition: an algorithm for finding association rules.
  • Description:
    Step 1: find all frequent itemsets with minimal support k.
    Step 2: from the frequent itemsets found in step 1, derive the association rules with minimal accuracy (confidence) m (see the sketch below).
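A hedged sketch of step 2 for a single frequent itemset: every non-empty proper subset becomes a candidate LHS, and rules below the accuracy threshold m are discarded (database, itemset, and threshold invented):

```python
from itertools import combinations

db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
m = 0.6                                    # minimal accuracy (confidence)

def support(itemset):
    return sum(itemset <= t for t in db)

frequent = frozenset({"A", "B", "C"})      # a frequent itemset from step 1
for r in range(1, len(frequent)):
    for lhs in map(frozenset, combinations(frequent, r)):
        rhs = frequent - lhs
        confidence = support(frequent) / support(lhs)
        if confidence >= m:                # keeps {A,B}->{C}, {A,C}->{B}, {B,C}->{A}
            print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```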
11
Q

Rule interestingness measures

A
  • Objective measures:
    + support
    + confidence
    + lift
  • Subjective measures: a rule (pattern) is interesting if
    + it is unexpected (surprising to the user), or
    + it is actionable (the user can do something with it).
12
Q

Benchmark confidence

A
  • Confidence = #(antecedent and consequent) / #antecedent.
  • Assume the antecedent and consequent are independent.
  • Then confidence = PriorProb(consequent); this prior is the benchmark confidence.
    Note: the probability of an event can be estimated by the fraction of records in the database in which it occurs.
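A tiny numeric check with invented counts, showing that under independence the confidence collapses to the prior probability of the consequent:

```python
n_db = 100        # records in the database (invented)
n_lhs = 50        # records matching the antecedent
n_rhs = 40        # records matching the consequent

# Under independence: #(LHS and RHS) = n_lhs * n_rhs / n_db = 20.
n_both = 20

confidence = n_both / n_lhs    # 0.4
prior_rhs = n_rhs / n_db       # 0.4 -> the benchmark confidence
```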
13
Q

Lift measure

A

Tells us how strong the relation between the antecedent and the consequent is.
Rule: LHS -> RHS (antecedent -> consequent).
Lift = confidence / Prob(RHS) = Prob(LHS and RHS) / (Prob(LHS) * Prob(RHS)),
where Prob(RHS) is the benchmark confidence.

We assume that fractions in the database are good approximations of probabilities.

  • Lift = 0 -> fr(LHS and RHS) = 0.
  • Lift = 1 -> LHS and RHS are independent.
  • Lift >> 1 -> the most interesting rules: LHS is a strong indicator for RHS. Sometimes lift << 1 is also interesting.
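A minimal sketch of the lift computation, estimating the probabilities by fractions of an invented database, as the card assumes:

```python
db = [{"A", "B"}, {"A", "B"}, {"C"}, {"C"}, {"B", "C"}]   # invented data

def prob(itemset):
    """Estimate Prob by the fraction of records containing the itemset."""
    return sum(itemset <= t for t in db) / len(db)

lhs, rhs = {"A"}, {"B"}
confidence = prob(lhs | rhs) / prob(lhs)     # 0.4 / 0.4 = 1.0
benchmark = prob(rhs)                        # Prob(RHS) = 0.6
lift = confidence / benchmark                # 1.0 / 0.6 ≈ 1.67 > 1
```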
14
Q

Summary

A
  • Association belongs to unsupervised learning.
  • Association rules vs. classification (decision) rules:
    + Classification rules predict only one attribute, the class, whereas association rules find associations between attributes without such a distinction.
    + The RHS of an association rule may contain a conjunction of attribute-value pairs, whereas the RHS of a classification rule contains only the class value.
    + Association rules are not intended to be used together as a set, whereas classification rules are.