Association FINAL Flashcards by Ivan Tejeda

Association Rules interested in

Observing which objects occur together

How well did you know this?

Not at all

Perfectly

Association rules recommending or co-occur?

Seeing which items co-occur

How well did you know this?

Not at all

Perfectly

Association Rule Mining

Given a set of transactions, find the rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

How well did you know this?

Not at all

Perfectly

Does implications mean casuality?

No, means co-occurrence

How well did you know this?

Not at all

Perfectly

{} -> {}

Antecedent -> Consequent

How well did you know this?

Not at all

Perfectly

3 types of database

Binary, Transaction, Vertical

How well did you know this?

Not at all

Perfectly

Items

I = {x1, x2, …, xm}

How well did you know this?

Not at all

Perfectly

A set X within the set of items

Itemset

How well did you know this?

Not at all

Perfectly

An itemset of cardinality k

k-itemset

How well did you know this?

Not at all

Perfectly

I^(k)

set of all k-itemsets

How well did you know this?

Not at all

Perfectly

Transaction identifiers, tids

T = {t1, t2, …, tn}

How well did you know this?

Not at all

Perfectly

t within T

tidset

How well did you know this?

Not at all

Perfectly

Transaction

Tuple in the form (t, X) where t is a unique transaction identifier and X is an itemset

How well did you know this?

Not at all

Perfectly

Support

The support of an itemset X in a dataset D denoted sup(X, D) is the number of transactions in D that contain X

How well did you know this?

Not at all

Perfectly

Relative Support

The relative support of X is the fraction of transactions that contain X: sup(X,D)|D|

How well did you know this?

Not at all

Perfectly

We use F to

denote the set of all itemsets

How well did you know this?

Not at all

Perfectly

We use F^(k)

Study These Flashcards

to denote the set of k-itemsets

Itemset mining problem

Study These Flashcards

Given a minimum support threshold (minsup), find all itemsets X s.t. sup(x) >= minsup

Frequent itemsets

Study These Flashcards

An itemset X is frequent if sup(x) >= minsup where minsup is a user specified minimum support threshold (if minsup is fraction, then relative support is implied)

Total possible subset

Study These Flashcards

2^|I|

Naive approach to generate all itemsets that are frequent

Study These Flashcards

For all x in I:
compute support
if support >= minsup
add to list

The brute force method

Study These Flashcards

Explores the entire itemset search space, regardless of minsup

Goal of Association Rule Mining

Study These Flashcards

Given a set of transactions T, find all the rules having:
support >= minsup
confidence >= mincond

Apriori principle

Study These Flashcards

If an itemset is frequent, then all of its subsets must be frequent as well

Apriori principle 2

If an itemset if infrequent, then all of its supersets must be infrequent as well

A rule is frequent if

the itemset XY is frequent, sup(XY) >= minsup

A rule is strong if

conf >= minconf

Rules are pruned using

confidence

confidence (x->y)

sup(XY)/sup(x)

Unlike support, confidence does not exhibit

the monotone property

If a rule x -> y\x does not satisfy the confidence threshold, then

any rule x'->y\x', where x' within X, must not satisfy the confidence threshold as well

What happens if misnup is too high

we may miss interesting low-support items ex: such items may correspond to expensive products that are rarely purchased by customers, but whose patterns are interesting to mine for the retailer

What happens if minsup is too low

We get information overload: too many frequent itemsets and too many spurious rules

How can some high confidence rules be misleading?

High confidence might not imply a meaningful relationship if the consequent is already common in the dataset, irrespective of the antecedent

Confidence measure ignores

the support of the itemset appearing in the rule consequent

which metric accounts for the consequent

lift

conf(x->y)/rsup(y)

value of lift close to 1 implies

that the support of the rule is expected

Good lifts, bad lifts

>1, <<1

Association FINAL Flashcards

(40 cards)