5.4 Quiz: Descriptive Analytics (EN) Flashcards

1
Q

Given the transaction database below, how many association rules satisfy a minimum support of 50% and a minimum confidence of 75%? Use the Apriori principle.

Transaction Items
1 A,B,C
2 A,B,E
3 A,B,C,D,E
4 A,B,C,D
5 A,D,E,F
6 B,C,F
7 C
8 A,B,C

A

7
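The count can be verified with a small brute-force sketch in Python. This enumerates all itemsets and rules exhaustively rather than using Apriori's candidate pruning, but it yields the same frequent itemsets and the same rule count for a database this small:

```python
from itertools import combinations

# Transaction database from the question
transactions = [
    {"A", "B", "C"}, {"A", "B", "E"}, {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D"}, {"A", "D", "E", "F"}, {"B", "C", "F"},
    {"C"}, {"A", "B", "C"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / n

# Frequent itemsets at minimum support 50% (brute force over all itemsets)
items = sorted(set().union(*transactions))
frequent = [set(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if support(set(c)) >= 0.5]

# Enumerate rules X -> Y from each frequent itemset and keep those
# with confidence >= 75%
rules = []
for fi in frequent:
    for k in range(1, len(fi)):
        for x in combinations(sorted(fi), k):
            x = set(x)
            if support(fi) / support(x) >= 0.75:
                rules.append((frozenset(x), frozenset(fi - x)))

print(len(rules))  # 7
```

The seven rules are A -> B, B -> A, B -> C, C -> B, AB -> C, AC -> B, and BC -> A.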

2
Q

Given that the support of the association rule {A,B} -> {C} equals 37.5% and the confidence of the same rule equals 50%, which of the following transactions must be the eighth transaction in the database?

Transaction Items
1 A,B,F
2 A,B,C
3 A,B,D
4 C
5 A,D,E,F
6 A,B,C,D,E
7 A,B,C,D
8 ???

ABCD
AB
C
BCDE

A

AB
Support(X) = (Transactions containing X) / (Total transactions)
Confidence(X -> Y) = Support(X ∪ Y) / Support(X)
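A quick way to check the answer is to apply the two formulas above to each candidate eighth transaction, a minimal sketch:

```python
# First seven transactions from the question
base = [
    {"A", "B", "F"}, {"A", "B", "C"}, {"A", "B", "D"}, {"C"},
    {"A", "D", "E", "F"}, {"A", "B", "C", "D", "E"}, {"A", "B", "C", "D"},
]

def rule_stats(transactions, x, y):
    """Return (support, confidence) of the rule x -> y."""
    n = len(transactions)
    supp_xy = sum((x | y) <= t for t in transactions) / n
    supp_x = sum(x <= t for t in transactions) / n
    return supp_xy, supp_xy / supp_x

# Try each candidate as the eighth transaction
for candidate in [{"A", "B", "C", "D"}, {"A", "B"}, {"C"}, {"B", "C", "D", "E"}]:
    supp, conf = rule_stats(base + [candidate], {"A", "B"}, {"C"})
    print(sorted(candidate), supp, conf)
# Only {"A", "B"} yields support 0.375 (= 37.5%) and confidence 0.5 (= 50%)
```

Intuitively, the eighth transaction must contain A and B (to bring Support(A,B) to 6/8 = 75%) but not C (to keep Support(A,B,C) at 3/8 = 37.5%).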

3
Q

In association rule mining, the percentage of total transactions that contain an item set is called

the support
the confidence
the lift

A

support

The support of an itemset is the percentage (or proportion) of total transactions in the dataset that contain that particular itemset.
The confidence of a rule (X -> Y) is the percentage of transactions containing X that also contain Y.
The lift of a rule (X -> Y) is a measure of how much more likely Y is to be bought when X is bought compared to the likelihood of Y being bought in general.

4
Q

Sequence rules aim at finding

inter-transaction patterns.
intra-transaction patterns.

A

inter-transaction patterns.

5
Q

In association rule mining, the Apriori property states:

every superset of a frequent item set is frequent.
every subset of a frequent item set is infrequent.
every superset of an infrequent item set is frequent.
every subset of frequent item set is frequent.

A

Every subset of a frequent item set is frequent.

The Apriori property is a fundamental concept in association rule mining, and it helps in efficiently discovering frequent itemsets in a dataset.

6
Q

A dendrogram can be used to

measure the similarity between clusters.
to decide upon the optimal number of clusters.

A

to decide upon the optimal number of clusters.

In order to decide on the optimal number of clusters, a dendrogram can be used. This is a tree-like diagram that records the sequences of merges. Let’s illustrate this with an example. Assume that we want to cluster birds in terms of their characteristics, such as the way they look, the noise they make, what they eat, and where they live. First, we group chicken and duck, then parrot and canary. Step 3 adds pigeon to the chicken and duck cluster. Step 4 clusters owl and eagle. Step 5 merges the clusters obtained in steps 2 and 3, and step 6 merges everything together. An obvious question is where we should stop clustering.

7
Q

Consider the association rule X ==> Y. The measure support(X ∪ Y) / (support(X) × support(Y)) is called

the support.
the confidence.
the lift.

A

the lift.

The lift is the measure represented by the given formula; it assesses the significance of the association rule by comparing the observed support of the rule with the support expected under the assumption of independence.

8
Q

Given the following transactions database:

Listener Artists
1 Tsjaikovski, Eminem, Måneskin, The Weeknd
2 Ariana Grande, Olivia Rodrigo, Beyoncé
3 Beyoncé, Ed Sheeran, Måneskin, Ariana Grande
4 Ed Sheeran, Beyoncé, Ariana Grande
5 Ed Sheeran, Måneskin
6 Eminem, The Weeknd, Måneskin
7 The Weeknd, Olivia Rodrigo
8 Måneskin, Ariana Grande, Eminem, The Weeknd
9 Ariana Grande, Ed Sheeran, Olivia Rodrigo

The association rule Beyoncé -> Ed Sheeran, Måneskin has:

a support of 3/9 and a confidence of 2/3.
a support of 3/9 and a confidence of 1/3.
a support of 1/9 and a confidence of 2/3.
a support of 1/9 and a confidence of 1/3.

A

a support of 1/9 and a confidence of 1/3.

Support(Beyoncé -> Ed Sheeran, Måneskin) = Number of transactions containing (Beyoncé and Ed Sheeran and Måneskin) / Total number of transactions
Support(Beyoncé -> Ed Sheeran, Måneskin) = 1 / 9

Confidence(Beyoncé -> Ed Sheeran, Måneskin) = Support(Beyoncé and Ed Sheeran and Måneskin) / Support(Beyoncé)
Support(Beyoncé) = Number of transactions containing Beyoncé / Total number of transactions
Confidence(Beyoncé -> Ed Sheeran, Måneskin) = (1 / 9) / (3 / 9)
= 1 / 3
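The calculation above can be sketched in Python, using exact fractions to match the answer options:

```python
from fractions import Fraction

# Listener database from the question
listeners = [
    {"Tsjaikovski", "Eminem", "Måneskin", "The Weeknd"},
    {"Ariana Grande", "Olivia Rodrigo", "Beyoncé"},
    {"Beyoncé", "Ed Sheeran", "Måneskin", "Ariana Grande"},
    {"Ed Sheeran", "Beyoncé", "Ariana Grande"},
    {"Ed Sheeran", "Måneskin"},
    {"Eminem", "The Weeknd", "Måneskin"},
    {"The Weeknd", "Olivia Rodrigo"},
    {"Måneskin", "Ariana Grande", "Eminem", "The Weeknd"},
    {"Ariana Grande", "Ed Sheeran", "Olivia Rodrigo"},
]

def rule_stats(db, x, y):
    """Return (support, confidence) of the rule x -> y as exact fractions."""
    n = len(db)
    supp_xy = Fraction(sum((x | y) <= t for t in db), n)
    supp_x = Fraction(sum(x <= t for t in db), n)
    return supp_xy, supp_xy / supp_x

supp, conf = rule_stats(listeners, {"Beyoncé"}, {"Ed Sheeran", "Måneskin"})
print(supp, conf)  # 1/9 1/3
```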

9
Q

Association rules aim at finding
inter-transaction patterns.
intra-transaction patterns.

A

intra-transaction patterns.

Association rules are concerned with which items appear together at the same time (intra-transaction patterns), whereas sequence rules are concerned with which items appear at different times (inter-transaction patterns).

Association rules typically address co-occurrence patterns within individual transactions.
Sequence rules, on the other hand, address patterns that involve the order of events or items across different transactions.

10
Q

Which statement is CORRECT?

In the single linkage method, the distance between two clusters is defined as the shortest distance between any two members in both clusters.

The complete linkage method defines the distance between two clusters as the maximum distance between any two members in both clusters.

The average linkage method calculates the average distance between all members in both clusters.

The centroid method calculates the distance between both cluster centroids.

All statements are correct.

A

All statements are correct.
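The four inter-cluster distance definitions can be sketched directly from the statements above, a minimal illustration using Euclidean distance on 2D points:

```python
from math import dist  # Euclidean distance (Python 3.8+)
from itertools import product

def single_linkage(a, b):
    # shortest distance between any two members in both clusters
    return min(dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    # maximum distance between any two members in both clusters
    return max(dist(p, q) for p, q in product(a, b))

def average_linkage(a, b):
    # average distance between all members in both clusters
    pairs = list(product(a, b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def centroid_linkage(a, b):
    # distance between the two cluster centroids
    ca = [sum(coord) / len(a) for coord in zip(*a)]
    cb = [sum(coord) / len(b) for coord in zip(*b)]
    return dist(ca, cb)

a = [(0, 0), (0, 2)]
b = [(3, 0), (5, 0)]
print(single_linkage(a, b))    # 3.0
print(complete_linkage(a, b))  # distance between (0, 2) and (5, 0)
```

Note that single linkage tends to produce elongated "chained" clusters, while complete linkage favours compact ones.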

11
Q

Descriptive analytics is also referred to as

supervised learning.
unsupervised learning.

A

unsupervised learning.

Descriptive analytics is also referred to as unsupervised learning since there is no target variable available to steer the learning process. The idea is to find structure in an unlabeled data set. Common techniques are clustering, association rules, and sequence rules.

12
Q

Given the following transactions database:

Listener Artists
1 Tsjaikovski, Eminem, Måneskin, The Weeknd
2 Ariana Grande, Olivia Rodrigo, Beyoncé
3 Beyoncé, Ed Sheeran, Måneskin, Ariana Grande
4 Ed Sheeran, Beyoncé, Ariana Grande
5 Ed Sheeran, Måneskin
6 Eminem, The Weeknd, Måneskin
7 The Weeknd, Olivia Rodrigo
8 Måneskin, Ariana Grande, Eminem, The Weeknd
9 Ariana Grande, Ed Sheeran, Olivia Rodrigo

The association rule: Eminem, The Weeknd -> Måneskin has:

a support of 3/9 and a confidence of 3/9.
a support of 3/9 and a confidence of 1.
a support of 1 and a confidence of 3/9.
a support of 1 and a confidence of 1.

A

a support of 3/9 and a confidence of 1.

Support(Eminem, The Weeknd -> Måneskin) = 3/9
Confidence(Eminem, The Weeknd -> Måneskin) = (3/9) / (3/9) = 1

13
Q

Agglomerative and divisive clustering algorithms are examples of

hierarchical clustering.
non-hierarchical clustering.

A

hierarchical clustering.

14
Q

Which of the following are post-processing activities in association rule mining?

Filter out the trivial rules that contain already known patterns

Perform a sensitivity analysis by varying the minimum support and minimum confidence values.

Use appropriate visualization facilities (e.g., OLAP-based) to find the unexpected rules that might represent novel and actionable behavior in the data.

Measure the economic impact (e.g., profit, cost) of the association rules.

All of the above.

A

All of the above.

15
Q

A lift value bigger than 1 indicates a

negative dependence or substitution effect.
positive dependence or complementary effect.

A

positive dependence or complementary effect.

The lift value is a measure used in association rule mining to assess the strength of the relationship between two items or sets of items in a dataset. Specifically, it indicates how much more likely the items are to be purchased together compared to what would be expected if their occurrence were independent.
Lift(X -> Y) = Support(X ∪ Y) / (Support(X) × Support(Y))

If the lift is equal to 1, it suggests that the occurrence of X and Y is independent of each other.
If the lift is greater than 1, it indicates a positive dependence or a complementary effect, suggesting that the presence of X increases the likelihood of the presence of Y (and vice versa).
If the lift is less than 1, it indicates a negative dependence or a substitution effect, suggesting that the presence of X decreases the likelihood of the presence of Y (and vice versa).
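The formula can be illustrated with a tiny sketch; the basket data below is hypothetical, chosen so that two items co-occur more often than independence would predict:

```python
from fractions import Fraction

# Hypothetical mini basket data: beer and chips always appear together
transactions = [
    {"beer", "chips"}, {"beer", "chips"}, {"beer", "chips"},
    {"salsa"}, {"salsa", "bread"},
]

def support(itemset):
    """Exact support of an itemset as a fraction of all transactions."""
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))

def lift(x, y):
    # Lift(X -> Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
    return support(x | y) / (support(x) * support(y))

print(lift({"beer"}, {"chips"}))  # 5/3, i.e. > 1: complementary effect
```

Here Support(beer) = Support(chips) = 3/5 and Support(beer, chips) = 3/5, so the lift is (3/5) / (9/25) = 5/3, indicating positive dependence.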

16
Q

To do customer journey analysis, one can use

association rules.
sequence rules.

A

sequence rules.
While association rules are valuable for identifying item associations within transactions, sequence rules are more suitable for customer journey analysis, providing insights into the order of and dependencies between events or touchpoints over time.

17
Q

When mining association rules, there is

a target variable available.
no target variable available.

A

no target variable available.
In summary, association rule mining is typically used in scenarios where the focus is on uncovering interesting relationships and patterns between items or events in a dataset, and there is no specific target variable to predict or classify.

18
Q

In descriptive analytics, there is

a target variable.
no target variable.

A

no target variable.

Predictive analytics involves building models to make predictions or classifications on future or unseen data, and it typically involves a target variable. Descriptive analytics, on the other hand, focuses on summarizing and understanding existing data without predicting future outcomes.

19
Q

Given the following transactions database:

Listener Artists
1 Tsjaikovski, Eminem, Måneskin, The Weeknd
2 Ariana Grande, Olivia Rodrigo, Beyoncé
3 Beyoncé, Ed Sheeran, Måneskin, Ariana Grande
4 Ed Sheeran, Beyoncé, Ariana Grande
5 Ed Sheeran, Måneskin
6 Eminem, The Weeknd, Måneskin
7 The Weeknd, Olivia Rodrigo
8 Måneskin, Ariana Grande, Eminem, The Weeknd
9 Ariana Grande, Ed Sheeran, Olivia Rodrigo

The association rule: Ariana Grande -> Ed Sheeran has:

a support of 3/9 and a confidence of 3/5.
a support of 3/9 and a confidence of 4/5.
a support of 3/10 and a confidence of 3/5.
a support of 3/10 and a confidence of 4/5.

A

a support of 3/9 and a confidence of 3/5.

20
Q

Which statement is CORRECT?

Agglomerative clustering starts from all observations in their own cluster whereas divisive clustering starts from all observations in one cluster.

Divisive clustering starts from all observations in their own cluster whereas agglomerative clustering starts from all observations in one cluster.

Both agglomerative and divisive clustering start from all observations in their own cluster.

Both agglomerative and divisive clustering start from all observations in one cluster.

A

Agglomerative clustering starts from all observations in their own cluster whereas divisive clustering starts from all observations in one cluster. -> correct

Agglomerative clustering starts with each observation as a separate cluster and then merges clusters iteratively (bottom-up). Divisive clustering starts with all observations in one cluster and then recursively divides it into smaller clusters (top-down).

21
Q

When doing k-means clustering, the cluster centroids

change during the clustering.
always remain fixed.

A

change during the clustering.

k-means clustering:
1. Select k observations as initial cluster centroids.
2. Assign each observation to the cluster that has the closest centroid.
3. When all observations are assigned, recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the cluster centroids no longer change.
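The steps above can be sketched as a minimal k-means in pure Python. For simplicity the first k observations serve as initial centroids (one common choice; random selection is also typical), and the example points form two well-separated groups:

```python
from math import dist

def kmeans(points, k, iters=100):
    """Minimal k-means sketch following the steps above."""
    centroids = list(points[:k])                       # step 1: initial centroids
    clusters = []
    for _ in range(iters):
        # step 2: assign each observation to the cluster with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # step 3: recalculate the positions of the k centroids
        new = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl
               else centroids[i] for i, cl in enumerate(clusters)]
        # step 4: stop once the centroids no longer change
        if new == centroids:
            break
        centroids = new  # the centroids change during the clustering
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(centroids)
```

With this data the centroids converge to roughly (1/3, 1/3) and (31/3, 31/3), the means of the two groups, after moving on each pass.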