Apriori Principle Flashcards
Which type of learning does the Apriori algorithm fall under?
A) Supervised Learning
B) Unsupervised Learning
C) Reinforcement Learning
D) Semi-Supervised Learning
B – Apriori is an unsupervised learning technique used for pattern discovery.
The Apriori algorithm is mainly used for:
A) Classification
B) Regression
C) Clustering
D) Market Basket Analysis
D – Its most common use is in market basket analysis.
Which of the following statements best describes the Apriori principle?
A) All subsets of a frequent itemset must also be frequent
B) All supersets of an infrequent itemset must also be infrequent
C) An itemset is frequent if it appears more than once
D) A frequent itemset must always have three or more items
A – Apriori principle: if an itemset is frequent, all of its subsets must be frequent.
What is the purpose of the support metric in Apriori?
A) To measure the confidence of a rule
B) To measure the frequency of an itemset in the dataset
C) To measure the strength of correlation
D) To identify the most expensive items in a basket
B – Support measures how often a particular itemset appears in the dataset.
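The support computation described on this card can be sketched in a few lines of Python; the basket contents below are hypothetical, chosen only to illustrate the calculation:

```python
# Support of an itemset = fraction of transactions containing it.
# Hypothetical basket data:
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"milk", "bread"}, transactions))  # 2 of 4 baskets -> 0.5
```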
If an itemset {A, B, C} is frequent, which of the following must also be frequent?
A) {A, B, C, D}
B) {A, D}
C) {B, C}
D) {B, D, E}
C – All subsets of a frequent itemset must also be frequent (Apriori principle).
Which metric is used to evaluate the usefulness of an association rule beyond chance?
A) Support
B) Confidence
C) Lift
D) Leverage
C – Lift indicates the strength of a rule beyond random chance.
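Lift can be computed directly from supports, as lift(A → B) = support(A ∪ B) / (support(A) × support(B)). A small sketch with made-up baskets:

```python
# Lift(A -> B) = support(A ∪ B) / (support(A) * support(B)).
# Hypothetical baskets:
transactions = [
    {"milk", "cookies"},
    {"milk", "cookies"},
    {"milk"},
    {"bread"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

lift = support({"milk", "cookies"}) / (support({"milk"}) * support({"cookies"}))
print(lift > 1)  # True: milk and cookies co-occur more often than chance predicts
```

A lift above 1 signals a positive association; exactly 1 means the two sides are independent.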
If the minimum support is set too high, what is likely to happen?
A) Too many rules are generated
B) All itemsets become frequent
C) Rare but interesting patterns may be missed
D) The algorithm will not terminate
C – High support thresholds can exclude rare but potentially useful patterns.
In Apriori, what happens at each new iteration (k)?
A) We prune the previous frequent k-itemsets
B) We combine (k-1)-itemsets to generate k-itemsets
C) We calculate accuracy of each rule
D) We increase the confidence threshold
B – At each step, frequent itemsets of size (k-1) are used to generate candidates of size k.
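The join-and-prune step described above can be sketched as follows; `generate_candidates` and the toy 2-itemsets are illustrative, not taken from any particular library:

```python
from itertools import combinations

def generate_candidates(freq_prev):
    """Join frequent (k-1)-itemsets into candidate k-itemsets, pruning any
    candidate that has an infrequent (k-1)-subset (the Apriori principle)."""
    prev = {frozenset(s) for s in freq_prev}
    k = len(next(iter(prev))) + 1
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            # keep unions of the right size whose (k-1)-subsets are all frequent
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}]
print(generate_candidates(frequent_2))  # only {A, B, C} survives the join
```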
Which of the following association rules is invalid, assuming {1, 2, 3, 4} is a frequent itemset?
A) 1 → 2, 3, 4
B) 1 → 3, 4
C) 1 → 1, 2, 3, 4
D) 2 → 3, 4
C – Invalid: an item cannot appear on both sides of a rule; the antecedent and consequent must be disjoint.
What is the primary limitation of the Apriori algorithm?
A) It doesn’t support rule generation
B) It requires labeled data
C) It is computationally expensive due to multiple scans of the dataset
D) It only works with numeric data
C – Apriori is slow due to multiple passes over the data and candidate generation.
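A minimal (deliberately naive) Apriori loop makes this cost visible: each itemset size triggers another full pass over the transactions. All names and data here are illustrative:

```python
def apriori(transactions, min_support):
    """Minimal Apriori sketch: one full pass over the data per itemset size,
    which is the main source of the algorithm's cost on large datasets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    frequent = {}
    current = {frozenset([item]) for t in transactions for item in t}
    scans = 0
    while current:
        scans += 1  # each level costs another pass over every transaction
        level = {}
        for cand in current:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        keys = list(level)
        # join step: unite pairs of survivors that differ in exactly one item
        current = {a | b for a in keys for b in keys if len(a | b) == len(a) + 1}
    return frequent, scans

baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
freq, scans = apriori(baskets, min_support=0.5)
print(scans)  # 3 passes: singletons, pairs, then the (rejected) triple
```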
What is the key principle behind the Apriori algorithm when generating frequent itemsets?
A) It considers all possible item combinations regardless of frequency
B) It uses previously found frequent itemsets to generate larger ones
C) It only analyzes single-item transactions
D) It ignores the concept of minimum support
Answer: B) It uses previously found frequent itemsets to generate larger ones
Explanation: The Apriori principle states that if an itemset is frequent, all of its subsets must also be frequent. The algorithm combines frequent (k-1)-itemsets to generate candidate k-itemsets.
In the Apriori algorithm, why are itemsets that do not meet the minimum support threshold eliminated early?
A) To reduce computational complexity
B) Because they have high confidence
C) Because they are likely to form stronger rules
D) To maximize lift values
Answer: A) To reduce computational complexity
Explanation: Apriori prunes itemsets that don’t meet the minimum support to avoid unnecessary computation, since supersets of infrequent itemsets cannot be frequent.
Which of the following best defines “support” in the context of association rules?
A) The ratio of antecedent occurrences to consequent occurrences
B) The number of transactions containing both antecedent and consequent
C) The probability that the consequent occurs given the antecedent
D) The efficiency of the rule compared to random chance
Answer: B) The number of transactions containing both antecedent and consequent
Explanation: Support measures how frequently an itemset appears in the dataset. It’s the proportion (or count) of transactions containing both the antecedent and consequent.
If an itemset {A, B} is not frequent, what does the Apriori principle suggest about any itemset containing {A, B}?
A) It might still be frequent
B) It will have higher confidence
C) It cannot be frequent
D) It will have a higher lift
Answer: C) It cannot be frequent
Explanation: According to the Apriori principle, if an itemset is infrequent, all its supersets are guaranteed to be infrequent.
Which metric is used to evaluate how much better a rule is at predicting the consequent than random chance?
A) Support
B) Confidence
C) Coverage
D) Lift
Answer: D) Lift
Explanation: Lift measures how much more often the antecedent and consequent occur together than expected if they were independent. A lift > 1 indicates a positive association.
Given the rule: {Milk} ⇒ {Cookies}, with a confidence of 80% and a lift of 1.2, what does the lift value indicate?
A) Milk and Cookies are purchased together purely by chance
B) There is no association between Milk and Cookies
C) The rule is 20% better at predicting cookie purchases than random chance
D) The rule applies to 80% of transactions
Answer: C) The rule is 20% better at predicting cookie purchases than random chance
Explanation: A lift of 1.2 means the likelihood of buying cookies when milk is purchased is 1.2 times higher than random chance.
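The arithmetic behind that reading can be checked directly: since lift = confidence / P(consequent), the card's (hypothetical) numbers imply a baseline cookie-purchase rate:

```python
# Card's numbers for the hypothetical rule {Milk} => {Cookies}:
confidence = 0.8   # P(cookies | milk)
lift = 1.2         # confidence / P(cookies)

# Since lift = confidence / P(consequent), the implied baseline is:
baseline = confidence / lift
print(round(baseline, 3))  # 0.667: milk lifts cookie purchases from ~66.7% to 80%
```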
What is the primary computational challenge in the Apriori algorithm?
A) Calculating lift for each rule
B) Generating association rules from frequent itemsets
C) Scanning the database to find frequent itemsets
D) Setting the correct minimum confidence threshold
Answer: C) Scanning the database to find frequent itemsets
Explanation: The Apriori algorithm requires multiple passes over the dataset to find frequent itemsets, making it computationally expensive, especially with large datasets.
In R, using the arules package, what does the Apriori algorithm generate by default?
A) Rules with multiple items in the consequent
B) Rules sorted by confidence
C) Rules with one item as the consequent (RHS)
D) Only itemsets without generating rules
Answer: C) Rules with one item as the consequent (RHS)
Explanation: By default, the Apriori algorithm in arules generates association rules where the consequent (RHS) is a single item, as multi-item consequents are less interpretable and rarer.