Descriptive Data Mining Flashcards

1
Q

Antecedent

A

The item set corresponding to the "if" portion of an if-then association rule.

2
Q

Association rule

A

An if-then statement describing the relationship between item sets.

3
Q

Centroid linkage

A

Method that defines the dissimilarity between two clusters as the distance between their centroids (the averages of the observations in each cluster).

4
Q

Complete linkage

A

Measure of dissimilarity between clusters based only on the two most dissimilar observations, one from each cluster.

5
Q

Confidence

A

The conditional probability that the consequent of an association rule occurs given the antecedent occurs.
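
As a rough illustration, confidence can be estimated from transaction counts: the number of transactions containing both the antecedent and the consequent, divided by the number containing the antecedent. The function names and the transaction data below are made up for this sketch.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from transaction counts.

    Assumes the antecedent occurs in at least one transaction;
    otherwise the division below raises ZeroDivisionError.
    """
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)


# Hypothetical example data: each set is one transaction.
transactions = [
    {"bread", "jelly"},
    {"bread", "jelly", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
]

# Rule: if bread, then jelly. Bread appears 4 times, bread with jelly 2 times.
rule_confidence = confidence(transactions, {"bread"}, {"jelly"})
print(rule_confidence)  # 0.5
```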

6
Q

Consequent

A

The item set corresponding to the "then" portion of an if-then association rule.

7
Q

Dendrogram

A

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering.

8
Q

Dimension reduction

A

Process of reducing the number of variables to consider in a data-mining approach.

9
Q

Euclidean distance

A

Geometric measure of dissimilarity between observations based on the Pythagorean theorem.
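
A minimal sketch of the calculation: square the difference in each variable, sum the squares, and take the square root. The function name is illustrative only.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two observations of equal length."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# The classic 3-4-5 right triangle.
d = euclidean_distance((0, 0), (3, 4))
print(d)  # 5.0
```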

10
Q

Group average linkage

A

Measure of dissimilarity between clusters computed as the average distance over every pair of observations, one from each cluster.
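
The sketch below contrasts group average linkage with single and complete linkage on the same set of pairwise distances. It assumes Euclidean distance as the underlying measure; the function names and the two small clusters are invented for illustration.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Distances for every pair with one observation from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Distance between the two most similar (closest) observations."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar (farthest) observations."""
    return max(pairwise_distances(cluster_a, cluster_b))


def group_average_linkage(cluster_a, cluster_b):
    """Average distance over all between-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters lying on the x-axis; pairwise distances are 4, 6, 3, 5.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]

print(single_linkage(cluster_a, cluster_b))         # 3.0
print(complete_linkage(cluster_a, cluster_b))       # 6.0
print(group_average_linkage(cluster_a, cluster_b))  # 4.5
```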

11
Q

Hierarchical clustering

A

Process of agglomerating observations into a series of nested groups based on a measure of similarity.

12
Q

Jaccard’s coefficient

A

Measure of similarity between observations consisting solely of binary categorical variables that considers only matches of nonzero entries.
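
One way to sketch this: count the variables where both observations are 1, and divide by the number of variables after discarding those where both are 0. The function name and the example vectors are assumptions for illustration.

```python
def jaccard_coefficient(u, v):
    """Jaccard similarity for two equal-length 0/1 vectors.

    Matches where both entries are 0 are ignored entirely.
    """
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    considered = len(u) - both_zero
    if considered == 0:
        raise ValueError("coefficient is undefined when every entry is 0")
    return both_one / considered


# Hypothetical observations: one shared 1, four shared 0s, one mismatch.
u = [1, 1, 0, 0, 0, 0]
v = [1, 0, 0, 0, 0, 0]

similarity = jaccard_coefficient(u, v)
print(similarity)  # 0.5
```

Counting the four shared zeros as matches would instead give 5 matches out of 6 variables, which is why this measure suits data where a shared absence carries little information.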

13
Q

k-means clustering

A

Process of organizing observations into one of k groups based on a measure of similarity.
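
A common way to carry this out is Lloyd's algorithm, which alternates between assigning each observation to its nearest centroid and recomputing each centroid as the mean of its cluster. The sketch below is a simplified version under stated assumptions: Euclidean distance, starting centroids supplied by the caller (practical implementations usually choose them randomly), and made-up data.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Simplified Lloyd's algorithm. Assumes max_iter >= 1."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separate groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(clusters)
```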

14
Q

Lift ratio

A

The ratio of the confidence of an association rule to the benchmark confidence.
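
A small sketch of the ratio, assuming the usual definition of benchmark confidence as the share of all transactions that contain the consequent. The function name and the transaction data are hypothetical.

```python
def lift_ratio(transactions, antecedent, consequent):
    """Confidence of the rule divided by the benchmark confidence."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_consequent = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))

    confidence = n_both / n_antecedent  # P(consequent | antecedent)
    benchmark = n_consequent / n        # P(consequent), assumed definition
    return confidence / benchmark


# Hypothetical example data: each set is one transaction.
transactions = [
    {"bread", "jelly"},
    {"bread", "jelly", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
]

# Confidence is 2/4 and benchmark confidence is 2/5, so lift is about 1.25.
lift = lift_ratio(transactions, {"bread"}, {"jelly"})
print(lift)
```

A lift ratio above 1 indicates that the consequent appears more often alongside the antecedent than it does in the data set overall.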

15
Q

Market basket analysis

A

Analysis of items frequently co-occurring in transactions (such as purchases).

16
Q

Matching coefficient

A

Measure of similarity between observations based on the number of matching values of categorical variables.
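
As a sketch, the coefficient can be computed as the number of variables on which two observations agree, divided by the total number of variables. The function name and the example observations are made up.

```python
def matching_coefficient(u, v):
    """Share of categorical variables on which two observations agree."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


# Hypothetical observations with four categorical variables each.
u = ("red", "small", "yes", "A")
v = ("red", "large", "yes", "B")

similarity = matching_coefficient(u, v)
print(similarity)  # 0.5
```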

17
Q

McQuitty’s method

A

Measure that computes the dissimilarity between a cluster AB (formed by merging clusters A and B) and a cluster C by averaging the distance between A and C and the distance between B and C.
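
The update can be sketched as a small function that rebuilds a distance table after merging two clusters. The table layout (a dict keyed by `frozenset` pairs of cluster labels), the function name, and the distances are assumptions chosen for illustration.

```python
def merge_mcquitty(dist, a, b):
    """Return a new distance table after merging clusters `a` and `b`.

    The distance from the merged cluster to any other cluster c is the
    simple average of d(a, c) and d(b, c).
    """
    merged = a + b  # new label, e.g. "A" + "B" -> "AB"
    others = {label for pair in dist for label in pair} - {a, b}
    # Keep distances that involve neither of the merged clusters.
    new_dist = {
        pair: d for pair, d in dist.items() if a not in pair and b not in pair
    }
    for c in others:
        d_ac = dist[frozenset((a, c))]
        d_bc = dist[frozenset((b, c))]
        new_dist[frozenset((merged, c))] = (d_ac + d_bc) / 2
    return new_dist


# Hypothetical distances between four clusters.
dist = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 7.0,
    frozenset(("B", "D")): 9.0,
    frozenset(("C", "D")): 2.0,
}

updated = merge_mcquitty(dist, "A", "B")
print(updated[frozenset(("AB", "C"))])  # 5.0
print(updated[frozenset(("AB", "D"))])  # 8.0
```

The average ignores how many observations A and B each contain, which is what separates this update from group average linkage.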

18
Q

Median linkage

A

Method that computes the similarity between two clusters as the median of the similarities between each pair of observations in the two clusters.

19
Q

Missing at random

A

The case when data for a variable is missing because of a relationship with the values of other variables.

20
Q

Missing completely at random

A

The case when data for a variable is missing purely due to random chance.

21
Q

Missing not at random

A

The case when data for a variable is missing for a reason related to the value that went unrecorded.

22
Q

Observation

A

A set of observed values of variables associated with a single entity, often displayed as a row in a spreadsheet or database.

23
Q

Single linkage

A

Measure of dissimilarity between clusters based only on the two most similar observations, one from each cluster.

24
Q

Support count

A

The number of times that a collection of items occurs together in a transaction data set.

25
Q

Unsupervised learning

A

Category of data-mining techniques in which an algorithm explains relationships without an outcome variable to guide the process.

26
Q

Ward’s method

A

Procedure that partitions observations in a manner that yields clusters with the least information loss due to aggregation.