Final Flashcards

1
Q

Given an input dataset and the Apriori algorithm, how to trace the algorithm for intermediate results? (Review)

A
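The deck leaves this answer blank. A minimal Apriori sketch to trace against, in Python; the transaction data and min_sup value in the test are illustrative, not from the course:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Plain Apriori sketch: returns {frozenset: support count} for all
    frequent itemsets. Intermediate results are the successive L_k sets."""
    # L1: count single items and keep those meeting min_sup
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)
    k = 2
    while current:
        items = sorted(set().union(*current))
        # Candidate generation with the apriori property:
        # every (k-1)-subset of a candidate must itself be frequent.
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))]
        counts = {c: sum(1 for t in transactions if c <= set(t))
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(current)
        k += 1
    return frequent
```

Tracing means writing down `current` (the frequent k-itemsets) after each pass of the while loop.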
2
Q

How to derive strong rules from the given frequent itemsets L and a conf_rate?

A

Generate candidate rules from each frequent itemset, then filter out the rules with conf < min_conf; the rules that remain are the strong rules.
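A sketch of this step, assuming the frequent itemsets L are given as a map from itemset to support count (the data in the example is illustrative):

```python
from itertools import combinations

def generate_strong_rules(supports, min_conf):
    """Derive strong rules from frequent itemsets.

    supports: dict mapping frozenset -> support count; assumed to contain
    every frequent itemset and all of its subsets (the apriori property
    guarantees the subsets are frequent too).
    Returns a list of (antecedent, consequent, confidence) with conf >= min_conf.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # no nonempty antecedent/consequent split possible
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / supports[antecedent]  # conf(A -> B) = sup(AB) / sup(A)
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules
```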

3
Q

How to improve the efficiency of the rule generation procedure by applying the apriori property?

A

Prune while generating rules: if a rule fails the confidence threshold, any rule formed by moving further items from its antecedent into its consequent has equal or lower confidence, so those candidates can be skipped without testing.

4
Q

What are the two general purposes of DM, use some examples of mined association patterns to explain for each purpose?

A
  • Find frequent itemsets
    • find all frequent itemsets from D
  • Generate association rules
    • Derive rules from each frequent itemset
5
Q

How can the association mining process be mapped to the empirical cycle model of scientific research?

A
  • Observation: observe all the data
  • Analysis: generate all associations
  • Theory: apply a priori knowledge (support / confidence)
  • Prediction: predict X given the a priori knowledge
6
Q

Why is classification mining a supervised learning process? How about association mining?

A
  • It partitions training data based on a divide-and-conquer strategy.
  • It trains on a portion of the data and then tests on the other portion. If the accuracy is acceptable, the model can be used to predict.
  • Association mining, by contrast, is unsupervised: it uses no class labels and no training/testing split.
7
Q

What are the major phases of conducting a classification mining application?

A
  • Training
    • Each tuple is assumed to belong to a predefined class, as determined by the class label attribute
    • The set of tuples used for model construction is the training set
  • Testing
    • The known label of a test sample is compared with the classified result from the model
    • Accuracy rate is the percentage of test set samples that are correctly classified by the model
  • Predicting
    • If the accuracy is acceptable, use the model to classify unseen data
8
Q

Can you describe a mapping between a classification application process and the empirical cycle?

A
  • Analysis –> Classification algorithm
  • Theory –> Classification Model
  • Prediction –> Testing & Prediction
  • Observation –> Training Data
9
Q

What is the general idea/strategy/method/algorithm of DT induction for classification mining?

A
  • Supervised learning
    • Derive a model from a training data set
  • Inductive learning process
    • Constructing a tree using a top-down, divide-and-conquer strategy
    • Testing -> Choose -> Split
  • Tree construction/induction by greedy search
    • Depth-first search
    • Heuristic function
10
Q

What is the general strategy of Inductive Learning (via observing examples)?

A
  • Divide and conquer strategy
  • Continue dividing D into subsets, based on a search method, until each subset has only one label, i.e. all examples in the subset share the same class label.
11
Q

What are the major technical issues of DT Induction approach for classification mining?

A
  • Preparing datasets: (training & testing)
    • A training dataset for learning a model
    • A test dataset for evaluating the learned model
  • Classification model discovery: (constructing a DT)
    • Stopping criteria for testing at each node
    • How to choose which attribute to split, and how to split (method)
    • Control structure for tree construction (recursive process)
    • Pruning method
12
Q

What is the heuristic function used in ID3 algorithm for evaluating search directions?

A

Entropy calculation - ID3 uses the expected reduction in entropy (information gain) to evaluate candidate attributes.

13
Q

What is the notion of Information Gain, and how is it applied in the ID3 algorithm?

A
  • Expected reduction in entropy
    • Define a preferred sequence of attributes to investigate to most rapidly narrow down the state of X
  • ID3 uses information gain to select among the candidate attributes at each step while growing the tree
14
Q

How to convert the ID3 algorithm into an implementation code structure?

A
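The answer is blank in the deck. One possible code structure for ID3's recursive induction - stopping criteria, attribute choice by information gain, and divide-and-conquer recursion - sketched in Python (function names and the toy data in the example are illustrative, not from the course):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Choose the attribute with the highest information gain (ID3 heuristic)."""
    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(sub) / len(rows) * entropy(sub)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Recursive (divide-and-conquer) tree construction."""
    if len(set(labels)) == 1:              # stopping criterion: pure node
        return labels[0]
    if not attributes:                     # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)
    node = {'attribute': a, 'branches': {}}
    for v in set(r[a] for r in rows):      # split on each value and recurse
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        node['branches'][v] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx],
                                         [x for x in attributes if x != a])
    return node
```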
15
Q

How to quantify information contained in a message?

A

Q(message) = P2(outcome_after) - P1(outcome_before)

Q, the quantity of information contained in a message.

P1, the probability of outcome before receiving a message.

P2, the probability of outcome after receiving the message.

  • Information always associates with a question and a message for answering that question
  • An information measure quantifies the outcome of some expectation from the message answering the question
16
Q

Suppose a missing cow has strayed into a pasture represented as an
8 x 8 array of “cells”.

Question: Where is the cow?
Outcome: the probability of finding the cow.
Answer 1: Nobody knows.
Answer 2: The cow is in cell (4, 7).

What is the information received?

A

Outcome1 (cow before) = 1/64

Outcome2 (cow after) = 1

Information received = log2 P2 - log2 P1

= log2 (P2/P1)

= log2 (1 / (1/64))

= log2 (64)

= 6 bits
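The same arithmetic, checked in Python:

```python
from math import log2

# Probabilities of locating the cow in the 8 x 8 pasture
p_before = 1 / 64   # before the message: any of the 64 cells
p_after = 1.0       # after the message: the exact cell is known

# Information received = log2(P2) - log2(P1) = log2(P2 / P1)
information_bits = log2(p_after / p_before)   # log2(64) = 6 bits
```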

17
Q

What is the message and information received formulas?

A
  • Q(message) = P2(after) - P1(before)
  • Information received = log2 P2 - log2 P1
    • log2(P2/P1)
18
Q

How can the concept of 1 be applied to a classification method, such as the ID3 algorithm?

A

[????]

19
Q

What is entropy and information gain? How to use information gain for choosing an attribute?

A
  • The purpose is to select the best attribute - the one with the highest information gain
  • Entropy measures the information required to classify an arbitrary tuple
  • Information gain is used to determine the best split
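A worked sketch of the two measures. The numbers are illustrative (the familiar 14-tuple weather dataset, not from these notes):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# Dataset with 9 positive and 5 negative tuples:
info_d = entropy([9, 5])                    # entropy before splitting, ~0.940 bits

# An attribute with three values splits D into subsets of sizes 5, 4, 5
# with class distributions (2,3), (4,0), (3,2):
split = [(5, [2, 3]), (4, [4, 0]), (5, [3, 2])]
info_a = sum(n / 14 * entropy(dist) for n, dist in split)

gain = info_d - info_a                      # expected reduction in entropy
```

The attribute chosen at each node is the one whose split maximizes `gain`.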
20
Q

What is ID3’s induction bias?

A

There is a natural bias in the information gain measure that favors attributes with many distinct values over those with few distinct values.

21
Q

What is over fitting?

A

Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations.

22
Q

What are the technical options of overcoming the bias problem for possible improvement?

A
  • Use the gain ratio instead (the Gain Ratio measure is used in the C4.5 algorithm)
23
Q

How differently is a classification task done by DT induction and by a Naïve Bayes classifier? (*Give 3 differences.)

A
  • NB uses a probabilistic model vs. DT's inductive model
  • NB uses Bayes' theorem to weight the evidence
  • NB calculates prior probabilities

[Need review]

24
Q

What are the two assumptions for using NB classifier?

A
  • The quantities of interest are governed by the distribution of prior probabilities in that optimal decisions can be made by reasoning about these probabilities together with observed data
  • Attributes are independent of each other
25
Q

Why is the Naïve Bayes algorithm more suitable to high-dimensional data?

A

Because it makes a very strong (conditional independence) assumption: each attribute is estimated separately, so the number of probabilities to estimate grows only linearly with the number of dimensions.

26
Q

What is text classification? What is the basic idea to convert unstructured text data into structured data for classification?

A

Consider a classification problem in which the instances are text documents:

  • Spam filtering
  • Text document categorization
  • Classify survey comments

Make text data be structured:

  • Treat each text document as an object
  • Treat each word position as an attribute
  • Treat words as domain values
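A minimal sketch of that conversion, treating each word position as an attribute whose domain is the vocabulary. The helper name, padding token, and example documents are assumptions for illustration:

```python
def to_instance(text, length):
    """Turn a text document into a fixed-length tuple of word positions:
    position i is attribute i, its value is the word at that position."""
    words = text.lower().split()
    # Pad short documents / truncate long ones to a fixed attribute count
    words = words[:length] + ['<pad>'] * max(0, length - len(words))
    return tuple(words)

docs = ["Buy cheap pills now", "Meeting agenda attached"]
instances = [to_instance(d, 4) for d in docs]
```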
27
Q

How to estimate the number of prior probabilities which need to be calculated for text classification?

A

Estimation = C * I * K

C, the number of target values (classes).

I, the number of text positions.

K, the number of distinct words.

- OR -

If word position is ignored, this reduces to: Estimation = C * K

28
Q

Why is the Naïve Bayes algorithm more suitable for text data mining? What is its limitation?

A

Pros

  • Very good in domains with many equally important features
  • A good dependable baseline for text classification

Cons

  • The independence assumption states that the probability a word contributes to the text classification has no association with its location in the text (i.e. the words are independent of each other)
  • This assumption is clearly inaccurate. In practice, however, the naive Bayes algorithm performs remarkably well when classifying text by topic despite the incorrectness of this independence assumption.
29
Q

What are the main differences between topic oriented and sentiment oriented text classification?

A
  • Topic-oriented classification relies on word distribution and the assumption that words contribute independently to the classification evidence
  • Sentiment analysis requires knowledge of individual word meanings and the relationships between the words.
30
Q

Based on what principle all classification methods can be commonly divided into two general categories (name each)? Provide two example classification algorithms to illustrate each category.

A
  • Model based classification
    • DT and ANN
  • Instance based classification
    • Naive Bayes and k-nearest neighbor
31
Q

Comment on the ANN classification approach in terms of its principle and trade-offs.

A
  • Borrows from the human brain: neurons and synapses
  • Has an input layer, hidden layers and an output layer
  • Weights are used and refined using training data

Trade-offs

  • Expensive to train
  • Unpredictable (hard to interpret)
32
Q

What are the main differences between Classification and Clustering DM (list 3 from different perspectives)?

A
  • Clustering does not use training data
  • There is no predefined class; rather, you discover the classes
  • Cluster mining is less precise (by design) - it looks at the big picture
  • Gain insights with hidden data concepts and data distribution
33
Q

Provide two application examples of clustering DM, explain how the DM result may be used for supporting business decision making.

A
  • Data clustering for customer group analysis
  • Generated concepts for supporting information retrieval (clustering content for search engines)
34
Q

What are the general criteria for judging quality of clustering DM results?

A
  • high intra-class similarity
  • low inter-class similarity
  • scalability
  • deal with different types of attributes
  • deal with noise and outliers
  • high dimensionality
  • interpretability and usability
35
Q

What is the basic idea to convert non-numerical data into numerical ones and how to prepare your data with various attribute types for clustering?

A

You need to calculate the Euclidean distance between objects, so their attributes must be quantified. You prepare your data as follows:

  • Binary variables {symmetric, asymmetric}
  • Nominal variables (e.g. {red, green, blue})
  • Ordinal variables (e.g. medals)
36
Q

What are the basic data structures for clustering mining?

A
  • Data matrix
  • Dissimilarity matrix
38
Q

How to calculate dissimilarity of object pairs for a dataset with mixed attribute types for clustering?

A

Euclidean distance

For example, let x1 = (1,2) and x2 = (3,5) represent two objects in a data matrix.

d(x1, x2) = sqrt((3-1)^2 + (5-2)^2) = sqrt(2^2 + 3^2) = sqrt(13) ≈ 3.61
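The same distance as a small Python helper (the function name is illustrative):

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two numeric objects of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d = euclidean((1, 2), (3, 5))   # sqrt(2^2 + 3^2) = sqrt(13), about 3.61
```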

39
Q

What are the main categories of clustering mining methods?

A
  • Partition-based clustering
  • Hierarchical clustering
40
Q

How to trace K-means algorithm on a given dataset?

Given

1) a 1D data set: {2, 4, 10, 12, 3, 20, 30, 11, 25}
2) K = 2.

A
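The deck leaves this trace blank. A plain 1D k-means sketch that can be run on the given dataset; the initial seeds [2, 4] are an assumption, since the notes don't specify them:

```python
def kmeans_1d(data, centroids, max_iter=100):
    """Plain 1D k-means: assign each point to the nearest centroid,
    recompute centroids as cluster means, repeat until stable."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        new_centroids = [sum(c) / len(c) for c in clusters]
        if new_centroids == centroids:   # converged: assignments are stable
            break
        centroids = new_centroids
    return clusters, centroids

data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = kmeans_1d(data, centroids=[2, 4])
```

Tracing by hand means recording the assignments and the recomputed centroids after each pass; with these seeds the process converges to the clusters {2, 3, 4, 10, 11, 12} and {20, 25, 30}.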
41
Q

Explain why clustering DM is said to discover concepts hidden in large datasets.

A

It provides a high-level view of the dataset - that is, you see the bigger picture. Clustering doesn't use a predetermined class but can be used to find classes.

[Review]

42
Q

What are outliers? How to handle them in applications?

A
  • Outliers are data points whose values differ greatly from those of the remaining set of data
  • Handle them as noise or as targets (depending on the algorithm)
43
Q

What are the main differences between the two partition based methods: k-means and k-medoids?

A
  • Instead of taking the mean value of the objects in a cluster as a reference point, a medoid - the most centrally located object in the cluster - is used.
  • A medoid's average dissimilarity to all objects in the cluster is minimal, i.e. it is the most centrally located point in the cluster
  • K-medoids handles outliers better than k-means
  • K-medoids is efficient on small data sets
44
Q

What are the main strength and limitation of k-medoids algorithm?

A
  • Strengths
    • Efficient on small data sets
    • Handles outliers better than k-means
  • Limitations
    • Inefficient on large data sets
45
Q

Use a small dataset to trace each algorithm.

A

See Wikipedia article.

46
Q

What are the differences between partition based clustering and hierarchical clustering?

A

Clusters are formed in levels, creating a nested set of clusters at each level; partition-based clustering instead produces a single flat partition into k clusters.

47
Q

How are the mined clusters stored in a dendrogram data structure in a hierarchical clustering result?

A
  • The root is a single cluster containing all objects
  • Each leaf is an individual object (a singleton cluster)
  • A cluster at level i is the union of its children clusters at level i+1
48
Q

What kind of constraints may be applied to cluster analysis?

A
  • A single target concept (i.e. single-dimension clustering) such as salary
  • A selected set of attributes such as salary, female, age (20-39)
  • All attributes