Book - Chapter 7 Analytical Theory Classification Flashcards

(36 cards)

1
Q

What applications does classification appear in

A

Data mining

2
Q

What is the primary task of a classifier

A

To assign class labels to new observations

3
Q

Are classification methods supervised or unsupervised

A

Supervised

4
Q

What is another name for a decision tree

A

Prediction tree

5
Q

What types of input variables can a decision tree take

A

Categorical or continuous

6
Q

In a decision tree structure what is a test point

A

A node

7
Q

What is a node without further branches called

A

A leaf node

8
Q

What do leaf nodes return

A

They return class labels and, in some implementations, probability scores

9
Q

What are the two varieties of decision trees

A

Classification trees and regression trees

10
Q

What are classification trees

A

They usually apply to output variables that are categorical, often binary (for example, yes or no)

11
Q

What are regression trees

A

They can apply to output variables that are numeric or continuous, such as the predicted price of a consumer good or the likelihood a subscription will be purchased

12
Q

What does the term branch mean in decision trees

A

A branch refers to the outcome of a decision and is visualised as a line connecting two nodes

13
Q

What happens if the decision is numerical

A

The greater-than branch is usually placed on the right

14
Q

What is an internal node

A

Internal nodes are the decision or test points. Each internal node refers to an input variable or an attribute

15
Q

What is the top internal node called

A

The root

16
Q

What is the depth of a node

A

The minimum number of steps required to reach the node from the root

17
Q

What are short trees also known as

A

Weak learners or base learners

18
Q

What is an ensemble method

A

It uses multiple predictive models to vote, and decisions are made based on the combination of the votes
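
The voting idea can be sketched in a few lines of Python. This is only an illustration assuming simple majority voting; the function name `majority_vote` and the example labels are hypothetical, and real ensemble methods may weight the votes differently.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class label predicted by the most models.

    Assumption: every model's vote counts equally. On a tie, the label
    that was counted first is returned.
    """
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical models vote on one observation.
votes = ["yes", "no", "yes"]
combined = majority_vote(votes)
```

Here `combined` holds the label chosen by two of the three models.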

19
Q

Give examples of ensemble methods

A

Random forest, bagging, and boosting

20
Q

What is the simplest short tree called

A

Decision stump
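
A decision stump can be pictured as a single test followed by two leaves. The Python sketch below assumes a numeric attribute; the names `stump_predict`, `threshold`, `left_label`, and `right_label` are hypothetical and chosen only for illustration.

```python
def stump_predict(value, threshold, left_label, right_label):
    """Classify one observation with a one-split tree.

    The root tests a single numeric attribute. Values greater than the
    threshold follow the right branch; all others follow the left branch.
    Each branch ends in a leaf that returns a class label.
    """
    return right_label if value > threshold else left_label

# Hypothetical rule: predict "yes" when age is above 40.
prediction = stump_predict(45, threshold=40, left_label="no", right_label="yes")
```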

21
Q

At each split what does the decision tree algorithm do

A

It picks the most informative attribute out of the remaining attributes

22
Q

How is the most informative attribute determined

A

By measures such as entropy and information gain

23
Q

What does entropy measure

A

The impurity of an attribute

24
Q

What does information gain measure

A

The purity of an attribute

25
When do you achieve maximum entropy
When all class labels are equally probable
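
A short worked equation may help here. It assumes the usual base-2 definition of entropy; the symbols k (number of class labels) and p_i (probability of label i) are introduced only for this illustration.

```latex
H = -\sum_{i=1}^{k} p_i \log_2 p_i
\qquad\text{and, with } p_i = \tfrac{1}{k} \text{ for every } i:\qquad
H_{\max} = -\sum_{i=1}^{k} \tfrac{1}{k} \log_2 \tfrac{1}{k} = \log_2 k
```

For two equally probable labels (k = 2) this gives an entropy of 1.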
26
What is conditional entropy always
Less than or equal to the base entropy
27
What is information gain defined as
The difference between the base entropy and the conditional entropy
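
As a rough sketch of that definition, the Python below computes base entropy, conditional entropy, and their difference for one categorical attribute. It assumes base-2 logarithms; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Base entropy minus the conditional entropy given one attribute."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Conditional entropy: entropy of each group, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

labels = ["yes", "yes", "no", "no"]
informative = information_gain(["a", "a", "b", "b"], labels)
uninformative = information_gain(["a", "b", "a", "b"], labels)
```

The first attribute separates the labels perfectly, so its gain equals the base entropy; the second leaves each group as mixed as the whole set, so its gain is zero.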
28
What is Bayes theorem
It gives the relationship between the probabilities of two events and their conditional probabilities
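
Written out for two events, labelled C and A here for illustration, and assuming P(A) > 0, the relationship is:

```latex
P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}
```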
29
What is a naive Bayes classifier
A classifier that assumes the presence or absence of a particular feature of a class is unrelated to the presence or absence of other features
30
What are the input variables of naive Bayes
Categorical or discrete
31
What is the output of naive Bayes
Class label and its corresponding probability score. The probability score is not the true probability of the class label, but it’s proportional to the true probability
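
A minimal Python sketch of such a score, assuming categorical features, raw frequency estimates, and no smoothing. The function names and the tiny spam/ham dataset are hypothetical. Each score is the class prior multiplied by the per-feature conditional probabilities, so it is proportional to the class probability rather than equal to it.

```python
from collections import Counter, defaultdict

def train(records, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    for record, label in zip(records, labels):
        for j, value in enumerate(record):
            feature_counts[(label, j)][value] += 1
    return class_counts, feature_counts

def scores(record, class_counts, feature_counts):
    """Unnormalised score per class: P(c) times the product of P(a_j | c)."""
    total = sum(class_counts.values())
    result = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(c)
        for j, value in enumerate(record):
            score *= feature_counts[(label, j)][value] / count  # P(a_j | c)
        result[label] = score
    return result

def classify(record, class_counts, feature_counts):
    """Return the class label with the highest score."""
    s = scores(record, class_counts, feature_counts)
    return max(s, key=s.get)

# Hypothetical toy data: (first word, link present?) -> label
records = [("offer", "link"), ("offer", "none"), ("hello", "none"), ("hello", "link")]
labels = ["spam", "spam", "ham", "ham"]
model = train(records, labels)
label = classify(("offer", "link"), *model)
```

Without smoothing, a feature value never seen with a class drives that class's score to zero, which is why practical implementations usually add a smoothing term.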
32
What is naive Bayes most commonly used for
Spam filtering
33
What is Bayes theorem
The conditional probability of event C occurring, given that event A has already occurred, is denoted as P(C|A)
34
What should a good classifier have
Large numbers of true positives and true negatives, and small (ideally zero) numbers of false positives and false negatives
35
What does accuracy mean
The rate at which a model has classified the records correctly
36
What is recall
The percentage of positive instances that were correctly identified
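
Accuracy and recall can both be computed from the four confusion-matrix counts. The Python sketch below assumes binary classification; the parameter names `tp`, `tn`, `fp`, and `fn` and the example counts are made up for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Share of actual positive instances that were correctly identified."""
    return tp / (tp + fn)

# Hypothetical counts for 100 records, 50 of them actually positive.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
rec = recall(tp=40, fn=10)
```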