W05 Supervised Learning Flashcards

1
Q

Classification:

data basis

A

several independent attributes

one dependent attribute, the class

2
Q

Classification:

condition

A

a priori knowledge of classification for some instances (supervised learning!)

3
Q

Classification:

model building

A

generate rules from classified instances

first: generate best fit
then: prune based on validation set

4
Q

Classification:

generalization

A

apply rules to new instances

5
Q

Classification:

methods

A
logistic regression
naive Bayes classifier
support vector machines
decision trees
random forest
neural networks
nearest neighbour
6
Q

Decision Tree Terminology:

Binary Tree

A

each node splits the data into at most 2 sets

7
Q

Decision Tree Terminology:

Classification Tree

A

a split can lead to more than 2 branches

8
Q

Decision Tree Terminology:

Decision Tree

A

Nominal (categorical) Classes

9
Q

Decision Tree Terminology:

Regression Tree

A

Cardinal (numeric) Classes

10
Q

Decision Tree Terminology:

Input

A

Instance pool

11
Q

Decision Tree Terminology:

Output

A

Full Tree

12
Q

Decision Tree Terminology:

Objective

A

Formulate rules of the type:

IF condition 1 AND ... AND condition n, THEN class

13
Q

Decision Tree Terminology:

Rule

A

Path from root to leaf

14
Q

Generating a decision tree: algorithm

A
1 all objects in a single node
2 search for the best classification criterion
3 classify all objects accordingly
4 recursively apply 2 + 3 until STOP
5 prune tree
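
A minimal sketch of steps 1-4 in Python, assuming nominal attributes and ID3-style entropy minimisation as the classification criterion; step 5 (pruning) is omitted, and the instance format as a list of attribute dicts is illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    # STOP: node is pure, or no attributes left -> leaf with majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # step 2: best criterion = attribute with lowest weighted subset entropy
    def weighted_entropy(a):
        total = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            total += len(sub) / len(labels) * entropy(sub)
        return total
    best = min(attributes, key=weighted_entropy)

    # steps 3 + 4: split the objects on that attribute and recurse per subset
    children = {}
    for v in set(r[best] for r in rows):
        keep = [i for i, r in enumerate(rows) if r[best] == v]
        children[v] = build_tree([rows[i] for i in keep],
                                 [labels[i] for i in keep],
                                 [a for a in attributes if a != best])
    return (best, children)
```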
15
Q

Classification algorithms: variety

A
1 stop criteria
2 pruning strategy
3 choice of attributes as classification criterion
4 number of splits per node
5 scale of measurement
16
Q

(CH)AID

A

chi-squared automatic interaction detection

-> find significantly different subsets of the data, so select the attributes that generate those subsets

17
Q

CART

A

classification and regression trees

-> maximize the information content I, so select attributes accordingly

18
Q

ID3

A

iterative dichotomizer 3

-> minimize entropy, so split on the attribute that produces the lowest remaining entropy

19
Q

Entropy Formula

A

H(S) = - sum_i [ p_i * log2(p_i) ], where p_i is the share of instances belonging to class i
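
A quick numeric check of the formula; the class counts are illustrative:

```python
from math import log2

def H(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

print(H((7, 7)))            # 1.0 bit: maximally mixed 50/50 set
print(round(H((9, 5)), 3))  # 0.94 bits: impure, but mostly one class
```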

20
Q

Information Entropy formula

A

I(a) = sum_i [ q_i * H(S_i) ], where q_i is the share of instances falling into subset S_i when splitting on attribute a
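
A sketch of the computation, using Quinlan's classic play-tennis counts as an assumed example (14 instances, split on the attribute Outlook):

```python
from math import log2

def H(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

# subsets after splitting on Outlook: (yes, no) counts per value
subsets = [(2, 3), (4, 0), (3, 2)]            # sunny, overcast, rain
n = sum(sum(s) for s in subsets)              # 14 instances in total
I = sum(sum(s) / n * H(s) for s in subsets)   # q_i * H(S_i), summed
print(round(I, 3))                            # 0.694
print(round(H((9, 5)) - I, 3))                # 0.247 -> the information gain
```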

21
Q

Decision Tree Pruning

A

simplify complicated decision trees to increase efficiency and avoid over-fitting

top-down pruning -> stopping criteria while building the tree

bottom-up pruning -> ex post:
prune splits that do not increase subset homogeneity sufficiently
prune to undo over-fitting based on a validation set: remove tree parts that do not increase the success quota
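
One concrete way to realise the bottom-up variant is scikit-learn's cost-complexity pruning, picking the pruned tree by its success quota on a validation set; a sketch, with dataset and parameters chosen for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# first: generate the best fit (a full, unpruned tree)
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# then: prune bottom-up; each ccp_alpha collapses the weakest subtrees
alphas = full.cost_complexity_pruning_path(X_train, y_train).ccp_alphas
candidates = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
              for a in alphas]

# keep the pruned tree with the best success quota on the validation set
best = max(candidates, key=lambda t: t.score(X_val, y_val))
print(best.get_n_leaves(), best.score(X_val, y_val))
```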

22
Q

Decision Tree Properties:

number of generated rules

A

number of leaves

23
Q

Decision Tree Properties

maximum rule length

A

depth of tree

24
Q

Decision Tree Properties

sum of all path lengths from root to leaf

A

external path length;

determines memory requirements

25
Q

Decision Tree Properties

sum of all path lengths from root to leaf, each multiplied by the number of represented instances

A

weighted external path length;

measures classification costs

26
Q

Decision Trees: understandability? relationships? too complex rules?

A

high understandability and interpretability
non-linear relationships
pruning important

27
Q

Random Forest

A

several randomised instances of a model; use the aggregated results for classification

1 generate k trees by drawing with replacement k times
2 generalize by classifying with all k trees and choosing the most frequently determined class
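
A from-scratch sketch of the two steps, assuming numpy arrays and integer class labels; scikit-learn's RandomForestClassifier packages the same idea and additionally randomises the attributes considered at each split:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, k=25, seed=0):
    # step 1: k trees, each fit on a sample drawn with replacement
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(k):
        idx = rng.integers(0, len(X), size=len(X))
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    # step 2: classify with all k trees, take the most frequent class per instance
    votes = np.stack([t.predict(X) for t in trees])  # shape (k, n_instances)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```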
28
Q

Gradient Boosted Trees

A

1 initialize a prediction model with a constant value
2 compute pseudo-residuals
3 extend the model by creating a regression tree to predict the pseudo-residuals
4 apply and repeat from step 2 for M iterations
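
A minimal sketch for squared-error regression, where the pseudo-residuals of step 2 reduce to plain residuals; the shrinkage factor lr is a common addition not mentioned in the card:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbt(X, y, M=100, lr=0.1):
    f0 = y.mean()                      # 1: initialise with a constant value
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(M):                 # 4: repeat for M iterations
        residuals = y - pred           # 2: compute pseudo-residuals
        t = DecisionTreeRegressor(max_depth=3).fit(X, residuals)  # 3: tree on residuals
        pred += lr * t.predict(X)      # 4: apply and extend the model
        trees.append(t)
    return f0, trees, lr

def predict_gbt(model, X):
    f0, trees, lr = model
    return f0 + lr * sum(t.predict(X) for t in trees)
```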
29
Q

Support Vector Machines

A

build a linear discriminant function that separates the two classes as widely as possible

critical boundary instances are termed support vectors
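
A sketch with scikit-learn's SVC on synthetic, roughly separable data; the large C approximates the hard-margin separation described here:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two (roughly) linearly separable classes, synthetic example data
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1e3).fit(X, y)

# the discriminant is w.x + b = 0; the margin being maximised is 2/||w||
w, b = clf.coef_[0], clf.intercept_[0]
print(clf.support_vectors_)  # the critical boundary instances
```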
30
Q

Neural Networks

A

imitate concepts of the brain: connect several simple models in a hierarchical structure

the simple models are perceptrons: massively interconnected, decomposing problems and forwarding them

backpropagation -> modify weights based on their contribution to an accurate solution
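
A sketch using scikit-learn's multi-layer perceptron, which is trained by backpropagation; the layer sizes and dataset are illustrative:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# two hidden layers of simple interconnected units (perceptron-style);
# backpropagation adjusts each weight by its contribution to the output error
net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print(net.score(X_test, y_test))
```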