Chapter 4 Flashcards

(80 cards)

1
Q

What’s the most common splitting criterion?

A

information gain

1
Q

What’s the role of Decision Trees?

A

Create a formula/algorithm that evaluates how well each attribute splits a set of examples into segments, with respect to a chosen target variable

2
Q

To what does disorder correspond?

A

to how mixed (impure) the segment is with respect to the values of the attribute of interest

3
Q

Formula of Entropy

A

entropy = -p1 log(p1) - p2 log(p2) - ...

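The entropy formula can be checked with a short sketch (a minimal Python illustration; the `entropy` helper and log base 2 are conventional choices, not from the chapter):

```python
from math import log2

def entropy(probabilities):
    # entropy = -p1 log(p1) - p2 log(p2) - ...
    # (terms with p = 0 contribute nothing, since p*log(p) -> 0)
    return -sum(p * log2(p) for p in probabilities if p > 0)

pure = entropy([1.0])        # a pure segment: no disorder
mixed = entropy([0.5, 0.5])  # a 50/50 split: maximal disorder for 2 classes
```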
4
Q

Define Pi

A

probability of value i within the set (relative percentage/share)

5
Q

When is Pi = 1?

A

when all members of the set have value i

6
Q

When is Pi = 0?

A

when no members of the set have value i

7
Q

What is the parent set?

A

the original set of examples

8
Q

What does an attribute do?

A

It segments a set of instances into k subsets.

9
Q

What are the k children sets?

A

The result of splitting on the attribute values.

10
Q

What does information gain measure?

A
  • how much an attribute improves (decreases) entropy
  • the change in entropy due to new information added
11
Q

Formula IG(parent)

A

IG(parent) = entropy(parent) - p(c1) entropy(c1) - p(c2) entropy(c2) - ...

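The IG(parent) formula can be worked through in a small Python sketch (the helper names and the 6-vs-6 example are illustrative, not from the chapter):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(parent_counts, children_counts):
    # IG(parent) = entropy(parent) - p(c1) entropy(c1) - p(c2) entropy(c2) - ...
    # where p(ci) is the share of parent examples that fall into child ci.
    n = sum(parent_counts)
    ig = entropy([c / n for c in parent_counts])
    for child in children_counts:
        m = sum(child)
        ig -= (m / n) * entropy([c / m for c in child])
    return ig

# A perfectly mixed parent (6 vs 6) split into two pure children
# removes all of the parent's entropy:
ig = information_gain([6, 6], [[6, 0], [0, 6]])
```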
12
Q

Formula Entropy (HS = square)

A
13
Q

Formula Entropy (HS = circle)

A
14
Q

Formula IG = entropy (Write-off)..

A
15
Q

What reduces entropy substantially?

A

splitting the parent data set by the body shape attribute

  • select the attribute that reduces entropy the most
16
Q

How do you find the best attribute to partition the sets?

A

recursively apply attribute selection

17
Q

Disadvantages of ID3

A
  • tends to prefer splits that result in large numbers of small but pure partitions
  • overfitting, less generalization capacity
  • cannot handle numeric values or missing values
18
Q

List the components of an ANN (artificial neural network)

A
  • neurons
  • nucleus
  • dendrite
  • axon
  • synapse
19
Q

Define neurons

A

cells (processing elements) of a biological or artificial neural network

20
Q

Define the nucleus

A

the central processing portion of a neuron

21
Q

Define the dendrite

A

the part of a biological neuron that provides inputs to the cell

22
Q

Define the axon

A

an outgoing connection (i.e., terminal) from a biological neuron

23
Q

Define synapse

A

the connection (where the weights are) between processing elements in a neural network

24
Define Learning
- the establishment of interneuron connections - classical conditioning
25
What is ANN?
computer technology that attempts to build computers that operate like human brains - the machines process memory storage simultaneously and can work with ambiguous information
26
What is a single perceptron?
early neural network structure that uses no hidden layer
27
What is the input of ANN
consists of the output of the sending unit and the weight between the sending and receiving units
28
What are connection weights of ANN associated with?
with each link in a neural network model
29
What do connection weights of ANN express?
the relative strength of the input data
30
How are the connection weights of ANN assessed?
by neural network learning algorithms
31
What does the Propagation (summation) function determine?
how the new input is computed
32
What type of combination is used in the propagation (summation) function?
linear
33
Formula netinput i
netinput_i = Σ_j (w_ij × output_j) - the linear combination of the sending units' outputs and the connection weights
34
What does the activation function do?
computes the internal stimulation (activity level) of the neuron - neuron may or may not produce an output (fire)
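The propagation and activation functions above fit into a few lines of Python (an illustrative sketch; the function names, weights, and the choice of sigmoid are assumptions, not from the chapter):

```python
import math

def net_input(weights, outputs):
    # Propagation (summation) function: a linear combination of the
    # sending units' outputs and the connection weights.
    return sum(w * o for w, o in zip(weights, outputs))

def sigmoid(net):
    # One common activation (transfer) function.
    return 1.0 / (1.0 + math.exp(-net))

net = net_input([0.5, -0.3, 0.8], [1.0, 2.0, 0.5])   # 0.5 - 0.6 + 0.4
activity = sigmoid(net)   # the neuron's internal activity level
```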
35
What else is the activation function called?
- transformation function - transfer function
36
What's the range of human hearing?
20 Hz to 20 kHz
37
Output ANN
- sometimes a threshold function is used - most software packages do not distinguish between the activation level and the output function
38
How is learning done in ANN?
by comparing computed (predicted) outputs to desired (true target) outputs of historical cases
39
Define learning in ANN
a change of weights between units
40
Describe the three tasks of the process of learning in ANN
1. compute temporary outputs 2. compare outputs with desired targets 3. adjust the weights and repeat the process
41
What is the Delta rule?
a special form of steepest gradient descent approach
42
What is the Delta rule also called?
- Widrow-Hoff rule - Least Mean Square rule
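The delta rule can be sketched in a few lines of Python (illustrative; the learning rate and sample numbers are hypothetical): each weight moves in proportion to the error and to its own input.

```python
def delta_rule_update(weights, inputs, target, output, lr=0.1):
    # Delta rule: delta_w = lr * (target - output) * input,
    # i.e. one step of steepest gradient descent on the squared error.
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# error = 1.0 - 0.3 = 0.7, so each weight moves by 0.1 * 0.7 * input
new_w = delta_rule_update([0.2, -0.4], [1.0, 0.5], target=1.0, output=0.3)
```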
43
Linear separability: what does a single neuron represent?
a hyperplane in instance space
44
Linear separability: What can be represented using a perceptron?
the three operations AND, OR, NOT
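A sketch of why AND, OR, and NOT are perceptron-representable (the weights and biases below are hand-picked for illustration, not from the chapter):

```python
def perceptron(weights, bias, inputs):
    # A single perceptron fires (outputs 1) iff the weighted sum of its
    # inputs plus the bias is positive - a hyperplane decision boundary.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

AND = lambda a, b: perceptron([1, 1], -1.5, [a, b])
OR  = lambda a, b: perceptron([1, 1], -0.5, [a, b])
NOT = lambda a:    perceptron([-1], 0.5, [a])
```

No such single hyperplane exists for XOR, which is why a multilayer perceptron is needed there.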
45
Linear separability: what is needed for problems that are not linearly separable?
multilayer perceptron
46
Into what can any expression from propositional calculus be converted?
a multilayer perceptron
47
Multilayer perceptrons: Topologies
the way neurons are organized in a neural network
48
Multilayer perceptrons: How many layers does the network structure have?
three: 1. Input layer 2. Hidden layer(s) 3. Output layer
49
Describe the Input layer of the Multilayer perceptrons
- each input corresponds to a single attribute - several types of data can be used - preprocessing may be needed to convert the data into meaningful inputs
50
Describe the hidden layers of the Multilayer perceptrons
- the middle layer(s) of an artificial neural network - such a network has three or more layers - each additional layer increases the training effort exponentially
51
Describe the output layer of the Multilayer perceptrons
- contains solution to a problem - the purpose of the network is to compute the output values
52
Flow diagram of the development process of an ANN
1. Collect Data 2. Separate into training & testing set 3. Define a network structure 4. Select a learning algorithm 5. Set parameters and values, initialize weights 6. Transform data into network outputs 7. Start training and determine and revise weights 8. Stop and test 9. Implementation: use the network with new cases
53
How can the relationship between the internal activation level and the output be?
- linear - nonlinear
54
What are the types of learning?
- supervised - unsupervised - reinforced - direct design methods (hardwired systems)
55
What are the times of learning?
incremental training vs batch training
56
What are the learning rules in ANN
- Delta rule - Gradient descent - Backpropagation - Hebbian rule - Competitive learning
57
To which type of ANN does the delta rule apply?
without hidden layers
58
For what are ANN with hidden layers needed?
some problems, like training an XOR classifier
59
Define Backpropagation
- the error (similar to the delta rule) is propagated back - this also makes possible the calculation of the weight changes for hidden layers
60
List the steps of Backpropagation
1. Initialize weights with random values and set other parameters 2. Read the input vector and the desired output 3. Compute the actual output via the calculations, working forward through the layers (forward pass) 4. Compute the error 5. Change the weights by working backward from the output layer through the hidden layers (backward pass)
61
What is the forward pass?
computing the actual output via the calculations, working forward through the layers
62
What is the backward pass?
changing the weights by working backward from the output layer through the hidden layers
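The five backpropagation steps can be sketched end to end on the XOR problem (a minimal NumPy illustration; the layer sizes, learning rate, and epoch count are hypothetical choices, not the chapter's own code):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Step 1: initialize weights with random values.
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (2, 4))   # input -> hidden
W2 = rng.normal(0.0, 0.5, (4, 1))   # hidden -> output
lr = 1.0

# Step 2: read the input vectors and the desired outputs (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

mse_history = []
for _ in range(5000):
    # Step 3: forward pass - compute the actual output layer by layer.
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)
    # Step 4: compute the error.
    E = T - Y
    mse_history.append(float(np.mean(E ** 2)))
    # Step 5: backward pass - propagate the error back from the output
    # layer through the hidden layer and adjust the weights
    # (the derivative of the sigmoid is y * (1 - y)).
    dY = E * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 += lr * H.T @ dY
    W1 += lr * X.T @ dH
```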
63
Define the gradient descent
find the combination of all weights w so that the sum of the squared errors F is minimized
64
Gradient Descent: Problem
high computational complexity
65
Gradient Descent: Solution
steepest gradient descent method - the negative gradient gives the direction in which to move in the next iteration
66
Gradient Descent: Premise for usage
differentiable propagation, activation, and output functions
67
Gradient Descent: Workaround for limitations
change: - initial weights - starting point of the gradient approach - type of initialization - learning parameters; define different learning rates for different layers; insert a momentum (inertia) parameter; apply a decay parameter
68
How do we change learning parameters as a workaround for limitations in gradient descent?
- increase learning rate - decrease learning rate - vary learning rates
69
What is A Self-Organizing Map?
a smart map that takes complex information and organizes it neatly
70
How does a Self-Organizing Map organize information neatly?
by placing similar things close to each other on the map
71
How does a Self-Organizing Map adjust its map?
so that it can recognize and regroup similar patterns in data
72
another name of Self-Organizing Maps
Kohonen's self organizing maps (SOM)
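A minimal one-dimensional sketch of the SOM idea (illustrative Python; real Kohonen maps use 2-D grids and decaying learning rates and radii): for each input, the best-matching unit and its neighbours move toward the input, so similar inputs end up close together on the map.

```python
# A row of map units, each holding one weight (its "position").
# The initial values are arbitrary.
weights = [0.1, 0.3, 0.5, 0.7, 0.9]

def train_step(weights, x, lr=0.5, radius=1):
    # Find the best-matching unit (closest weight to the input) ...
    bmu = min(range(len(weights)), key=lambda i: abs(weights[i] - x))
    # ... and move it and its neighbours toward the input.
    return [
        w + lr * (x - w) if abs(i - bmu) <= radius else w
        for i, w in enumerate(weights)
    ]

for x in [0.0, 0.0, 1.0, 1.0]:
    weights = train_step(weights, x)
```

After seeing inputs near 0 and near 1, the low-numbered units settle near 0 and the high-numbered units near 1, with the map order preserved.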
73
What are Hopfield networks?
smart memory systems that can remember and recall patterns
74
How do Hopfield networks work?
- they connect all their "brain cells" together - when they learn something, the connections get adjusted
75
What do Hopfield networks do when you give them a partial or noisy pattern?
they can fill in the blanks and remember the closest thing they learned
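This pattern-completion behaviour is easy to sketch (a minimal NumPy illustration using the classic Hebbian outer-product rule; the pattern and its length are arbitrary):

```python
import numpy as np

stored = np.array([1, -1, 1, -1, 1, -1, 1, -1])

# Learning: the connection weights are the outer product of the pattern
# with itself, with no self-connections (zero diagonal).
W = np.outer(stored, stored)
np.fill_diagonal(W, 0)

# Recall: start from a noisy copy with two bits flipped ...
state = stored.copy()
state[0] *= -1
state[3] *= -1

# ... and repeatedly update every unit to the sign of its net input
# until the network settles into the closest stored pattern.
for _ in range(5):
    state = np.where(W @ state >= 0, 1, -1)
```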
76
What are Hopfield networks used for?
- remembering faces - solving certain types of problems
77
Advantages of ANN
- able to deal with highly nonlinear relationships - not prone to restricting normality and/or independence assumptions - can handle a variety of problem types - produces better results than its statistical counterparts - handles both numerical and categorical variables (transformation needed)
78
What are the limitations of ANN
- black-box solutions lacking explainability - hard to find optimal values for a large number of network parameters - optimal design is hard to achieve - a large number of variables is hard to handle - training may take a long time for large datasets
79
What can reduce the long training time of ANN on large datasets?
case sampling