Chapter 7 Flashcards

Question 1

Q

what is the goal in individual level models?

what is the goal in population level models?

Answer

A

individual level:

> generate a model that predicts unseen data of that individual well

population level:

predict well for unseen users
predict unseen data of known users

Question 2

Q

what is the limit of a perceptron?

Answer

A

it can only represent linearly separable cases for classification

> classes need to be separable using a hyperplane in the p-dimensional input space
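A minimal sketch of the classic perceptron learning rule in plain Python. It assumes labels in {+1, -1} and a fixed epoch budget; `train_perceptron`, `predict`, and the toy AND/XOR data are illustrative, not from the chapter. The rule reaches zero training errors on AND, which is linearly separable, but can never do so on XOR, which is not:

```python
def train_perceptron(data, epochs=100, lr=1.0):
    """Classic perceptron rule; data is a list of (input_tuple, label) with label +1 / -1."""
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or exactly on the hyperplane)
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every point lies on the correct side of the hyperplane
            return w, b, True
    return w, b, False


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# toy data (hypothetical): AND is linearly separable, XOR is not
AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

w, b, converged_and = train_perceptron(AND)
_, _, converged_xor = train_perceptron(XOR)
print(converged_and, converged_xor)  # True False
```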

Question 3

Q

multilayer perceptron: can we always find a combination of hyperplanes that completely separate the training data without errors?

> why/why not?

Answer

A

yes, with one requirement:

> there must not be any input vectors that are identical but have different labels

Question 4

Q

how do convolutional neural networks work?

Answer

A

like multilayer neural networks, but with additional layers preceding the conventional (fully connected) layers, which identify features

two types of layers:

convolutional layers

> contain filters that extract features from their receptive field

pooling layers

> summarize the values in a certain region of space and represent the output as a single neuron
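To make the two layer types concrete, here is a small sketch in plain Python (no deep learning library) of one convolutional filter followed by 2x2 max pooling. The helper names, the toy 5x5 image, and the edge-detecting kernel are assumptions for illustration. As in most CNN libraries, the "convolution" is implemented as cross-correlation, without flipping the kernel:

```python
def convolve2d(image, kernel):
    """'Valid' 2D cross-correlation (what CNN libraries usually call convolution), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # the receptive field is the kh x kw patch whose top-left corner is (i, j)
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: each size x size region becomes a single value."""
    return [[max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# toy image (hypothetical): dark left half, bright right half
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hypothetical filter that responds to a dark-to-bright vertical edge

feature_map = convolve2d(image, kernel)
pooled = max_pool(feature_map)
print(feature_map)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)       # [[2, 0], [2, 0]]
```

The filter fires only where the edge is, and pooling keeps that response while shrinking the map by a factor of two in each direction.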

Question 5

Q

difference in class separation between neural nets and SVM?

Answer

A

neural nets: aim to find a hyperplane that separates the classes

SVM: aim to find the hyperplane that maximizes the distance (margin) between the two classes

Question 6

Q

what are support vectors?

Answer

A

SVM describes 3 hyperplanes:

the hyperplane that maximizes the distance between the classes

this hyperplane is shifted in both directions, towards each class, until it touches the first point(s) of that class; the two shifted copies are the other two hyperplanes

> the points lying on those two outer hyperplanes are called support vectors
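A small sketch of how the support vectors can be read off once the hyperplane is known. It assumes `w` and `b` are already given: here they were worked out by hand for this toy data set, whereas in practice an SVM solver would find them. With the usual scaling, the separating hyperplane is w·x + b = 0 and the two shifted hyperplanes are w·x + b = +1 and w·x + b = -1, so the support vectors are exactly the points with y(w·x + b) = 1:

```python
def functional_margin(w, b, x, y):
    """y * (w . x + b): >= 1 for correctly classified points outside the margin."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


# toy data (hypothetical): class -1 on the left, class +1 on the right
data = [((0, 0), -1), ((-1, 0), -1), ((0, 1), -1),
        ((2, 0), 1), ((3, 1), 1), ((2, 1), 1)]

# for this toy set the maximum-margin hyperplane is x1 = 1, i.e. w = (1, 0), b = -1
# (derived by hand here, not computed by an SVM solver)
w, b = (1, 0), -1

# points on w.x + b = +1 or w.x + b = -1 are the support vectors
support_vectors = [x for x, y in data if functional_margin(w, b, x, y) == 1]
print(support_vectors)  # [(0, 0), (0, 1), (2, 0), (2, 1)]
assert all(functional_margin(w, b, x, y) >= 1 for x, y in data)  # no point inside the margin
```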

Question 7

Q

how does SVM handle the linear separability problem?

Answer

A

using kernel functions:

> map the inputs to a higher-dimensional feature space

>>> in that space the problem (ideally) becomes linearly separable
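A sketch of the kernel idea, assuming the homogeneous polynomial kernel k(x, z) = (x·z)^2 on 2D inputs (one common kernel, chosen here for illustration; `phi`, `poly_kernel`, and the data are hypothetical names). The kernel returns the same value as a dot product in an explicit 3D feature space `phi`, and in that space XOR-style data, which no line in 2D can separate, is split by a simple hyperplane:

```python
import math


def phi(x):
    """Explicit degree-2 feature map for 2D inputs, matching poly_kernel below."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# toy XOR-style data (hypothetical): not linearly separable in the original 2D space
data = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

# the kernel gives the feature-space dot product without building phi explicitly
for x, _ in data:
    for z, _ in data:
        assert math.isclose(poly_kernel(x, z), dot(phi(x), phi(z)), abs_tol=1e-9)

# in feature space, the hyperplane w = (0, 1, 0), b = 0 separates the two classes
w = (0.0, 1.0, 0.0)
separable = all(y * dot(w, phi(x)) > 0 for x, y in data)
print(separable)  # True
```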

Question 8

Q

why is k-nearest neighbour also called the lazy learner?

Answer

A

KNN does no work during training (it only stores the data); it only starts computing when it encounters a new case

Question 9

Q

what are the general steps in KNN?

Answer

A

KNN:

for the new datapoint, consider the k closest points
assign a target value based on the target value of the k neighbours

> classification: majority class

> regression: average

done
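A minimal kNN sketch in plain Python, assuming Euclidean distance (`math.dist`, Python 3.8+) and toy data with made-up activity labels; `knn_predict` is a hypothetical name. Note the lazy-learner property: there is no training step, and all distance computations happen inside `knn_predict` when a new case arrives:

```python
import math
from collections import Counter


def knn_predict(train, x, k=3, task="classification"):
    """train is a list of (feature_tuple, target); nothing is precomputed (lazy learner)."""
    # distances are only computed now, when the new case x arrives
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    targets = [t for _, t in neighbours]
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]  # majority class
    return sum(targets) / len(targets)  # regression: average of the k targets


# toy data (hypothetical)
train = [((1.0, 1.0), "walk"), ((1.2, 0.8), "walk"), ((0.9, 1.1), "walk"),
         ((5.0, 5.0), "run"), ((5.2, 4.9), "run")]
print(knn_predict(train, (1.1, 1.0), k=3))  # walk

train_r = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 50.0)]
print(knn_predict(train_r, (1.0,), k=3, task="regression"))  # 2.0
```

Sorting the whole training set costs O(N log N) per query; for large data, `heapq.nsmallest` or a spatial index would be the usual optimisation.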

Question 10

Q

what is the basic idea of distance weighted nearest neighbour?

Answer

A

KNN does not consider the distance of the k neighbours: all of them count equally

> intuitively, we want to weight closer points more heavily, as they are more similar
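A sketch of distance-weighted voting, assuming weights of 1/d^2 (a common choice; other decreasing functions of the distance work too). The function name and data are hypothetical. With k = 3, plain majority voting would pick "B" here, but the single much closer "A" neighbour outweighs the two distant "B" neighbours:

```python
import math
from collections import defaultdict


def weighted_knn_classify(train, x, k=3):
    """Distance-weighted kNN vote: each of the k neighbours votes with weight 1 / d^2."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        d = math.dist(features, x)
        if d == 0:
            return label  # exact match: avoid division by zero, its label wins outright
        votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)


# toy data (hypothetical)
train = [((0.5, 0.0), "A"), ((2.0, 0.0), "B"), ((0.0, 2.1), "B")]
print(weighted_knn_classify(train, (0.0, 0.0), k=3))  # A
```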

Question 11

Q

what are the general steps of decision trees?

Answer

A

decision tree:

start with empty tree
select most important attribute, and create node
create branches for each possible value of that attribute

(for each branch a new decision tree is formed based on the subset of the training set that contains the associated attribute-value combination)

recursively repeat until stop condition is met
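The recursive procedure can be sketched as an ID3-style tree builder for categorical attributes, where the "most important" attribute is taken to be the one with the highest information gain (one common criterion, assumed here). The data format (rows as dicts), the function names, and the toy data are assumptions for illustration:

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """ID3-style tree; rows are dicts attribute -> value, leaves are class labels."""
    # stop condition: node is pure or no attributes are left -> leaf with majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)  # "most important" attribute = highest information gain
    node = {best: {}}
    for value in set(r[best] for r in rows):
        # one branch per value, built from the matching subset of the training data
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node[best][value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                       [a for a in attributes if a != best])
    return node


def classify(tree, row):
    while isinstance(tree, dict):
        attr = next(iter(tree))  # the attribute tested at this node
        tree = tree[attr][row[attr]]
    return tree


# toy data (hypothetical)
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rainy", "windy": "no"}, {"outlook": "rainy", "windy": "yes"}]
labels = ["run", "stay", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print([classify(tree, r) for r in rows])  # ['run', 'stay', 'stay', 'stay']
```

`classify` raises a `KeyError` for attribute values never seen during training; a fuller version would fall back to the majority class at that node.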

Question 12

Q

decision trees: explain entropy

Answer

A

entropy:

> the amount of information we need to communicate to describe a series, i.e.

> if all instances are of the same class, we only need to send minimal information >>> entropy 0

> if instances are evenly distributed between two classes, we need to send maximum information (one bit per instance) to describe the whole set >>> entropy 1 (with more than two classes, the maximum is log2 of the number of classes)
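For a set S with class proportions p_g, entropy is H(S) = -Σ_g p_g log2 p_g (written below in the equivalent form Σ_g p_g log2(1/p_g)). A minimal sketch; the function name and example labels are illustrative:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of the class distribution in labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


print(entropy(["a"] * 8))              # 0.0  (all instances the same)
print(entropy(["a"] * 4 + ["b"] * 4))  # 1.0  (evenly split over two classes)
print(entropy(list("abcd")))           # 2.0  (evenly split over four classes)
```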

Question 13

Q

how to make split decisions in decision trees?

Answer

A

the goal: leaves that cover a set of instances of the same class (low entropy)

we start with whole dataset, which has high entropy
we want to select attributes for our nodes that split the data into subsets with the lowest possible entropy
decide based on information gain:

> compare original entropy before split with weighted entropy of the subsets after split
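A minimal sketch of information gain, Gain = H(S) - Σ_i (|S_i| / |S|) H(S_i), where the S_i are the subsets produced by the split. Function names and toy labels are illustrative:

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy before the split minus the size-weighted entropy of the subsets after it."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted


# toy labels (hypothetical)
labels = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]
print(information_gain(labels, perfect_split))  # 1.0
print(information_gain(labels, useless_split))  # 0.0
```

A split into pure subsets recovers all of the original entropy; a split that leaves each subset as mixed as before gains nothing.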

Question 14

Q

NB: what is the prior probability?

Answer

A

prior probability:

> probability of observing class g within dataset

> easy: number of observations of class g divided by N
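The count-based estimate fits in a few lines; a sketch with a hypothetical function name and toy labels:

```python
from collections import Counter


def priors(labels):
    """P(g) = (number of observations of class g) / N."""
    n = len(labels)
    return {g: count / n for g, count in Counter(labels).items()}


labels = ["walk", "walk", "run", "sit", "walk"]  # toy data (hypothetical)
print(priors(labels))  # {'walk': 0.6, 'run': 0.2, 'sit': 0.2}
```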

Question 15

Q

what is the naive assumption of naive bayes?

Answer

A

NB: in order to calculate the probability of observing x given class g we multiply the conditional probabilities of individual attributes

> this assumes conditional independence between attributes

> does not always hold, as attributes might be correlated
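A minimal count-based naive Bayes sketch for categorical attributes, assuming rows are dicts mapping attribute names to values (function names and toy data are hypothetical). The score for class g is the prior times the product of per-attribute conditional probabilities, which is exactly where the independence assumption enters. Missing values (`None`) are skipped in the product, and no smoothing is applied, so an attribute value never seen with a class gives that class a probability of zero:

```python
from collections import Counter, defaultdict


def fit_nb(rows, labels):
    """Count class frequencies and, per class, how often each attribute takes each value."""
    class_counts = Counter(labels)
    # value_counts[g][attr][value] = how often attr == value among instances of class g
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, g in zip(rows, labels):
        for attr, value in row.items():
            value_counts[g][attr][value] += 1
    return class_counts, value_counts


def predict_nb(model, row):
    class_counts, value_counts = model
    n = sum(class_counts.values())
    scores = {}
    for g, count in class_counts.items():
        score = count / n  # prior P(g)
        for attr, value in row.items():
            if value is None:
                continue  # missing value: leave this attribute out of the product
            # naive assumption: P(x | g) = product over attributes of P(x_j | g)
            score *= value_counts[g][attr][value] / count
        scores[g] = score
    return max(scores, key=scores.get), scores


# toy data (hypothetical)
rows = [{"speed": "high", "heart_rate": "high"}, {"speed": "high", "heart_rate": "high"},
        {"speed": "low", "heart_rate": "low"}, {"speed": "low", "heart_rate": "high"}]
labels = ["run", "run", "walk", "walk"]
model = fit_nb(rows, labels)
print(predict_nb(model, {"speed": "low", "heart_rate": "high"}))  # ('walk', {'run': 0.0, 'walk': 0.25})
print(predict_nb(model, {"speed": "high", "heart_rate": None})[0])  # run
```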

Question 16

Q

what's an advantage of NB?

Answer

A

advantage:

> missing values can simply be ignored: the attribute with the missing value is left out of the product of class-conditional probabilities

Question 17

Q

what are two main causes for weak prediction performance?

solution?

Answer

A

expressive power of the models might be insufficient
training data is too limited

>>> solution: ensembles!

Question 18

Q

what are the two main ensemble methods?

Answer

A

bagging: aim at reducing variance
boosting: aim at reducing bias

Question 19

Q

explain how bagging works

Answer

A

bagging: bootstrap aggregation

> draw multiple samples from the dataset (with replacement)

> generate a model for each sample

> aggregate output over all models, e.g. majority vote (mean in case of regression)
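A sketch of bagging in plain Python, assuming 1D inputs with 0/1 labels and a hypothetical decision stump (`fit_stump`) as the base model; any learner could take its place. `random.Random(seed)` makes the bootstrap samples reproducible; all names and data are illustrative:

```python
import random
from collections import Counter


def fit_stump(sample):
    """Hypothetical weak base learner: the single 1D threshold with the fewest errors."""
    best = None
    for threshold, _ in sample:
        for polarity in (1, -1):
            errors = sum(1 for x, y in sample
                         if (1 if polarity * (x - threshold) >= 0 else 0) != y)
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: 1 if polarity * (x - threshold) >= 0 else 0


def bagging(data, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # bootstrap sample: same size as the data, drawn with replacement
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    # aggregate: majority vote over all models (the mean would be used for regression)
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


# toy data (hypothetical)
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
ensemble = bagging(data)
print([ensemble(x) for x in (0.0, 1.2, 4.2, 5.0)])
```

Each stump sees a slightly different sample and so lands on a slightly different threshold; voting averages out that variation.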

Question 20

Q

what does bagging avoid?

Answer

A

overfitting

Question 21

Q

explain the major steps of boosting

Answer

A

boosting: iteratively create models that focus on areas where mistakes are being made
1. build initial model on whole dataset
2. evaluate performance and form a new training set X2 which weights cases where the initial model made mistakes more heavily
3. repeat m times
4. now we have m models, from which we can aggregate our predictions
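One concrete boosting algorithm is AdaBoost, sketched below with 1D decision stumps and labels in {+1, -1}. Instead of building an explicit new training set X2, it reweights the cases, which plays the role of step 2. The stump learner, function names, and toy data are illustrative assumptions. On this toy set no single stump is perfect, but three rounds already classify every training case correctly:

```python
import math


def fit_weighted_stump(xs, ys, w):
    """Hypothetical weak learner: 1D threshold with the lowest *weighted* error."""
    best = None
    for t in xs:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (pol if xi >= t else -pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best


def adaboost(xs, ys, m=10):
    n = len(xs)
    w = [1.0 / n] * n  # step 1: the first model treats every case equally
    ensemble = []
    for _ in range(m):  # step 3: repeat m times
        err, t, pol = fit_weighted_stump(xs, ys, w)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate model -> larger say
        ensemble.append((alpha, t, pol))
        # step 2: upweight the cases this model got wrong, downweight the rest
        w = [wi * math.exp(-alpha * yi * (pol if xi >= t else -pol))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_boosted(ensemble, x):
    # step 4: aggregate the m models with a weighted vote
    score = sum(alpha * (pol if x >= t else -pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1


# toy data (hypothetical): the positive class sits at both ends
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, m=3)
print([predict_boosted(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```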

Question 22

Q

data stream mining: two different solutions

Answer

A

data based solutions

> building models on a subset of the full dataset

task based solutions

> focus on changing the algorithm to make it more efficient

Question 23

Q

name 3 feature selection approaches

Answer

A

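Pearson's correlation

> rank attributes by the strength of their correlation with the target and keep the most strongly correlated ones

forward selection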

> start with empty set of attributes and iteratively add the attribute that increases performance most

backward selection

> start with set of all attributes and iteratively remove the attribute that decreases performance least
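A sketch of two of these approaches. It assumes a user-supplied `score(subset)` function (hypothetical) that trains and evaluates a model on the given attributes and returns a performance value where higher is better. `pearson` computes the coefficient used for correlation-based filtering. Backward selection would mirror `forward_selection`, starting from all attributes and removing one at a time:

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def forward_selection(attributes, score, max_features=None):
    """Greedy forward selection; score(subset) -> performance (higher is better)."""
    selected, best_score = [], score([])
    remaining = list(attributes)
    while remaining and (max_features is None or len(selected) < max_features):
        # try adding each remaining attribute and keep the one that helps most
        candidate = max(remaining, key=lambda a: score(selected + [a]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best_score:
            break  # no remaining attribute improves performance
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# hypothetical scoring function: each attribute adds a fixed amount of "performance"
usefulness = {"acc_x": 5, "gyro": 3, "noise": -1}
score = lambda subset: sum(usefulness[a] for a in subset)

print(forward_selection(["noise", "acc_x", "gyro"], score))  # (['acc_x', 'gyro'], 8)
print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))        # 1.0
```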

(23 cards)