08_neural networks Flashcards

1
Q

How do kNN, linear models and tree-based models really learn?

A

not iteratively

kNN: computes distances and compares the distribution of unseen data points with the distribution of seen data points

linear: fitted to the seen data based on the task

tree-based: identify and memorize patterns relevant to the task

2
Q

With which three components does the human brain work?

A

Neurons (nerve cells)

Dendrites (connect neurons)

Axons (long-distance connections)

–> neurons are interconnected, forming a dense network

3
Q

How is information passed through neurons in the human brain?

A

through electrical signals

connected neurons absorb the incoming signals and process them; some of them will fire, but not all

–> cascade of signals

4
Q

What do we need for neural networks to represent the deep cascade of the layers of neurons in a human brain?

A

input data, which is processed in the network's hidden layers to generate output data

5
Q

What is a fully connected network?

A

a neural network where each neuron is connected to all neurons in the previous layer and all neurons in the following layer

6
Q

How can a fully connected neural network be characterized?

A
  • number of layers (depth)
  • number of neurons in each layer
  • number of input variables (= number of neurons in the first layer)
  • number of output variables (= number of neurons in the final layer)
7
Q

How does a fully connected neural network work?

A
  • vectorial input data is provided to the network, one value per neuron in the input layer
  • all inputs are seen by each neuron in the following layer
  • each neuron processes the incoming information, firing (1) under some conditions and staying silent (0) otherwise
  • repeat layer by layer
  • output is generated in the final layer (see the sketch below)
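
A minimal NumPy sketch of this forward pass, assuming the step ("fire or not") activation described above; layer sizes, random weights and the input vector are illustrative, not from the lecture:

    import numpy as np

    def step(z):
        # fire (1) if the weighted input exceeds zero, stay silent (0) otherwise
        return (z > 0).astype(float)

    def forward(x, layers):
        # layers: list of (W, b) pairs; every neuron sees all outputs of the previous layer
        for W, b in layers:
            x = step(x @ W + b)
        return x

    rng = np.random.default_rng(0)
    layers = [(rng.normal(size=(3, 4)), np.zeros(4)),   # hidden layer: 4 neurons
              (rng.normal(size=(4, 2)), np.zeros(2))]   # output layer: 2 neurons
    print(forward(np.array([0.5, -1.0, 2.0]), layers))  # vectorial input: 3 values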
8
Q

How does a neural network act in general terms?

A

acts as a function approximator

  • any mathematical function can be approximated
9
Q

Can we implement artificial neural networks to learn specific tasks?

A

yes, through connectionism: everything is connected with everything

10
Q

What are two problems we have to solve before we can implement artificial neural networks?

A

1) how to implement neurons?

2) how to train the network?

11
Q

How does a general neuron work?

A

the number of inputs might differ from the number of outputs, so what is the function?

it takes in a vector of values, processes them, and returns a binary signal based on its learned behavior, which is then passed on to all neurons in the following layer

12
Q

What is part of the function of a perceptron?

A

input variable x,
weight w,
bias value b

–> if the resulting value x · w + b is greater than zero, the perceptron fires, otherwise not

the step function is called the activation function: it introduces non-linearity into the output of the perceptron
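
As a sketch, the perceptron function from this card in Python (names are illustrative):

    def perceptron(x, w, b):
        # weighted sum of inputs plus bias, passed through the step activation
        z = sum(xi * wi for xi, wi in zip(x, w)) + b
        return 1 if z > 0 else 0  # fires only if the result is greater than zero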

13
Q

What can a single perceptron be considered as?

A

a linear classifier

14
Q

How do we train a perceptron?

A

perceptron learning rule: weights are adjusted by a step size that is called the LEARNING RATE

by iteratively running this algorithm over the training data multiple times, weights can be learned so that the model performs properly (see the sketch below)
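
A minimal sketch of the classic perceptron learning rule (weights move by learning rate · error · input); the dataset, learning rate and epoch count are illustrative assumptions:

    def train_perceptron(data, lr=0.1, epochs=10):
        # data: list of (inputs, target) pairs with binary targets
        n_inputs = len(data[0][0])
        w, b = [0.0] * n_inputs, 0.0
        for _ in range(epochs):  # run over the training data multiple times
            for x, y in data:
                y_hat = 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0
                err = y - y_hat  # 0 if correct, +1 or -1 if wrong
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err    # lr is the learning rate (step size)
        return w, b

    # learns a linearly separable function such as logical AND
    w, b = train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])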

15
Q

What is a major limitation of individual perceptrons?

A

inability to reproduce a logical exclusive-or (XOR) function!

  • because perceptrons are simply linear classifiers

multi-layer perceptrons concatenate layers of perceptrons, which makes them much more powerful (see the sketch below)
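
A hand-weighted sketch of why the extra layer helps: with these fixed (not learned) weights, a two-layer perceptron computes XOR, which no single perceptron can:

    def step(z):
        return 1 if z > 0 else 0

    def xor_mlp(x1, x2):
        h1 = step(x1 + x2 - 0.5)    # hidden neuron 1 acts as logical OR
        h2 = step(x1 + x2 - 1.5)    # hidden neuron 2 acts as logical AND
        return step(h1 - h2 - 0.5)  # output: OR but not AND = XOR

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, "->", xor_mlp(a, b))  # prints 0, 1, 1, 0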

16
Q

What does MLP stand for?

A

multi-layer perceptron

17
Q

What are MLPs?

A

simple feed-forward neural networks (info traverses the graph in only one direction)

  • fully-connected
  • can learn more complex relations from data than single perceptrons; each layer adds NON-LINEARITIES that increase the model's capacity
  • modern MLPs utilize additional layers and other non-linear activation functions that support the learning process
18
Q

What is the function behind a neuron?

A

the neuron fires if x · w + b > 0 (weighted inputs plus bias, thresholded at zero)

19
Q

What do artificial neurons compute?

A

the dot-product between input vectors and learned weights,
producing an output signal that propagates through the deep layers

20
Q

What is a perceptron?

A

simple artificial neuron that produces a binary output

21
Q

What is a multi-layer perceptron?

A

an early fully-connected neural network

22
Q

What does an activation function do?

A

defines when a neuron “fires”

non-linearity increases the model's capacity

23
Q

What is a simple step function?

A

g(x) = { 1 if x > 0; 0 otherwise }

to define whether a neuron fires or not

24
Q

What are advantages and disadvantages of the step function?

A

+ simple to implement

+ computationally inexpensive

  • only binary (discrete) output
  • no gradient
25
Q

What is the sigmoid function?

A

σ(x) = exp(x) / (1 + exp(x))

26
Q

What are advantages and disadvantages of the sigmoid function?

A

+ continuous non-linear function

+ gradient defined

  • asymmetric output value range [0, 1]
  • computationally expensive
27
Q

What is the tanh function?

A

tanh(x) = sinh(x) / cosh(x) = (exp(x) − exp(−x)) / (exp(x) + exp(−x))

28
Q

What are advantages and disadvantages of the tanh function?

A

+ continuous non-linear function

+ gradient defined

  • symmetric output value range [-1, 1]
  • computationally expensive
29
Q

What is the ReLU function?

A

rectified linear unit function

ReLU(x) = { x if x > 0; 0 otherwise }

30
Q

What are advantages of the ReLU function?

A

+ continuous non-linear function

+ gradient defined, and simple to compute

+ computationally inexpensive
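
The four activation functions from these cards as a quick NumPy sketch (the test input is illustrative):

    import numpy as np

    def step(x):    return np.where(x > 0, 1.0, 0.0)    # binary output, no useful gradient
    def sigmoid(x): return np.exp(x) / (1 + np.exp(x))  # output range [0, 1]
    def tanh(x):    return np.sinh(x) / np.cosh(x)      # output range [-1, 1]
    def relu(x):    return np.where(x > 0, x, 0.0)      # cheap: just a threshold

    x = np.linspace(-3.0, 3.0, 7)
    print(step(x), sigmoid(x), tanh(x), relu(x), sep="\n")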

31
Q

Why is it important for the activation function to be differentiable?

A

we need the gradient to be computable.
therefore, the step function is not a good choice as it has no gradient

32
Q

Why is the ReLU used most often?

A

sigmoid, tanh and ReLU lead to roughly similar results, but ReLU is computationally the most efficient

33
Q

What should a good activation function be?

A

continuously differentiable

non-linear

computationally inexpensive

34
Q

What enables deep neural networks to learn complex tasks?

A

the non-linearity of activation functions

35
Q

What is least squares fitting in linear regression?

A

a convex optimization problem:
there is only one solution to the problem, and it is by definition the best solution
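
A short illustration of this uniqueness using NumPy's least-squares solver; the toy data are made up:

    import numpy as np

    X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # design matrix with bias column
    y = np.array([1.0, 3.0, 5.0])                       # targets on the line y = 1 + 2x
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)                                         # -> [1. 2.], the unique optimum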

36
Q

How do we modify the neural network's weights to reduce the loss?

A
  • random changes (possible but not very goal-oriented)
  • backpropagation (we check for every single weight how changing it would affect the loss)
37
Q

How can we modify each individual weight parameter?

A

based on computed gradients

w_i = w_i − α ∇w_i L    (α: learning rate; ∇w_i L: gradient of the loss L with respect to w_i)
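
a worked example with illustrative numbers: for w_i = 0.8, α = 0.1 and ∇w_i L = 0.5, the update gives w_i = 0.8 − 0.1 · 0.5 = 0.75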

38
Q

What is a learning rate?

A

alpha (α): the step size for the modifications to the weights

39
Q

What is stochastic gradient descent?

A

an iterative process that depends on the random selection of mini-batches

following the gradients in weight space to the lowest loss value

–> allows us to find the minimum of the loss in an iterative process (see the sketch below)
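
A minimal NumPy sketch of stochastic gradient descent on a toy linear model; the data, batch size and learning rate are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                         # toy regression dataset
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.01 * rng.normal(size=100)

    w, alpha = np.zeros(3), 0.1                           # weights and learning rate
    for _ in range(200):                                  # iterative process
        idx = rng.choice(len(X), size=16, replace=False)  # random mini-batch selection
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)        # gradient of the squared loss
        w -= alpha * grad                                 # follow the gradient downhill
    print(w)                                              # close to w_true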

40
Q

What happens if we use a small learning rate?

A

it will take a long time to reach the global minimum, we could also possibly get stuck in a local minimum

41
Q

What happens if we use a large learning rate?

A

it is possible that we overshoot and miss the global minimum,

and convergence becomes unlikely

42
Q

How do neural networks learn?

A

learn patterns from data to perform specific tasks

early layers extract low-level signals with spatial significance

later layers interpret these signals and provide semantic significance

–> end-to-end learning

43
Q

What does Stochastic gradient descent (SGD) do?

A

it uses the gradients computed with backpropagation to update network weight parameters iteratively to reduce the model’s loss

44
Q

What is key to a meaningful training process in neural networks?

A

ability to compute the gradient of the loss function
with respect to every single network weight parameter

this is achieved through a process called backpropagation
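
A tiny PyTorch sketch of this: autograd performs the backpropagation and fills in the gradient of the loss for every weight (the values are illustrative):

    import torch

    w = torch.tensor([0.5, -1.0], requires_grad=True)  # network weight parameters
    x = torch.tensor([2.0, 3.0])
    y_target = torch.tensor(1.0)

    y_pred = (x * w).sum()              # forward pass
    loss = (y_pred - y_target) ** 2     # squared-error loss
    loss.backward()                     # backpropagation through the graph
    print(w.grad)                       # dloss/dw for every single weight -> [-12., -18.]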

45
Q

What is the neural network training pipeline?

A

1) sample a batch (input data x and target data y) from the training dataset

1 epoch:
  • evaluate the model on the batch input data (prediction) in a forward pass
  • compute the loss on the prediction and target y
  • compute weight gradients with backpropagation
  • modify weights based on gradients and learning rate
  • repeat for all batches

2) repeat for a number of epochs, monitor training and validation loss + metrics

3) stop before overfitting sets in (see the sketch below)
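
A self-contained PyTorch sketch of this pipeline on toy data; the model, data sizes and hyperparameters are illustrative assumptions, not from the lecture:

    import torch
    from torch import nn

    X = torch.randn(240, 4)                           # toy data, 4 input variables
    y = X @ torch.tensor([1.0, -1.0, 0.5, 2.0]) + 0.1 * torch.randn(240)
    X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()

    for epoch in range(20):                           # 2) repeat for a number of epochs
        perm = torch.randperm(len(X_tr))              # 1) sample batches from training data
        for i in range(0, len(X_tr), 32):
            idx = perm[i:i + 32]
            pred = model(X_tr[idx]).squeeze(1)        # forward pass (prediction)
            loss = loss_fn(pred, y_tr[idx])           # loss on prediction and target
            optimizer.zero_grad()
            loss.backward()                           # weight gradients via backpropagation
            optimizer.step()                          # modify weights (gradients + learning rate)
        with torch.no_grad():                         # monitor the validation loss
            val_loss = loss_fn(model(X_val).squeeze(1), y_val)
        # 3) in practice: stop (or keep the best checkpoint) before overfitting sets in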

46
Q

What do you see in the curves of the training and the validation loss in well-trained neural network models?

A

If the validation loss decreases more slowly than the training loss but still does not rise again after some iterations, the model is well-trained.