Shallow Neural Network Flashcards

(27 cards)

1
Q

What defines a shallow neural network?

A

A neural network with only one hidden layer.

2
Q

What is the main advantage of adding a hidden layer to a model?

A

It allows the model to learn nonlinear functions.

3
Q

What does the activation function in a neuron do?

A

Applies nonlinearity to the linear combination of inputs and weights.

4
Q

What is the most commonly used activation function in this lecture?

A

ReLU (Rectified Linear Unit).

5
Q

What is the formula for the ReLU activation function?

A

φ(x) = max(0, x)

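A minimal Python sketch of this formula (the relu name and NumPy usage are illustrative, not from the lecture):

    import numpy as np

    def relu(x):
        # φ(x) = max(0, x), applied elementwise
        return np.maximum(0.0, x)

    print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]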
6
Q

Why is ReLU preferred over sigmoid or tanh in hidden layers?

A

It avoids vanishing gradients and is computationally efficient.

7
Q

What does each hidden unit in a shallow network compute?

A

A linear transformation followed by a nonlinearity.

8
Q

What is the full expression for a shallow network’s output?

A

ŷ = b + Σᵢ vᵢ φ(wᵢᵀx + bᵢ), where wᵢ and bᵢ are the hidden-unit weights and biases, and vᵢ and b are the output weights and bias.

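A small NumPy sketch of this expression for a scalar input and output; all parameter values are illustrative, not from the lecture:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    # hidden-unit parameters wᵢ, bᵢ and output parameters vᵢ, b (illustrative values)
    w = np.array([1.0, -2.0, 0.5])
    b_hidden = np.array([0.0, 1.0, -1.0])
    v = np.array([2.0, 1.0, -3.0])
    b_out = 0.5

    def shallow_net(x):
        # ŷ = b + Σᵢ vᵢ φ(wᵢ·x + bᵢ)
        return b_out + np.sum(v * relu(w * x + b_hidden))

    print(shallow_net(0.7))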
9
Q

What geometric structure does each ReLU define in input space?

A

A hyperplane where its pre-activation is zero; together, the units' hyperplanes partition input space into linear regions.

10
Q

What is a piecewise linear function?

A

A function composed of multiple linear segments joined at certain points.

11
Q

What causes ‘joints’ or ‘kinks’ in the output function of a ReLU network?

A

A ReLU unit switching from inactive to active (or vice versa) as the input crosses its hyperplane.

12
Q

How many linear regions can a shallow network with D ReLU units form?

A

Up to the sum of binomial coefficients C(D, j) for j = 0 … Dᵢ (Zaslavsky's theorem), where Dᵢ is the input dimension; this grows rapidly with both D and Dᵢ.

13
Q

What does Zaslavsky’s theorem estimate?

A

The maximum number of regions into which a set of D hyperplanes can divide Dᵢ-dimensional space.

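A quick check of this count in Python, assuming the standard form of the theorem: D hyperplanes in general position divide Dᵢ-dimensional space into at most C(D, 0) + C(D, 1) + … + C(D, Dᵢ) regions:

    from math import comb

    def max_regions(D, D_i):
        # maximum number of regions created by D hyperplanes in D_i dimensions
        return sum(comb(D, j) for j in range(min(D, D_i) + 1))

    print(max_regions(3, 1))  # 3 ReLUs, 1D input  -> 4 regions
    print(max_regions(3, 2))  # 3 ReLUs, 2D input  -> 7 regions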
14
Q

What determines the number of regions a shallow network can create?

A

The number of ReLU units and input dimensions.

15
Q

What is an activation pattern in a neural network?

A

A combination of which ReLU units are active or inactive for a given input.

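A sketch of how an activation pattern could be computed from the hidden pre-activations Wx + b; the parameter values are illustrative:

    import numpy as np

    # illustrative hidden-layer parameters: 3 ReLU units, 2D input
    W = np.array([[ 1.0, -1.0],
                  [ 0.5,  2.0],
                  [-2.0,  0.0]])
    b = np.array([0.0, -1.0, 1.0])

    def activation_pattern(x):
        # True where a ReLU unit is active (pre-activation > 0)
        return (W @ x + b) > 0

    print(activation_pattern(np.array([1.0, 0.0])))  # [ True False False]
    print(activation_pattern(np.array([0.0, 2.0])))  # [False  True  True]: a different linear region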
16
Q

What happens in a region with a different activation pattern?

A

A different linear function is applied to the input.

17
Q

Why can shallow networks approximate any continuous function?

A

With enough hidden units, their piecewise linear output can be made to match any continuous function arbitrarily closely; this is the universal approximation theorem.

18
Q

What does the universal approximation theorem state?

A

A shallow network with enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy.

19
Q

What is the typical structure of a 1D shallow network with 3 ReLUs?

A

A piecewise linear function with up to 3 joints.
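As an illustrative check (same made-up parameters as the earlier sketch): in 1D each ReLU switches where wᵢx + bᵢ = 0, so its joint sits at x = -bᵢ/wᵢ whenever wᵢ ≠ 0:

    import numpy as np

    w = np.array([1.0, -2.0, 0.5])   # illustrative hidden weights
    b = np.array([0.0, 1.0, -1.0])   # illustrative hidden biases

    joints = -b / w                  # one potential joint per ReLU
    print(np.sort(joints))           # [0.  0.5 2. ]: up to 3 joints, hence 4 linear pieces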

20
Q

How does a shallow network generalize to multiple inputs and outputs?

A

Each hidden unit computes a dot product of its weight vector with the input, adds a bias, and applies the nonlinearity; each output is a weighted sum of the hidden activations plus a bias.
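A minimal matrix-form sketch of this generalization, with illustrative sizes and random parameters: hidden activations h = φ(W₁x + b₁), outputs ŷ = W₂h + b₂:

    import numpy as np

    D_in, D_hidden, D_out = 3, 4, 2                  # illustrative sizes
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(D_hidden, D_in)), rng.normal(size=D_hidden)
    W2, b2 = rng.normal(size=(D_out, D_hidden)), rng.normal(size=D_out)

    def forward(x):
        h = np.maximum(0.0, W1 @ x + b1)   # each hidden unit: dot product + bias, then ReLU
        return W2 @ h + b2                 # each output: weighted sum of hidden activations + bias

    print(forward(rng.normal(size=D_in)))  # vector with D_out = 2 entries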

21
Q

What is the role of the output weights in a shallow network?

A

They scale and combine the activations from hidden units.

22
Q

What is a fully connected layer?

A

A layer where every neuron is connected to all inputs from the previous layer.

23
Q

How many parameters does a shallow network have with Dᵢ inputs, D hidden units, and Dₒ outputs?

A

Dᵢ·D + D·Dₒ + D + Dₒ
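A quick worked check of this formula with illustrative sizes Dᵢ = 3, D = 4, Dₒ = 2:

    D_in, D_hidden, D_out = 3, 4, 2
    n_params = D_in * D_hidden + D_hidden * D_out + D_hidden + D_out
    print(n_params)  # 3*4 + 4*2 + 4 + 2 = 26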

24
Q

Why can more hidden units be a double-edged sword?

A

They increase model capacity but also risk overfitting.

25
Q

What does increasing the number of hidden units allow a shallow network to do?

A

Model more complex functions by creating more linear regions.

26
Q

What type of function is the output of a ReLU-based shallow network?

A

A piecewise linear function.

27
Q

What does a shallow network compute in higher dimensions?

A

A partitioning of input space into convex polytopes, each with a unique activation pattern.