Shallow Neural Network Flashcards
What defines a shallow neural network?
A neural network with only one hidden layer.
What is the main advantage of adding a hidden layer to a model?
It allows the model to learn nonlinear functions.
What does the activation function in a neuron do?
Applies nonlinearity to the linear combination of inputs and weights.
What is the most commonly used activation function in this lecture?
ReLU (Rectified Linear Unit).
What is the formula for the ReLU activation function?
φ(x) = max(0, x)
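A minimal NumPy sketch of this formula (the sample input values are illustrative only):

```python
import numpy as np

def relu(x):
    # φ(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [0.  0.  0.  1.5]
```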
Why is ReLU preferred over sigmoid or tanh in hidden layers?
It avoids vanishing gradients and is computationally efficient.
What does each hidden unit in a shallow network compute?
A linear transformation followed by a nonlinearity.
What is the full expression for a shallow network’s output?
ŷ = b₀ + Σᵢ vᵢ φ(wᵢᵀx + bᵢ), where wᵢ and bᵢ are the hidden-unit weights and biases, and vᵢ and b₀ are the output weights and bias.
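A minimal NumPy sketch of this forward pass for a single output; the parameter values W, b, v, b0 are arbitrary illustrative numbers, not taken from the lecture:

```python
import numpy as np

def shallow_forward(x, W, b, v, b0):
    """Shallow network: ŷ = b0 + v · relu(W x + b).

    W: (D, Dᵢ) hidden-unit weights, b: (D,) hidden biases,
    v: (D,) output weights, b0: scalar output bias.
    """
    h = np.maximum(0.0, W @ x + b)   # hidden activations, one per ReLU unit
    return b0 + v @ h

# Example with 2 inputs and 3 hidden units (illustrative values)
W = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]])
b = np.array([0.0, -1.0, 0.5])
v = np.array([1.0, -2.0, 0.7])
print(shallow_forward(np.array([0.4, 0.2]), W, b, v, b0=0.1))
```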
What geometric structure does each ReLU define in input space?
A hyperplane where the unit's pre-activation is zero, splitting input space into an "active" and an "inactive" half-space; together these hyperplanes partition the space into linear regions.
What is a piecewise linear function?
A function composed of multiple linear segments joined at certain points.
What causes ‘joints’ or ‘kinks’ in the output function of a ReLU network?
ReLU units switching from off to on (or vice versa) as the input crosses their hyperplanes.
How many linear regions can a shallow network with D ReLU units form?
Up to C(D, 0) + C(D, 1) + … + C(D, Dᵢ) regions, where Dᵢ is the input dimension (the bound given by Zaslavsky's theorem).
What does Zaslavsky’s theorem estimate?
The maximum number of regions into which a set of hyperplanes can divide a space.
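A short sketch of this bound, assuming D hyperplanes in general position in a Dᵢ-dimensional input space:

```python
from math import comb

def max_regions(num_hyperplanes, input_dim):
    # Zaslavsky's bound: sum of C(num_hyperplanes, j) for j = 0 .. input_dim
    return sum(comb(num_hyperplanes, j) for j in range(input_dim + 1))

print(max_regions(3, 1))   # 4 regions for 3 ReLUs with a 1D input
print(max_regions(10, 2))  # 56 regions for 10 ReLUs with a 2D input
```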
What determines the number of regions a shallow network can create?
The number of ReLU units (D) and the input dimension (Dᵢ).
What is an activation pattern in a neural network?
A combination of which ReLU units are active or inactive for a given input.
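A minimal sketch of computing an activation pattern; the weights are the same arbitrary illustrative values as in the forward-pass example above. Two inputs with the same boolean pattern lie in the same linear region:

```python
import numpy as np

def activation_pattern(x, W, b):
    # True where a ReLU unit is "on" (positive pre-activation), False where it is "off"
    return (W @ x + b) > 0

W = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]])
b = np.array([0.0, -1.0, 0.5])
print(activation_pattern(np.array([0.4, 0.2]), W, b))  # -> [ True False  True]
```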
What happens in a different region of activation?
A different linear function is applied to the input.
Why can shallow networks approximate any continuous function?
Because, with enough hidden units, their piecewise linear output can be made arbitrarily close to any continuous target function; this is the content of the universal approximation theorem.
What does the universal approximation theorem state?
That a shallow network with enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy.
What is the typical structure of a 1D shallow network with 3 ReLUs?
A piecewise linear function with up to 3 joints (and hence up to 4 linear regions).
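A small sketch of such a 1D network; all parameter values are illustrative, chosen only to show where the joints fall:

```python
import numpy as np

# 1D shallow network with 3 ReLU units (illustrative parameters)
w  = np.array([1.0, 1.0, -1.0])   # hidden weights
b  = np.array([0.0, -1.0, 0.5])   # hidden biases
v  = np.array([1.0, -2.0, 0.7])   # output weights
b0 = 0.1                          # output bias

def f(x):
    # ŷ = b0 + Σᵢ vᵢ · relu(wᵢ·x + bᵢ)
    return b0 + v @ np.maximum(0.0, w * x + b)

print(f(0.25))          # value of the network inside one linear piece
# Each unit switches on/off where wᵢ·x + bᵢ = 0, i.e. at x = -bᵢ/wᵢ
print(np.sort(-b / w))  # joints at x = 0.0, 0.5, 1.0 -> 4 linear pieces
```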
How does a shallow network generalize to multiple inputs and outputs?
Each hidden unit computes a dot product of its weight vector with the input vector (plus a bias); each output is a weighted sum of all hidden-unit activations (plus a bias).
What is the role of the output weights in a shallow network?
They scale and combine the activations from hidden units.
What is a fully connected layer?
A layer where every neuron is connected to all inputs from the previous layer.
How many parameters does a shallow network have with Dᵢ inputs, D hidden units, and Dₒ outputs?
Dᵢ·D + D·Dₒ + D + Dₒ
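A quick sketch checking this count term by term, assuming a plain fully connected shallow network:

```python
def shallow_param_count(d_in, d_hidden, d_out):
    hidden_weights = d_in * d_hidden   # Dᵢ·D
    hidden_biases  = d_hidden          # D
    output_weights = d_hidden * d_out  # D·Dₒ
    output_biases  = d_out             # Dₒ
    return hidden_weights + hidden_biases + output_weights + output_biases

print(shallow_param_count(2, 3, 1))   # 2·3 + 3 + 3·1 + 1 = 13
```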
Why can more hidden units be a double-edged sword?
They increase model capacity but also risk overfitting.