Shallow Neural Network Flashcards
What defines a shallow neural network?
A neural network with only one hidden layer.
What is the main advantage of adding a hidden layer to a model?
It allows the model to learn nonlinear functions.
What does the activation function in a neuron do?
Applies nonlinearity to the linear combination of inputs and weights.
What is the most commonly used activation function in this lecture?
ReLU (Rectified Linear Unit).
What is the formula for the ReLU activation function?
φ(x) = max(0, x)
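A minimal NumPy sketch of this formula (the sample input values are illustrative only):

```python
import numpy as np

def relu(x):
    # φ(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [0.  0.  0.  1.5]
```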
Why is ReLU preferred over sigmoid or tanh in hidden layers?
It avoids vanishing gradients and is computationally efficient.
What does each hidden unit in a shallow network compute?
A linear transformation followed by a nonlinearity.
What is the full expression for a shallow network’s output?
ŷ = b₀ + Σᵢ vᵢ φ(wᵢᵀx + bᵢ), where wᵢ and bᵢ are the hidden-unit weights and biases, and vᵢ and b₀ are the output weights and bias.
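A minimal NumPy sketch of this forward pass for a single output; the parameter values W, b, v, b0 are arbitrary illustrative numbers, not taken from the lecture:

```python
import numpy as np

def shallow_forward(x, W, b, v, b0):
    """Shallow network: ŷ = b0 + v · relu(W x + b).

    W: (D, Dᵢ) hidden-unit weights, b: (D,) hidden biases,
    v: (D,) output weights, b0: scalar output bias.
    """
    h = np.maximum(0.0, W @ x + b)   # hidden activations, one per ReLU unit
    return b0 + v @ h

# Example with 2 inputs and 3 hidden units (illustrative values)
W = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]])
b = np.array([0.0, -1.0, 0.5])
v = np.array([1.0, -2.0, 0.7])
print(shallow_forward(np.array([0.4, 0.2]), W, b, v, b0=0.1))
```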
What geometric structure does each ReLU define in input space?
A hyperplane where the unit's pre-activation is zero, splitting input space into an "active" and an "inactive" half-space; together these hyperplanes partition the space into linear regions.
What is a piecewise linear function?
A function composed of multiple linear segments joined at certain points.
What causes ‘joints’ or ‘kinks’ in the output function of a ReLU network?
ReLU units switching from off to on (or vice versa) as the input crosses their hyperplanes.
How many linear regions can a shallow network with D ReLU units form?
Up to C(D, 0) + C(D, 1) + … + C(D, Dᵢ) regions, where Dᵢ is the input dimension (the bound given by Zaslavsky's theorem).
What does Zaslavsky’s theorem estimate?
The maximum number of regions into which a set of hyperplanes can divide a space.
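A short sketch of this bound, assuming D hyperplanes in general position in a Dᵢ-dimensional input space:

```python
from math import comb

def max_regions(num_hyperplanes, input_dim):
    # Zaslavsky's bound: sum of C(num_hyperplanes, j) for j = 0 .. input_dim
    return sum(comb(num_hyperplanes, j) for j in range(input_dim + 1))

print(max_regions(3, 1))   # 4 regions for 3 ReLUs with a 1D input
print(max_regions(10, 2))  # 56 regions for 10 ReLUs with a 2D input
```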
What determines the number of regions a shallow network can create?
The number of ReLU units (D) and the input dimension (Dᵢ).
What is an activation pattern in a neural network?
A combination of which ReLU units are active or inactive for a given input.
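A minimal sketch of computing an activation pattern; the weights are the same arbitrary illustrative values as in the forward-pass example above. Two inputs with the same boolean pattern lie in the same linear region:

```python
import numpy as np

def activation_pattern(x, W, b):
    # True where a ReLU unit is "on" (positive pre-activation), False where it is "off"
    return (W @ x + b) > 0

W = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]])
b = np.array([0.0, -1.0, 0.5])
print(activation_pattern(np.array([0.4, 0.2]), W, b))  # -> [ True False  True]
```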
What happens in a different region of activation?
A different linear function is applied to the input.
Why can shallow networks approximate any continuous function?
Because, with enough hidden units, their piecewise linear output can be made arbitrarily close to any continuous target function; this is the content of the universal approximation theorem.
What does the universal approximation theorem state?
That a shallow network with enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy.
What is the typical structure of a 1D shallow network with 3 ReLUs?
A piecewise linear function with up to 3 joints (and hence up to 4 linear regions).
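A small sketch of such a 1D network; all parameter values are illustrative, chosen only to show where the joints fall:

```python
import numpy as np

# 1D shallow network with 3 ReLU units (illustrative parameters)
w  = np.array([1.0, 1.0, -1.0])   # hidden weights
b  = np.array([0.0, -1.0, 0.5])   # hidden biases
v  = np.array([1.0, -2.0, 0.7])   # output weights
b0 = 0.1                          # output bias

def f(x):
    # ŷ = b0 + Σᵢ vᵢ · relu(wᵢ·x + bᵢ)
    return b0 + v @ np.maximum(0.0, w * x + b)

print(f(0.25))          # value of the network inside one linear piece
# Each unit switches on/off where wᵢ·x + bᵢ = 0, i.e. at x = -bᵢ/wᵢ
print(np.sort(-b / w))  # joints at x = 0.0, 0.5, 1.0 -> 4 linear pieces
```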
How does a shallow network generalize to multiple inputs and outputs?
Each hidden unit computes a dot product of its weight vector with the input vector (plus a bias); each output is a weighted sum of all hidden-unit activations (plus a bias).
What is the role of the output weights in a shallow network?
They scale and combine the activations from hidden units.
What is a fully connected layer?
A layer where every neuron is connected to all inputs from the previous layer.
How many parameters does a shallow network have with Dᵢ inputs, D hidden units, and Dₒ outputs?
Dᵢ·D + D·Dₒ + D + Dₒ
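A quick sketch checking this count term by term, assuming a plain fully connected shallow network:

```python
def shallow_param_count(d_in, d_hidden, d_out):
    hidden_weights = d_in * d_hidden   # Dᵢ·D
    hidden_biases  = d_hidden          # D
    output_weights = d_hidden * d_out  # D·Dₒ
    output_biases  = d_out             # Dₒ
    return hidden_weights + hidden_biases + output_weights + output_biases

print(shallow_param_count(2, 3, 1))   # 2·3 + 3 + 3·1 + 1 = 13
```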
Why can more hidden units be a double-edged sword?
They increase model capacity but also risk overfitting.