Optimization, Loss Functions, and Regularization Flashcards by Franklin Hole

What is the purpose of a loss function?

It measures model error.

Loss functions are crucial for training models as they provide a metric to minimize.

What type of loss function is Log Loss?

Cross-Entropy Loss.

Log Loss is commonly used in binary classification problems.

What loss function is used when there are only two classes?

Binary Cross-Entropy.

This loss function calculates the error for binary classification tasks.

What is the formula for Cross-Entropy Loss?

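L = -(y log(y_pred) + (1 - y) log(1 - y_pred)), where y is the true label (0 or 1) and y_pred is the predicted probability of class 1. This is the binary form of cross-entropy.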
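
Example (a minimal sketch, not part of the original deck): the formula for a single prediction in plain Python. The function name and the `eps` clipping value are assumptions added here to avoid taking log(0).

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one example: y_true is 0 or 1, y_pred is a probability."""
    # Clip the prediction so log(0) never occurs (eps is an assumed safeguard).
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

confident_right = binary_cross_entropy(1, 0.9)  # true class predicted with high probability
confident_wrong = binary_cross_entropy(1, 0.1)  # true class predicted with low probability
```

A confident wrong prediction is penalized far more heavily than a confident correct one.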

What is backpropagation used for?

To compute the gradient of the loss with respect to each weight, so the weights can be adjusted to reduce error.

Define Gradient Descent.

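Step-by-step learning: an iterative method that repeatedly moves the weights a small step in the direction that reduces the loss (the negative gradient).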
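
Example (a minimal sketch under assumed names): gradient descent on a one-variable function. The function f(w) = (w - 3)^2, the starting point, and the step count are illustrative choices, not from the deck.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```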

What is Stochastic Gradient Descent (SGD)?

Updates weights after each training sample.

What is Mini-Batch Gradient Descent?

Updates weights using small batches.
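
Example (a sketch, assuming a hypothetical helper named `iterate_minibatches`): how a dataset might be cut into shuffled batches. Setting `batch_size=1` gives the per-sample updates of SGD.

```python
import random

def iterate_minibatches(samples, batch_size, seed=0):
    """Yield shuffled batches; batch_size=1 gives per-sample (SGD) updates."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

# Ten samples in batches of four: two full batches and one partial batch.
batches = list(iterate_minibatches(range(10), batch_size=4))
```

Each batch would drive one weight update.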

What is the learning rate’s role in gradient descent?

Controls how big the update steps are.
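
Example (a sketch with made-up learning rates): the effect of step size when minimizing f(w) = w^2, whose minimum is at w = 0.

```python
def run(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

small = run(0.01)   # tiny steps: still far from the minimum after 20 steps
good = run(0.3)     # moderate steps: very close to 0
too_big = run(1.1)  # oversized steps: overshoots further each time and diverges
```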

What does momentum do in gradient descent?

Adds a fraction of the previous weight update to the current one, which smooths the path, speeds up convergence, and can help the optimizer move past shallow local minima.
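
Example (a sketch of the classical momentum update; the coefficient 0.9 and the test function are assumptions for illustration):

```python
def momentum_descent(grad, w0, learning_rate=0.1, momentum=0.9, steps=300):
    """Gradient descent where each update carries over part of the last one."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # Blend the previous update (velocity) with the new gradient step.
        velocity = momentum * velocity - learning_rate * grad(w)
        w = w + velocity
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = momentum_descent(lambda w: 2 * (w - 3), w0=0.0)
```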

What is dropout in the context of model training?

Randomly deactivates (zeroes out) a fraction of neurons on each training step; all neurons are used at inference time.
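
Example (a sketch of the "inverted dropout" variant, which is an assumption here since the deck does not name a variant): kept activations are scaled up during training so nothing needs rescaling at inference.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` while training."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    # Scale survivors by 1/keep so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
train_out = dropout([1.0] * 8, rate=0.5, rng=rng)                   # some values zeroed
test_out = dropout([1.0] * 8, rate=0.5, rng=rng, training=False)   # unchanged
```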

What does regularization do?

Penalizes overly complex models to encourage generalization.
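
Example (a sketch using an L2 weight penalty, one common form of regularization; the deck does not specify which form, and the strength `lam` is a made-up value):

```python
def l2_penalty(weights, lam):
    """Penalty that grows with the squared size of the weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam=0.01):
    """Total loss = error on the data + penalty for large weights."""
    return data_loss + l2_penalty(weights, lam)

# Same data loss, but the model with larger weights pays a bigger penalty.
small_weights = regularized_loss(0.50, [0.1, -0.2, 0.1])
large_weights = regularized_loss(0.50, [3.0, -4.0, 5.0])
```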

What is early stopping?

Stops training when validation loss stops improving.
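
Example (a sketch, assuming a "patience" rule: stop after a set number of epochs without a new best validation loss; the loss values are made up):

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs

history = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]  # made-up validation losses
stop_at = early_stopping_epoch(history)
```

Here the best loss occurs at epoch index 2, and training stops two epochs later.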

What are epochs in machine learning?

The number of complete passes of the entire training dataset through the model.

What is the batch size in gradient descent?

Number of samples used per gradient update.
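
Example (a sketch with hypothetical numbers): how batch size and epochs together determine the number of weight updates.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """One update per batch; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

def total_updates(num_samples, batch_size, epochs):
    return updates_per_epoch(num_samples, batch_size) * epochs

# Hypothetical run: 25,000 samples, batches of 512, 20 epochs.
per_epoch = updates_per_epoch(25_000, 512)
overall = total_updates(25_000, 512, 20)
```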

Name a common optimizer used in machine learning.

Adam.

What is the main takeaway regarding activation functions?

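They enable deep learning by introducing non-linearity; without them, stacked layers would collapse into a single linear mapping.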

True or False: Dropout and regularization help prevent overfitting.

True.

What are the two main types of supervised machine learning?

Classification and Regression.

What does regression in AI predict?

Continuous values based on input data.

How does regression work in AI?

The model learns a function y = f(x) that maps the inputs x to a predicted value y.

What is Mean Squared Error (MSE)?

A loss function that measures how far off the model’s predictions are: the average of the squared differences between predicted and true values.
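
Example (a minimal sketch; the sample values are made up):

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors are 1, 0, and -2, so the squared errors are 1, 0, and 4.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```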

What does a lower MSE indicate?

A better fit: the predictions are, on average, closer to the true values.

What is a Simple Artificial Neural Network (ANN) for regression composed of?

Layers of connected neurons: an input layer, one or more hidden layers, and an output layer that produces a single continuous value.

What is the first step in building a regression model?

Load & Prepare Data.

What is Min-Max Scaling?

Rescales data to [0, 1] using (x - min) / (max - min).
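
Example (a minimal sketch; returning zeros for a constant column is an assumption made here to avoid dividing by zero):

```python
def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10.0, 15.0, 20.0])
```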

What is Z-Score Normalization?

Rescales data to have mean = 0 and std dev = 1 by subtracting the mean and dividing by the standard deviation.
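
Example (a minimal sketch, assuming the population standard deviation; the sample values are made up):

```python
import math

def z_score_normalize(values):
    """Subtract the mean, then divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - mean) / std for v in values]

# These values have mean 5 and standard deviation 2.
z = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```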

What is the activation function used in the hidden layers of the regression model?

ReLU.

What optimizer is used when compiling the regression model?

Adam.

What is Text Classification?

A model assigns pieces of text to predefined categories, such as positive or negative sentiment.

What is the example dataset used for text classification?

IMDB movie reviews.

How many reviews are in the IMDB dataset?

50,000 highly polarized reviews.

What is the first step in preprocessing text for AI?

Convert words into numbers by building a dictionary that maps each word to an integer index.

What does One-Hot Encoding do in terms of words?

Turns words into vectors of 0s and 1s, with a 1 at the index of each word that is present.
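
Example (a toy sketch with hypothetical helper names and a three-word vocabulary): a word-to-index dictionary followed by 0/1 encoding of a review.

```python
def build_word_index(texts):
    """Assign each new word the next free integer index."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index))
    return index

def encode(text, word_index):
    """Vector with a 1 at the index of every known word in the text."""
    vector = [0] * len(word_index)
    for word in text.lower().split():
        if word in word_index:
            vector[word_index[word]] = 1
    return vector

reviews = ["great movie", "terrible movie"]  # made-up reviews
word_index = build_word_index(reviews)       # great: 0, movie: 1, terrible: 2
encoded = encode("great great movie", word_index)
```

Repeated words still produce a single 1, so the vector records presence, not counts.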

What is the input layer dimension for the FCN used in sentiment analysis?

10,000-dimensional input.

What activation function is used in the output layer for binary classification?

Sigmoid.
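
Example (a minimal sketch): sigmoid squashes any real number into the range 0 to 1, so the output can be read as the probability of the positive class. The two-branch form is a choice made here to avoid overflow for large negative inputs.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return e / (1.0 + e)

neutral = sigmoid(0.0)   # exactly 0.5
positive = sigmoid(5.0)  # close to 1
negative = sigmoid(-5.0) # close to 0
```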

What is the loss function used for the IMDB dataset?

Binary Crossentropy.

What task is performed with the Reuters dataset?

Classify short news articles into 46 different categories.

How is the Reuters classification different from IMDB classification?

46 categories instead of 2.

What activation function is used in the final layer for multi-class classification?

Softmax.
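
Example (a minimal sketch; the input scores are made up): softmax turns a list of raw scores into probabilities that sum to 1, one per class.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # the largest score gets the largest probability
```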

What loss function is used for the Reuters dataset?

Categorical Crossentropy.
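
Example (a sketch with three classes instead of 46, and made-up probabilities): with a one-hot target, the loss reduces to the negative log of the probability given to the correct class. The `eps` floor is an assumption added to avoid log(0).

```python
import math

def categorical_cross_entropy(one_hot_target, probs, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probs))

# True class is index 1, which the model gives probability 0.7.
loss = categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```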

What is the main purpose of regression in the context of the Boston Housing Prices example?

Predict continuous values.

What are the key components of text classification for IMDB and Reuters?

* Predicts categories
* One-Hot Encoding
* Activation functions: ReLU (hidden), Sigmoid (IMDB), Softmax (Reuters)
* Loss functions: Binary Crossentropy (IMDB), Categorical Crossentropy (Reuters)

Why do we evaluate Machine Learning (ML) and Deep Learning (DL) models?

To ensure they do what we want, avoid overfitting/underfitting, and pick the best model and settings.

What is the problem with model evaluation?

How do you know your model isn’t just memorizing the training data?
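
Example (a sketch of one common check, a held-out validation split; the deck does not name a specific method, and the 20% fraction is an assumption): the model trains on one part of the data and is scored on samples it has never seen.

```python
import random

def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_validation_split(range(100))
```

A large gap between training and validation performance suggests memorization.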