High Level Vision Flashcards by Phoebe Warner

Describe the steps for image classification:

1) In the training stage, pass labelled images through the classifier to extract features
2) The model produces a prediction for each category
3) Calculate the loss between the predictions and the ground truth
4) Backpropagate and update the parameters accordingly
5) After many iterations the model will converge
6) In the test stage, the model is fixed. Pass the test image through the model to get the prediction

What is binary classification?

Classifying whether an image shows a specific object or not. The result is one of two labels, e.g. “tiger” or “non-tiger”.

What is multiclass classification?

Dataset contains multiple categories. Given an image, the classifier assigns a label to it.

What is multi-label classification?

Images contain multiple objects. The aim is to predict a probability for each object the image contains.

What is hierarchical image classification?

First the classifier predicts the wider category the image contains, then it tries to label it more specifically with subcategories
e.g. Fruit, apple, ladywell apple

What is a basic approach to image classification?

Given an RGB image with 32x32x3 pixel values and 10 categories, predict 10 numbers representing the probability of each category being in the image.
- Summation of probabilities = 1
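
One common way to turn 10 raw scores into probabilities that sum to 1 is the softmax function. The card does not say which method is used, so softmax is an assumption here, and the scores below are made-up values for illustration only.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the max score keeps exp() numerically stable
    # and does not change the resulting probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for 10 categories (not from the flashcards).
scores = [2.0, 1.0, 0.1, -1.2, 0.5, 3.3, 0.0, -0.7, 1.8, 0.9]
probs = softmax(scores)

print(len(probs))                  # one probability per category
print(round(sum(probs), 6))        # sums to 1 up to rounding
print(probs.index(max(probs)))     # index of the most likely category
```

The category with the highest raw score also gets the highest probability, so the predicted label is unchanged by this step.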

What does f(x, W) equal?

f(x, W) = (W*x) + b
where x is the image
W is the weights/parameters
b is the bias

How would we get the predictions for this: given an RGB image with 32x32x3 pixel values and 10 categories?

f(x, W) = (10x3072 matrix)(3072x1 vector) + (10x1 vector)
where 3072 = 32 x 32 x 3 is the length of the flattened image

How would you calculate the score for an image with 4 pixels and 3 classes:

1) Flatten image into 1D vector (4x1)
2) perform matrix multiplication with weights for each category (3x4)
(3x4) . (4x1) = (3x1)
3) add bias to (3x1) vector to get final prediction for each category
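
The three steps above can be sketched in plain Python. The pixel values, weights and biases below are hypothetical numbers chosen for illustration; only the shapes (4 pixels, 3 classes) come from the card.

```python
def linear_scores(x, W, b):
    """Compute f(x, W) = W*x + b for a flattened image x.

    W has one row of weights per class, b has one bias per class.
    """
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]

# Hypothetical 2x2 single-channel image (4 pixels).
image = [[56, 231],
         [24, 2]]

# Step 1: flatten the image into a 1D vector of length 4.
x = [pixel for row in image for pixel in row]

# Step 2: one row of 4 weights per class (3x4), values are made up.
W = [[0.2, -0.5, 0.1, 2.0],
     [1.5, 1.3, 2.1, 0.0],
     [0.0, 0.25, 0.2, -0.3]]

# Step 3: one bias per class (3x1), values are made up.
b = [1.1, 3.2, -1.2]

scores = linear_scores(x, W, b)
print(scores)  # three scores, one per class
```

The class with the largest score would be the predicted category for this image.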

How does the process above change if you use a multi-layer perceptron?

Each category has its own weights and bias. So for 3 categories you would multiply the (4x1) image vector by a different (1x4) row of weights for each category, then add that category's bias

How do we find good values for W and b?

Start with random values then converge to the optimal values of W and b that minimise loss

What is a loss function and what methods could we use to calculate loss?

A loss function measures how far the classifier's predictions are from the ground truth.
A large loss indicates a poorly trained classifier

Could use the L1 or L2 loss, or SVM loss, Cross-entropy loss, MSE loss, Softmax loss.

What would be the formula for calculating L1 loss:

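The L1 loss for one image is the absolute difference between the prediction and the ground truth.
The loss over the dataset is the average of the loss over the images:
L = 1/N * (sum of the loss for the prediction of each individual image)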
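
A minimal sketch of this averaging, assuming the per-image L1 loss is the sum of absolute differences between predicted and ground-truth values. The predictions and targets are made-up numbers.

```python
def l1_loss(prediction, target):
    """L1 loss for one image: sum of absolute differences."""
    return sum(abs(p - t) for p, t in zip(prediction, target))

def mean_loss(predictions, targets):
    """Average the per-image loss over the N images in the dataset."""
    n = len(predictions)
    return sum(l1_loss(p, t) for p, t in zip(predictions, targets)) / n

# Hypothetical predictions for 2 images and 3 categories.
predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.6, 0.3]]
# Ground truth as one-hot vectors (image 1 is class 0, image 2 is class 2).
targets = [[1, 0, 0],
           [0, 0, 1]]

per_image = [l1_loss(p, t) for p, t in zip(predictions, targets)]
dataset_loss = mean_loss(predictions, targets)
print(per_image, dataset_loss)
```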

How is SVM loss calculated?

For each incorrect label, take the max of 0 and (the score of the incorrect label - the score of the correct label + a margin delta, usually 1). The loss for an image is the sum of these values over all the incorrect labels.

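The predicted scores for 3 classes are given in the following table, where each column holds the scores for one image and the correct classes are cat, dog and person for columns 1, 2 and 3 respectively. Compute the multiclass SVM loss for each image. Then compute the overall (average) loss over all images. Delta = 1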

cat: 3.1 1.5 5.2
dog: 0.7 2.4 1.2
person: 1.5 5.1 -1.4

max(0, 0.7 - 3.1 + 1) + max(0, 1.5 - 3.1 + 1) = 0 + 0 = 0
max(0, 1.5 - 2.4 + 1) + max(0, 5.1 - 2.4 + 1) = 0.1 + 3.7 = 3.8
max(0, 5.2 - (-1.4) + 1) + max(0, 1.2 - (-1.4) + 1) = 7.6 + 3.6 = 11.2

Average: (0 + 3.8 + 11.2)/3 = 5
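
The worked answer above can be checked with a short sketch. It assumes, as the working implies, that each column of the table is one image and that the correct classes are cat, dog and person in turn. The function name `svm_loss` is just an illustrative choice.

```python
def svm_loss(scores, correct_index, delta=1.0):
    """Multiclass SVM (hinge) loss for one image.

    Sums max(0, s_j - s_correct + delta) over every incorrect class j.
    """
    correct = scores[correct_index]
    return sum(max(0.0, s - correct + delta)
               for j, s in enumerate(scores) if j != correct_index)

# Each entry: ([cat, dog, person] scores for one image, index of correct class).
images = [
    ([3.1, 0.7, 1.5], 0),   # column 1, correct class: cat
    ([1.5, 2.4, 5.1], 1),   # column 2, correct class: dog
    ([5.2, 1.2, -1.4], 2),  # column 3, correct class: person
]

losses = [svm_loss(scores, y) for scores, y in images]
average = sum(losses) / len(losses)
print([round(l, 2) for l in losses], round(average, 2))
```

Rounded to two decimals, the per-image losses and their average match the hand calculation.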

What is the difference between a deep learning neural network and a simple neural network?

A simple neural network has only one or a few hidden layers. In deep learning there are many hidden layers (possibly hundreds) that are used to train the model and produce the output

What is a convolutional neural network?

A type of deep learning model that uses convolutional layers, which apply filters to the input data to capture image features.
Given an image and a filter, a convolutional layer calculates the output. CNNs are used for image classification.

What are recurrent networks?

The output becomes the next input.
- they have connections that form directed cycles.
- allows retention of memory of previous inputs through hidden states

How do artificial neural networks work?

Neurons receive multiple inputs, each of which has an adjustable weight.
A threshold decides whether a neuron is active or not.

What is the input signal formula for a neuron?

(sum of (weights*inputs)) + bias

What is the output signal formula for a neuron?

y = function(input)

What is an activation function?

Describe the threshold activation function:

It determines if a neuron is active or not

Choose a threshold: if the weighted sum of inputs + bias meets or exceeds the threshold, the neuron is active; otherwise it is not active
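
A minimal sketch of a neuron with a threshold activation, combining the input signal (weighted sum + bias) with the threshold test. The input values, weights, bias and thresholds are hypothetical.

```python
def threshold_neuron(inputs, weights, bias, threshold):
    """Return 1 (active) if the weighted sum + bias meets the threshold, else 0."""
    # Input signal: sum of (weights * inputs) + bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Output signal: the threshold activation applied to the input signal.
    return 1 if z >= threshold else 0

# Hypothetical neuron with 2 inputs; the input signal is 0.2 + 0.48 + 0.1.
inputs = [0.5, 0.8]
weights = [0.4, 0.6]
bias = 0.1

low_threshold_output = threshold_neuron(inputs, weights, bias, threshold=0.5)
high_threshold_output = threshold_neuron(inputs, weights, bias, threshold=1.0)
print(low_threshold_output, high_threshold_output)
```

The same input signal makes the neuron active with the lower threshold and inactive with the higher one.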

What is the sigmoid function?

It’s an activation function
formula: σ(x) = 1 / (1 + e^(-x))
output is always between 0 and 1

What are the two parameters that can be introduced to the sigmoid function?

How does it fit into the formula?

c1: controls slope of sigmoid
c2: controls horizontal offset

σ(x) = 1 / (1 + e^(-c1 * (x - c2)))
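
A small sketch of the sigmoid with the two parameters, where c1 = 1 and c2 = 0 give the plain sigmoid. The default values and test inputs are illustrative choices.

```python
import math

def sigmoid(x, c1=1.0, c2=0.0):
    """Sigmoid with slope c1 and horizontal offset c2."""
    # Note: for very large negative (x - c2) math.exp can overflow;
    # this simple version is only meant for moderate inputs.
    return 1.0 / (1.0 + math.exp(-c1 * (x - c2)))

plain_at_zero = sigmoid(0.0)              # midpoint of the plain sigmoid
shifted_at_two = sigmoid(2.0, c2=2.0)     # c2 moves the midpoint to x = 2
steeper = sigmoid(1.0, c1=5.0)            # larger c1 gives a steeper curve
gentler = sigmoid(1.0, c1=1.0)

print(plain_at_zero, shifted_at_two, steeper > gentler)
```

Changing c2 slides the curve left or right without changing its shape, while c1 controls how quickly the output moves from near 0 to near 1.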

How do different values of x affect the output of the sigmoid function?

- for large negative inputs (x), the output approaches 0
- for large positive inputs (x), the output approaches 1
- for inputs near 0, the output is around 0.5

What are the properties of the sigmoid function?

- its domain is (−∞, ∞)
- its range is (0, 1): the output gets close to 0 and 1 but never reaches them
- when the input is 0, the output is 0.5

What is the derivative of the sigmoid function?

σ′(x)=σ(x)(1−σ(x)) - include derivation process on 2D paper
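
One way to write out the derivation, using the chain rule on the plain sigmoid σ(x) = 1 / (1 + e^(-x)). This is a standard derivation added for reference, not the author's own working.

```latex
\begin{align*}
\sigma(x)  &= \left(1 + e^{-x}\right)^{-1} \\
\sigma'(x) &= -\left(1 + e^{-x}\right)^{-2} \cdot \left(-e^{-x}\right)
            = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} \\
           &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
           &= \sigma(x)\,\bigl(1 - \sigma(x)\bigr),
\qquad \text{since } 1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}.
\end{align*}
```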

Name some other activation functions:

- Leaky ReLU: max(0.1x, x)
- tanh(x)
- ReLU: max(0, x)
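
These three functions can be sketched directly from the formulas above. The leaky slope of 0.1 comes from the card; the sample inputs are arbitrary.

```python
import math

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def leaky_relu(x, slope=0.1):
    """Leaky ReLU: max(slope * x, x), with slope 0.1 as on the card."""
    return max(slope * x, x)

def tanh(x):
    """Hyperbolic tangent, with outputs between -1 and 1."""
    return math.tanh(x)

for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), leaky_relu(value), tanh(value))
```

For negative inputs ReLU outputs 0, whereas Leaky ReLU keeps a small negative output.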

What are TLUs?

Threshold logic units
- a simplified version of the threshold neuron model
- they only accept binary inputs, and the weights are usually = 1

Describe the AND TLU node:

- the 2 inputs are each either 0 or 1, weights = 1
- the weighted sum of the inputs is in the range [0, 2]
- the threshold has to be a value > 1 and <= 2
- if the inputs are (0, 0), the weighted sum = 0, output = 0 (not active as < threshold)
- if the inputs are (0, 1) or (1, 0), the weighted sum = 1, output = 0 (not active as < threshold)
- if the inputs are (1, 1), the weighted sum = 2, output = 1 (active as >= threshold)

Describe the OR TLU node:

- the 2 inputs are each either 0 or 1, weights = 1
- the weighted sum of the inputs is in the range [0, 2]
- the threshold has to be a value > 0 and <= 1
- if the inputs are (0, 0), the weighted sum = 0, output = 0 (not active as < threshold)
- if the inputs are (0, 1) or (1, 0), the weighted sum = 1, output = 1 (active as >= threshold)
- if the inputs are (1, 1), the weighted sum = 2, output = 1 (active as >= threshold)
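
The AND and OR nodes can be sketched with one small TLU function. The thresholds 2 (AND) and 1 (OR) are one valid choice each; the function names are illustrative.

```python
def tlu(inputs, threshold, weights=None):
    """Threshold logic unit: active (1) if the weighted sum >= threshold."""
    if weights is None:
        weights = [1] * len(inputs)  # weights are usually 1 in a TLU
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def and_gate(a, b):
    # Only the input pair (1, 1) gives a weighted sum of 2.
    return tlu([a, b], threshold=2)

def or_gate(a, b):
    # Any input pair containing a 1 gives a weighted sum of at least 1.
    return tlu([a, b], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b))
```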

Why is it impossible to have a single neuron with an XOR activation function?

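Because XOR requires a restricted output region: the neuron should be active when the weighted sum is 1, but not when it is 0 or 2. A single threshold can only separate sums below it from sums at or above it, so XOR would need two different thresholds, a lower one and an upper one (0 < lower threshold <= 1 < upper threshold <= 2).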

How can we implement XOR logic using TLUs?

Combine multiple TLUs
- XOR logic = (OR) AND (NAND)
- 3 TLUs combined

OR:
0 or 0 = 0
0 or 1 = 1
1 or 0 = 1
1 or 1 = 1

NAND:
0 and 0 = not(0) = 1
0 and 1 = not(0) = 1
1 and 0 = not(0) = 1
1 and 1 = not(1) = 0

(OR) AND (NAND):
inputs 0, 0: 0 and 1 = 0
inputs 0, 1: 1 and 1 = 1
inputs 1, 0: 1 and 1 = 1
inputs 1, 1: 1 and 0 = 0
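
A sketch of the three-TLU construction. The card does not say how the NAND unit is built; using weights of -1 with a threshold of -1 is an assumption made here so that NAND also fits the weighted-sum-and-threshold form.

```python
def tlu(inputs, weights, threshold):
    """Threshold logic unit: active (1) if the weighted sum >= threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def or_gate(a, b):
    return tlu([a, b], [1, 1], threshold=1)

def nand_gate(a, b):
    # Assumed construction: -a - b >= -1 holds unless both inputs are 1.
    return tlu([a, b], [-1, -1], threshold=-1)

def and_gate(a, b):
    return tlu([a, b], [1, 1], threshold=2)

def xor_gate(a, b):
    # XOR = (a OR b) AND (a NAND b), built from 3 TLUs.
    return and_gate(or_gate(a, b), nand_gate(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```

The first two TLUs act as a hidden layer and the final AND unit as the output, which is why XOR needs more than a single neuron.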

What does a model generalising well mean?

That the model's weights/parameters don't become too specific to the features of the training data, so the model is able to perform well on new, unseen images as well.

What happens if we use a function with too small a degree, a moderate degree, and a very high degree?

- linear line (degree 1): small margin between data points
- polynomial line (degree 2): we get accuracy and gradient stability
- polynomial line (degree 9): we get accuracy but gradient instability (not a good option)

What can happen if you train a model on training data too much?

You can achieve 99% accuracy/1% loss on the training data, but the model will be overfitted to the training data and will not generalise/perform well on the test data

What is the top-5 metric?

Accuracy is calculated as the number of correctly labelled images / the total number of images. With the top-5 metric, a prediction counts as correctly labelled if the correct label is among the top 5 predictions, not just the top 1 prediction
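
A minimal sketch of top-k accuracy, shown for k = 1 and k = 5. The scores and labels are made-up values for 3 images and 6 classes.

```python
def top_k_accuracy(score_lists, true_labels, k=5):
    """Fraction of images whose true label is among the k highest scores."""
    correct = 0
    for scores, label in zip(score_lists, true_labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(true_labels)

# Hypothetical scores for 3 images over 6 classes.
score_lists = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true label 1 has the top score
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # true label 2 is ranked second
    [0.30, 0.25, 0.20, 0.15, 0.09, 0.01],  # true label 5 is ranked last
]
true_labels = [1, 2, 5]

top1 = top_k_accuracy(score_lists, true_labels, k=1)
top5 = top_k_accuracy(score_lists, true_labels, k=5)
print(top1, top5)
```

Top-5 accuracy is always at least as high as top-1 accuracy, because every top-1 hit is also a top-5 hit.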

How does back propagation work?

- blame individual weights for the error in the output, which is identified by calculating the loss
- adapt the bad weight and all the weights in the following layers that are impacted by it
- calculate the gradient to update the network's weights

Describe MSE (Mean Squared Error)

Find the sum of all the losses (using the L2 loss), then average it over the number of input images P: MSE = 1/P * sum of (prediction - ground truth)^2
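
The MSE formula can be sketched as follows, with one scalar prediction per input. The numbers are made up for illustration.

```python
def mse(predictions, targets):
    """Mean squared error: average of (prediction - ground truth)^2."""
    p = len(predictions)
    return sum((pred - truth) ** 2 for pred, truth in zip(predictions, targets)) / p

# Hypothetical predictions and ground-truth values for 4 inputs.
predictions = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]

error = mse(predictions, targets)
print(error)
```

Squaring the differences means large errors are penalised much more heavily than small ones.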

How do we use the mean squared error (MSE) to minimise loss?

To minimise the MSE we use the gradient descent method.
- Gradient descent finds a minimum of a function. In general this is a local minimum; it is only guaranteed to be the absolute minimum when the function is convex.
- It is especially useful for high-dimensional functions.
- It iteratively minimises the neuron’s error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction.
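
A minimal sketch of gradient descent on the MSE for a single linear neuron y = w*x + b. The toy data, learning rate, step count and the choice to start from zero rather than random values are all assumptions made for illustration.

```python
def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # starting point (assumed; random values also work)
    n = len(xs)
    for _ in range(steps):
        # Error of the current prediction for every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step in the direction opposite to the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train(xs, ys)
print(round(w, 3), round(b, 3))
```

For this toy problem the error surface is convex, so the minimum that gradient descent reaches is also the absolute minimum, and w and b end up close to 2 and 1.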

High Level Vision Flashcards

(40 cards)