High Level Vision Flashcards
(40 cards)
Describe the steps for image classification:
1) In the training stage, pass labelled images through the classifier, which extracts features
2) The model produces a prediction (score) for each category
3) Calculate the loss between the predictions and the ground-truth labels
4) Backpropagate the loss and update the parameters accordingly
5) After many iterations, the model converges
6) In the test stage, the model is fixed: pass each test image through the model to get its prediction
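A minimal NumPy sketch of the training and test stages. It assumes a single linear layer f(x) = Wx + b trained with softmax cross-entropy loss on made-up 2D data that stands in for image features. The data, learning rate and iteration count are illustrative assumptions. With only one linear layer, backpropagation reduces to the closed-form gradient computed inside the loop.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax; subtracting the row max avoids overflow in exp
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train(X, y, num_classes, lr=0.1, iters=200, seed=0):
    """Training stage: fit f(x) = Wx + b with cross-entropy loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.01 * rng.standard_normal((num_classes, d))  # random initial parameters
    b = np.zeros(num_classes)
    loss = None
    for _ in range(iters):
        probs = softmax(X @ W.T + b)                   # prediction per category
        loss = -np.log(probs[np.arange(n), y]).mean()  # loss vs ground truth
        grad = probs.copy()                            # d(loss)/d(scores)
        grad[np.arange(n), y] -= 1.0
        grad /= n
        W -= lr * grad.T @ X                           # update the parameters
        b -= lr * grad.sum(axis=0)
    return W, b, loss

def predict(X, W, b):
    """Test stage: W and b are fixed; return the highest-scoring class."""
    return np.argmax(X @ W.T + b, axis=1)

# Hypothetical toy data: two well-separated 2D clusters
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
W, b, final_loss = train(X_train, y_train, num_classes=2)
print(predict(np.array([[-2.0, -2.0], [2.0, 2.0]]), W, b))  # [0 1]
```

The training loop and the prediction function are kept separate so that the parameters cannot change during testing.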
What is binary classification?
Deciding whether or not an image contains a specific object, so there are only two labels, e.g. “tiger” and “non-tiger”.
What is multiclass classification?
The dataset contains multiple categories. Given an image, the classifier assigns it one of these category labels.
What is multi-label classification?
Each image may contain multiple objects; the aim is to predict, for every category, the probability that the image contains an object of that category.
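One common way to get independent per-category probabilities is to apply the sigmoid to each category's score separately. This is an assumption here, because the card does not name a function. A minimal sketch with made-up scores and an assumed 0.5 cut-off:

```python
import numpy as np

def multilabel_predict(scores, labels, threshold=0.5):
    # Each category gets its own independent probability via the sigmoid,
    # so the probabilities do NOT have to sum to 1
    probs = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    present = [label for label, p in zip(labels, probs) if p >= threshold]
    return probs, present

probs, present = multilabel_predict([2.0, -1.0, 0.5], ["cat", "dog", "person"])
print(present)  # ['cat', 'person']
```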
What is hierarchical image classification?
First the classifier predicts the broad category the image belongs to, then it refines the label with increasingly specific subcategories
e.g. fruit → apple → ladywell apple
What is a basic approach to image classification?
Given an RGB image of 32×32 pixels (32×32×3 = 3072 values) and 10 categories, predict 10 numbers, each representing the probability that the image belongs to that category.
- The probabilities sum to 1
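A common way to turn the 10 scores into probabilities that sum to 1 is the softmax function. It is named here as an assumption, since the card only states the sum-to-1 property. The sketch uses random, untrained weights only to show the shapes involved for a flattened 32×32×3 image:

```python
import numpy as np

def softmax(scores):
    # Subtracting the max does not change the result but avoids overflow in exp
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)
x = rng.random(32 * 32 * 3)                     # flattened 32x32 RGB image: 3072 values
W = 0.01 * rng.standard_normal((10, x.size))    # hypothetical untrained (10 x 3072) weights
b = np.zeros(10)                                # (10,) bias
probs = softmax(W @ x + b)                      # 10 probabilities, one per category
print(probs.shape, round(probs.sum(), 6))       # (10,) 1.0
```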
What does f(x, W) equal?
$f(x, W) = Wx + b$
where x is the image, flattened into a column vector
W is the weight matrix (the parameters)
b is the bias vector
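How would we get the predictions for an RGB image of 32×32×3 values with 10 categories?
$f(x, W) = Wx + b$, where W is a 10×3072 matrix, x is a 3072×1 vector (32 × 32 × 3 = 3072) and b is a 10×1 vector, giving a 10×1 vector of scores, one per category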
How would you calculate the scores for an image with 4 pixels and 3 classes?
1) Flatten the image into a 1D vector (4×1)
2) Multiply it by the weight matrix (3×4), which holds one row of weights per category:
(3×4)·(4×1) = (3×1)
3) Add the (3×1) bias vector to get the final score for each category
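The three steps can be checked with NumPy; the pixel values, weights and biases below are made up for illustration:

```python
import numpy as np

image = np.array([[1.0, 2.0],
                  [3.0, 4.0]])            # hypothetical 2x2 image = 4 pixels
x = image.reshape(4, 1)                   # 1) flatten to a (4x1) column vector
W = np.array([[0.1,  0.2,  0.0, -0.1],    # weights for class 0
              [0.0, -0.5,  0.3,  0.2],    # weights for class 1
              [0.4,  0.0, -0.2,  0.1]])   # weights for class 2 -> (3x4)
b = np.array([[0.5], [-1.0], [0.0]])      # (3x1) bias
scores = W @ x + b                        # 2) (3x4)·(4x1) = (3x1), then 3) add bias
print(scores.ravel())                     # approximately [0.6, -0.3, 0.2]
```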
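How does the process above change if you use a multi-layer perceptron?
Each category corresponds to its own output neuron with its own weights and bias: for 3 categories, the (4×1) vector is multiplied by a separate (1×4) row of weights for each category, and that category's bias is added. Stacking the three rows gives the (3×4) weight matrix above. With more than one layer, the outputs of one layer, passed through an activation function, become the inputs to the next layer.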
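A small NumPy sketch with made-up numbers. It first checks that computing one neuron per category gives the same scores as the matrix product. It then feeds those outputs into a hypothetical second layer, where the sizes, weights and choice of sigmoid activation are all assumptions:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])        # flattened 4-pixel image
W = np.array([[0.1,  0.2,  0.0, -0.1],
              [0.0, -0.5,  0.3,  0.2],
              [0.4,  0.0, -0.2,  0.1]])
b = np.array([0.5, -1.0, 0.0])

# One output neuron per category: its own (1x4) row of weights and its own bias
per_neuron = np.array([W[k] @ x + b[k] for k in range(3)])

# Stacking the rows is exactly the (3x4) matrix product
assert np.allclose(per_neuron, W @ x + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical second layer: layer-1 outputs, passed through an activation,
# become the inputs of the next layer
W2 = np.array([[1.0, -1.0,  0.5],
               [0.2,  0.3, -0.4]])        # hypothetical (2x3) weights
b2 = np.array([0.0, 0.1])
output = W2 @ sigmoid(per_neuron) + b2    # shape (2,)
```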
How do we find good values for W and b?
- Start with random values, then iteratively update them (by backpropagating the loss) so that they converge to the values of W and b that minimise the loss
What is a loss function and what methods could we use to calculate loss?
- A loss function measures how far the classifier's predictions are from the ground truth, i.e. how accurately it predicts the categories.
- A large loss indicates a poorly trained classifier.
Options include L1 loss, L2 loss, MSE loss, multiclass SVM (hinge) loss, and cross-entropy (softmax) loss.
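What is the formula for calculating the L1 loss?
- The loss over the dataset is the average of the per-image losses: $L = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i)$
- For L1 loss, the per-image loss is the sum of absolute differences between the predictions and the ground truth: $L_i = \sum_j |f(x_i, W)_j - y_{i,j}|$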
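A minimal sketch of the dataset loss as the average of per-image L1 losses, assuming one-hot ground-truth vectors and made-up predictions:

```python
import numpy as np

def l1_loss_per_image(pred, target):
    # Sum of absolute differences over the categories of one image
    return np.abs(pred - target).sum()

def dataset_loss(preds, targets, per_image_loss=l1_loss_per_image):
    # L = (1/N) * sum_i L_i : the average of the per-image losses
    return np.mean([per_image_loss(p, t) for p, t in zip(preds, targets)])

preds = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
targets = np.array([[1.0, 0.0, 0.0],      # one-hot ground truth (assumed)
                    [0.0, 1.0, 0.0]])
print(dataset_loss(preds, targets))       # approximately 0.7 (= (0.6 + 0.8) / 2)
```

Passing the per-image loss as a parameter keeps the averaging step separate, so the same function works for other per-image losses.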
How is SVM loss calculated?
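- For each incorrect class j, take the max of 0 and the score of class j minus the score of the correct class $y_i$ plus a margin Δ (usually 1); the image's loss is the sum over all incorrect classes: $L_i = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + \Delta)$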
The scores predicted for 3 images over 3 classes are given in the following table (each column is one image; the true labels are cat, dog and person respectively). Compute the multiclass SVM loss for each image, then the total (average) loss over all images. Δ = 1
        image 1  image 2  image 3
cat     3.1      1.5      5.2
dog     0.7      2.4      1.2
person  1.5      5.1      -1.4
Image 1 (true label cat): max(0, 0.7 - 3.1 + 1) + max(0, 1.5 - 3.1 + 1) = 0 + 0 = 0
Image 2 (true label dog): max(0, 1.5 - 2.4 + 1) + max(0, 5.1 - 2.4 + 1) = 0.1 + 3.7 = 3.8
Image 3 (true label person): max(0, 5.2 - (-1.4) + 1) + max(0, 1.2 - (-1.4) + 1) = 7.6 + 3.6 = 11.2
Average loss: (0 + 3.8 + 11.2)/3 = 5
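The worked example can be reproduced with a short NumPy function for the multiclass SVM loss, which sums max(0, s_j - s_correct + Δ) over the incorrect classes. Each row of the array below is one column of the table (one image), with true labels cat, dog and person as in the calculation:

```python
import numpy as np

def svm_loss(scores, true_idx, delta=1.0):
    # Sum over incorrect classes j of max(0, s_j - s_correct + delta)
    margins = np.maximum(0.0, scores - scores[true_idx] + delta)
    margins[true_idx] = 0.0               # the correct class is not counted
    return margins.sum()

# One score vector per image, in the order (cat, dog, person)
scores = np.array([[3.1, 0.7, 1.5],       # image 1, true label cat
                   [1.5, 2.4, 5.1],       # image 2, true label dog
                   [5.2, 1.2, -1.4]])     # image 3, true label person
true_labels = [0, 1, 2]

losses = [svm_loss(s, y) for s, y in zip(scores, true_labels)]
print(np.round(losses, 2))                # approximately [0, 3.8, 11.2]
print(round(float(np.mean(losses)), 2))   # 5.0
```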
What is the difference between a deep learning neural network and a simple neural network?
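A simple neural network has only one or a few hidden layers; a deep learning network stacks many hidden layers (sometimes hundreds) between the input and the output, and all of them are trained together to produce the output.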
What is a convolutional neural network?
- A type of deep learning model built from convolutional layers, which apply filters to the input data to capture image features
- Each filter slides over the image, computing a weighted sum at every position to produce an output feature map; CNNs are commonly used for image classification
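A minimal sketch of one filter sliding over a single-channel image, assuming stride 1 and no padding. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is cross-correlation. The image and filter values are made up:

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" convolution: stride 1, no padding, filter not flipped
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the image patch currently under the filter
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
kernel = np.array([[1, 2], [0, 1]], dtype=float)
print(conv2d(image, kernel))   # 2x2 feature map: rows [10, 14] and [22, 26]
```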
What are recurrent networks?
The output (hidden state) from one step is fed back as an input to the next step.
- They have connections that form directed cycles.
- The hidden state lets them retain memory of previous inputs.
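A minimal sketch of a recurrent forward pass, assuming the common update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The tanh activation, the layer sizes and the random weights are assumptions for illustration. Two sequences that differ only in their first input end with different hidden states, which shows that the earlier input is remembered:

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    # The hidden state h is carried from one step to the next (the cycle),
    # so each new state depends on the current input AND all earlier inputs
    h = np.zeros(W_hh.shape[0])
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # tanh: assumed activation
    return h

rng = np.random.default_rng(0)
W_xh = 0.5 * rng.standard_normal((4, 3))   # hypothetical: 3 inputs, 4 hidden units
W_hh = 0.5 * rng.standard_normal((4, 4))
b_h = np.zeros(4)

seq_a = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
seq_b = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
# Same last input, different first input -> different final hidden state
print(np.allclose(rnn_forward(seq_a, W_xh, W_hh, b_h),
                  rnn_forward(seq_b, W_xh, W_hh, b_h)))  # False
```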
How do artificial neural networks work?
- Neurons receive multiple inputs, each with an adjustable weight.
- An activation function (e.g. a threshold) decides whether or not a neuron is active.
What is the input signal formula for a neuron?
$z = \sum_i w_i x_i + b$ (the weighted sum of the inputs plus the bias)
What is the output signal formula for a neuron?
$y = \varphi(z)$, where $\varphi$ is the activation function applied to the input signal z
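The input and output formulas combine into a single function; the names and numbers here are hypothetical:

```python
import numpy as np

def neuron(inputs, weights, bias, activation):
    z = np.dot(weights, inputs) + bias   # input signal: sum(w_i * x_i) + b
    return activation(z)                 # output signal: y = activation(z)

def identity(z):
    # Placeholder activation that passes the input signal through unchanged
    return z

print(neuron([1.0, 2.0], [0.5, -0.25], 0.1, identity))  # 0.1
```

Passing the activation as an argument lets the same neuron be reused with a threshold, sigmoid, or any other activation function.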
What is an activation function?
Describe the threshold activation function:
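An activation function maps a neuron's input signal to its output signal; the threshold function uses it to determine whether the neuron is active or not.
- Choose a threshold θ: if the weighted sum of the inputs plus the bias is at least θ, the neuron is active (outputs 1); otherwise it is inactive (outputs 0)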
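A sketch of a neuron with a threshold activation, using hypothetical weights, bias and threshold θ = 0.5:

```python
def threshold_activation(z, theta=0.0):
    # Active (1) if the input signal meets or exceeds the threshold, else inactive (0)
    return 1 if z >= theta else 0

def input_signal(weights, inputs, bias):
    # Weighted sum of the inputs plus the bias
    return sum(w * x for w, x in zip(weights, inputs)) + bias

weights, bias, theta = [0.5, 0.5], -0.25, 0.5   # hypothetical values
print(threshold_activation(input_signal(weights, [1.0, 1.0], bias), theta))  # 1 (z = 0.75)
print(threshold_activation(input_signal(weights, [0.0, 1.0], bias), theta))  # 0 (z = 0.25)
```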
What is the sigmoid function?
- It’s an activation function
- formula: $\sigma(x) = \frac{1}{1 + e^{-x}}$
- output is always between 0 and 1
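A quick numerical check of these properties; the sample points are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(xs)
print(s)   # increasing values, all strictly between 0 and 1; sigmoid(0) = 0.5
```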
What are the two parameters that can be introduced to the sigmoid function?
How does it fit into the formula?
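c1: controls the slope (steepness) of the sigmoid
c2: controls the horizontal offset (the curve is centred at x = c2, where it equals 0.5)
$\sigma(x) = \frac{1}{1 + e^{-c_1 (x - c_2)}}$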
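A sketch of the parametrised sigmoid with arbitrary sample values. It checks that c2 moves the midpoint and that c1 changes the steepness:

```python
import numpy as np

def sigmoid_c(x, c1=1.0, c2=0.0):
    # c1 scales the slope; c2 shifts the curve so that it is centred at x = c2
    return 1.0 / (1.0 + np.exp(-c1 * (x - c2)))

print(sigmoid_c(3.0, c1=2.0, c2=3.0))   # 0.5: the midpoint moves to x = c2
# A larger c1 gives a steeper curve: the same step away from c2 changes the output more
print(sigmoid_c(3.5, c1=1.0, c2=3.0) < sigmoid_c(3.5, c1=5.0, c2=3.0))  # True
```

With c1 = 1 and c2 = 0 this reduces to the standard sigmoid.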