NLP Tips and Tricks Flashcards

1
Q

How does the softmax classifier work?

A

It converts a set of scores into a probability distribution over classes. For each input x, the probability of class k is computed by taking the dot product of that class's weight vector and x, exponentiating it, and dividing by the sum of the exponentiated scores of all the classes
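
A minimal NumPy sketch of this computation (the weights and input below are made up for illustration):

import numpy as np

def softmax_probs(W, x):
    # P(class k | x) = exp(w_k . x) / sum_j exp(w_j . x)
    scores = W @ x               # one score per class
    scores -= scores.max()       # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

W = np.array([[0.2, -0.5], [1.0, 0.3], [-0.4, 0.8]])  # C=3 classes, n=2 features
x = np.array([1.0, 2.0])
print(softmax_probs(W, x))       # probabilities summing to 1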

2
Q

What is the input and output for softmax?

A

The input is a feature vector of length n representing a word

The output is a probability distribution over the possible labels (POS tags, NE BIO tags)

3
Q

What is the size of the softmax weight matrix?

A

It has dimensions of (C x n), where C is the number of classes and n is the number of entries in the input vector
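
A quick shape check in NumPy, with illustrative sizes:

import numpy as np

C, n = 5, 300             # e.g. 5 labels, 300-dimensional input (made up)
W = np.zeros((C, n))      # softmax weight matrix
x = np.random.randn(n)    # input vector
scores = W @ x            # (C x n) @ (n,) -> (C,), one score per class
print(scores.shape)       # (5,)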

4
Q

What is cross-entropy loss?

A

It is a loss function used to train the weights: it measures how well the predicted probabilities match the true class distribution. We want to minimise the cross-entropy loss

5
Q

How do we compute cross entropy?

A

Multiply the true-class probability p(k) by the log of the predicted probability q(k), sum across all possible classes, and multiply by minus 1: H(p, q) = -Σ_k p(k) log q(k). Because the true distribution p is one-hot, this simplifies to -log q(k) for k = true_class, where q is the predicted probability distribution
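
A small Python sketch of both forms, with made-up probabilities and a one-hot true distribution:

import numpy as np

q = np.array([0.1, 0.7, 0.2])   # predicted distribution (illustrative)
p = np.array([0.0, 1.0, 0.0])   # one-hot true distribution, true class k = 1

full = -np.sum(p * np.log(q))   # -sum_k p(k) log q(k)
simplified = -np.log(q[1])      # -log q(k) for the true class
print(full, simplified)         # both ~0.357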

6
Q

What do activation functions do?

A

Given the input to a hidden layer, they compute that layer's output values, typically by applying a non-linear function

7
Q

How are model parameters represented?

A

They are a C x n matrix of weights, where C is the number of classes and n is the length of the input vector

8
Q

What is shown in the image?

A

It is the matrix of gradients of the loss, computed with respect to the model parameters
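
For the softmax classifier with cross-entropy loss, this gradient matrix has a standard closed form, dL/dW = (q - p) xᵀ, the same (C x n) shape as W itself; a sketch with made-up values:

import numpy as np

q = np.array([0.1, 0.7, 0.2])   # predicted probabilities (illustrative)
p = np.array([0.0, 1.0, 0.0])   # one-hot true distribution
x = np.array([1.0, 2.0])        # input vector (n=2)
grad_W = np.outer(q - p, x)     # (C x n) matrix of gradients of the loss
print(grad_W.shape)             # (3, 2)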

9
Q

What is backpropagation?

A

It is a way to compute gradients efficiently, applying the chain rule via matrix computation.

10
Q

Why is matrix computation good?

A

It is highly parallelizable, e.g. on GPUs

11
Q

What does forward propagation do?

A

It computes the hidden layer values from the input using the activation function, layer by layer
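
A one-hidden-layer forward pass as a sketch (the tanh activation and weight values are illustrative choices):

import numpy as np

x = np.array([1.0, -1.0])                               # input
W1 = np.array([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.7]])   # input -> hidden
h = np.tanh(W1 @ x)                                     # hidden values via activation
W2 = np.array([[1.0, -1.0, 0.5]])                       # hidden -> output
y = W2 @ h                                              # output scores
print(h, y)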

12
Q

What does backpropagation do?

A

It calculates partial derivatives at each step and passes gradients back through the graph
Compute local gradients and apply the chain rule
The downstream gradient = upstream gradient × local gradient
A node with multiple inputs has multiple local gradients, one per input (see the sketch below)
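
A minimal sketch of these rules on the tiny graph f = (a + b) * c (variable names and values are illustrative):

# Graph: s = a + b; f = s * c.  Backprop multiplies each upstream
# gradient by the local gradient at every node.
a, b, c = 2.0, 3.0, 4.0
s = a + b
f = s * c

df_df = 1.0                 # start of backprop at the output
df_ds = df_df * c           # local gradient of * w.r.t. s is c
df_dc = df_df * s           # local gradient of * w.r.t. c is s
df_da = df_ds * 1.0         # local gradient of + w.r.t. a is 1
df_db = df_ds * 1.0         # + has multiple inputs -> multiple local gradients
print(df_da, df_db, df_dc)  # 4.0 4.0 5.0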
