Machine Translation and Encoder-Decoder Models Flashcards
What is machine translation?
It is the use of computers to translate text from one language into another
What machine translation models exist?
Statistical phrase alignment models, Encoder-Decoder models and Transformer models
What type of task is machine translation?
It is a sequence to sequence task (seq2seq)
What is the input, output and their lengths for a seq2seq task?
The input X is a sequence of words and the output Y is a sequence of words, but the length of X need not equal the length of Y
Besides machine translation, what are some other seq2seq tasks?
Question → Answer
Sentence → Clause
Document → Abstract
What do universal aspects mean with regard to human language?
These are aspects that are true, or statistically mostly true, for all languages
What are some examples of universal aspects in the human language?
Nouns/Verbs, Greetings, Politeness/Rudeness
What are translation divergences?
These are areas where languages differ
What are some examples of translation divergences?
- Idiosyncrasies and lexical differences
- Systematic differences
What is the study of translation divergences called?
Linguistic Typology
What is Word Order Typology?
It classifies languages by the basic order of subject, verb and object in a sentence; common orders include:
- Subject-Verb-Object (SVO)
- Subject-Object-Verb (SOV)
- Verb-Subject-Object (VSO)
What is the Encoder-Decoder model?
For an input sequence, an encoder encodes the input into a context vector, which is then passed to a decoder that generates the output sequence.
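A rough sketch of this idea in PyTorch is shown below (the GRU choice, class names, and layer sizes are illustrative assumptions, not taken from the source):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) token ids; only the final hidden state is
        # returned -- this serves as the context vector
        _, hidden = self.rnn(self.embed(src))
        return hidden                                   # (1, batch, hid_dim)

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        # the context vector is concatenated to the input at every step
        self.rnn = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, prev_word, hidden, context):
        # prev_word: (batch, 1); hidden: (1, batch, hid_dim); context: (batch, 1, hid_dim)
        rnn_in = torch.cat([self.embed(prev_word), context], dim=-1)
        output, hidden = self.rnn(rnn_in, hidden)
        return self.out(output), hidden                 # logits: (batch, 1, tgt_vocab)
```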
What can an encoder be?
LSTM, GRU, CNN, Transformers
What is a context vector?
It is the final hidden state of the encoder, which is used as an input to the decoder
What does a language model try to do?
Predict the next word in a sequence Y based on the previous words
How is a translation model different to a language model?
It predicts the next word in the sequence Y based on the previous target words AND the full source sequence X
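In probability terms, a language model estimates P(y_t | y_1, ..., y_{t-1}), while a translation model estimates P(y_t | y_1, ..., y_{t-1}, X).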
Explain how the encoder-decoder model shown in the image works
The embeddings of the source words are fed into the hidden layer one at a time, followed by a separator token. The decoder then predicts the target words one at a time, feeding each predicted word back in as the input for the next prediction until the end of the sentence is reached. The key point is that the final hidden state after the last source word is fed into the decoder, which uses it to predict the target words.
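Using the Encoder and Decoder sketched above, a minimal greedy decoding loop might look like this (the <sos>/<eos> token ids and the maximum length are assumptions):

```python
import torch

def translate(encoder, decoder, src, max_len=30, sos_id=1, eos_id=2):
    """Greedy decoding: feed each predicted word back in until <eos>."""
    with torch.no_grad():
        hidden = encoder(src)                           # final encoder state = context vector
        context = hidden.transpose(0, 1)                # (batch, 1, hid_dim), reused at every step
        word = torch.full((src.size(0), 1), sos_id, dtype=torch.long)
        out_words = []
        for _ in range(max_len):
            logits, hidden = decoder(word, hidden, context)
            word = logits.argmax(dim=-1)                # predicted word becomes the next input
            out_words.append(word)
            if (word == eos_id).all():                  # stop once every sentence has ended
                break
        return torch.cat(out_words, dim=1)              # (batch, generated_len)
```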
By using the hidden layer at the end of the sentence in the machine translation model, what is avoided?
It avoids word order typology problems: the encoder has seen the whole source sentence before the first target word is generated, so the model knows how the sentence ends before it starts to translate the first word
How is the encoder trained in machine translation models?
The input words are embedded using an embedding layer and fed one at a time into the encoder until the full input has been seen.
Which states of the decoder is the final hidden state of the encoder fed into?
It is fed into every single state of the decoder
What are the inputs at each step in the decoder?
The context vector c (the final hidden state of the encoder), the previous output y_(t-1), and the previous hidden state of the decoder h_(t-1)
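In equation form (a common textbook formulation, not quoted from the source), each decoder hidden state is computed as h_t = g(y_(t-1), h_(t-1), c), and the distribution over the next target word is a softmax over a linear projection of h_t.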
What is the typical loss function for a machine translation model?
It is a cross entropy loss function
What is used during training of machine translation but not inference?
Teacher forcing: during training the decoder is fed the gold word from the reference translation as the previous word, rather than its own prediction, so training follows the exact reference translation
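A sketch of one teacher-forced training step, reusing the Encoder and Decoder above (the function name and the assumption that position 0 of tgt holds <sos> are illustrative):

```python
import torch
import torch.nn as nn

def train_step(encoder, decoder, src, tgt):
    loss_fn = nn.CrossEntropyLoss()
    hidden = encoder(src)                               # context vector
    context = hidden.transpose(0, 1)                    # (batch, 1, hid_dim)
    total_loss = 0.0
    # tgt: (batch, tgt_len), with <sos> assumed at position 0
    for t in range(1, tgt.size(1)):
        prev_gold = tgt[:, t-1:t]                       # teacher forcing: gold word, not the model's prediction
        logits, hidden = decoder(prev_gold, hidden, context)
        total_loss = total_loss + loss_fn(logits.squeeze(1), tgt[:, t])
    return total_loss / (tgt.size(1) - 1)               # average loss per target word
```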
What is the total loss per sentence in machine translation?
It is the average loss across all target words
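As a small standalone illustration (the shapes and vocabulary size are made up), PyTorch's CrossEntropyLoss averages the per-word losses over all target positions by default:

```python
import torch
import torch.nn as nn

# 5 target words, vocabulary of 1000 types (illustrative numbers)
logits = torch.randn(5, 1000)             # decoder scores for each target position
gold = torch.randint(0, 1000, (5,))       # gold target token ids

loss_fn = nn.CrossEntropyLoss()           # default reduction='mean'
sentence_loss = loss_fn(logits, gold)     # average cross-entropy over the 5 words
```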