Seq2Seq Learning Flashcards

1
Q

What are Transformers?

A

It’s a type of neural network architecture that gained popularity after OpenAI used it in their language models. Transformers are great for Seq2Seq learning, e.g. in speech recognition, text-to-speech transformation, etc.

2
Q

What is sequence transduction?

A

In ML this is the process behind Seq2Seq learning: a model takes a sequence as input and produces another sequence as output (e.g. translating a sentence from one language to another).

3
Q

What models were used for Seq2Seq learning before Transformers? What is the specific issue these models have to handle?

A

The specific issue is that Seq2Seq models need some form of “memory” in order to generate text, translate, etc.

The two models are Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).

RNNs have “memory” that leaks between steps, meaning they take information from prior inputs to influence the current output. In a normal feed-forward NN we assume the inputs and outputs are independent of each other, but in sequences the elements are correlated, so this memory is needed (see the sketch below).
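
A minimal sketch of that recurrence (toy dimensions and random weights, purely for illustration, not tied to any specific model):

import numpy as np

# Toy sizes and random weights, standing in for learned parameters.
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden: the "memory" path
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current input with the previous hidden state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

sequence = rng.normal(size=(5, input_size))  # 5 made-up "word vectors"
h = np.zeros(hidden_size)                    # empty memory at the start
for t, x_t in enumerate(sequence):
    h = rnn_step(x_t, h)                     # h now depends on every earlier input, not just x_t
    print(t, h.round(3))

Because h is overwritten at every step, information from early inputs has to survive many such updates to influence a late output, which is exactly where plain RNNs struggle (see the next card).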

4
Q

What were the issues with the earlier methods for Seq2Seq before Transformers?

A

RNNs become very ineffective when the gap between the relevant information and the point where it is needed grows large. The reason is that the information is passed along at each step (e.g. word in a sentence), and the longer the chain, the bigger the risk of the information getting lost.

Long Short-Term Memory (LSTM), a specific type of RNN, tries to solve this problem. Where a plain RNN does not prioritize information, an LSTM uses a cell state and gates to selectively forget unimportant information. An LSTM is still an RNN, though, and suffers from similar issues on very long sequences.
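
A minimal sketch of using an LSTM in PyTorch (the sizes are arbitrary toy values, not from any real model):

import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_size, seq_len, batch = 8, 16, 20, 2

lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size, batch_first=True)

x = torch.randn(batch, seq_len, embedding_dim)   # a batch of 20-step input sequences
out, (h_n, c_n) = lstm(x)

print(out.shape)   # (2, 20, 16): the hidden state at every step
print(h_n.shape)   # (1, 2, 16): final hidden state (short-term memory)
print(c_n.shape)   # (1, 2, 16): final cell state, the part the gates selectively write to and forget from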

5
Q

What was the main thing that made Transformers so successful?

A

A technique for paying attention to specific words, called attention. Transformers combine attention with encoders and decoders: multiple encoders and decoders are stacked, and each encoder has a self-attention layer and a feed-forward NN. The decoder has an additional attention layer in between that helps it focus on the relevant parts of the input sequence.
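
The core of the attention layer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal sketch (with the learned query/key/value projections left out to keep it short):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                               # weighted sum of values + attention map

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))   # 3 "words", each a 4-dim vector

# In self-attention, Q, K and V are (learned) projections of the same input sequence.
output, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))          # each row sums to 1: how much each word attends to every other word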

6
Q

How does the Encoder-Decoder architecture work? Why is it so powerful?

A

A type of NN architecture that is used for Seq2Seq.

The encoder processes an input sequence to produce a set of context vectors, which are then used by the decoder to generate an output sequence. The context vectors live in a latent space: the idea is that the encoder reduces the dimensionality and, layer by layer, keeps only the relevant information (think of it roughly as writing a summary of a book).

The decoder is responsible for taking this encoded representation and reconstructing it into the desired output form (the original, a translation, or something similar).

It is powerful because it effectively transforms the input into a meaningful representation (the context vector) and then decodes from that representation to create accurate outputs.
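
A minimal sketch of an encoder-decoder in PyTorch, here built from GRUs rather than attention (all names, sizes and vocabularies are made up for illustration):

import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    # The encoder compresses the input sequence into a context vector;
    # the decoder unrolls an output sequence conditioned on that context.
    def __init__(self, in_vocab, out_vocab, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(in_vocab, emb)
        self.tgt_emb = nn.Embedding(out_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, out_vocab)

    def forward(self, src_tokens, tgt_tokens):
        _, context = self.encoder(self.src_emb(src_tokens))           # the "summary" of the input
        dec_out, _ = self.decoder(self.tgt_emb(tgt_tokens), context)  # decode conditioned on it
        return self.out(dec_out)                                      # scores over the output vocabulary

model = TinySeq2Seq(in_vocab=100, out_vocab=120)
src = torch.randint(0, 100, (2, 7))   # batch of 2 input sequences, length 7
tgt = torch.randint(0, 120, (2, 5))   # shifted target sequences, length 5
print(model(src, tgt).shape)          # torch.Size([2, 5, 120])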

7
Q

What are common applications of Encoder-Decoders?

A

They are used in Transformers; they can also be used for image-caption generation, etc.

8
Q

How does GPT-3 work?

A

Generative Pre-trained Transformer 3 (GPT-3) is a large language model from OpenAI that generates text from text.

The steps in GPT-3 are:
1. Convert the input words to vector representations (numbers).
2. Calculate the prediction (a vector representation of the output words).
3. Convert the output representation back to words.

When calculating the prediction, it passes the vectors through a stack of 96 Transformer decoder layers.
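
A drastically scaled-down sketch of those three steps in PyTorch (GPT-3 itself uses 96 decoder layers, far larger dimensions and a real tokenizer; everything below is a toy stand-in):

import torch
import torch.nn as nn

vocab_size, d_model, n_layers, seq_len = 1000, 64, 2, 10

embed = nn.Embedding(vocab_size, d_model)       # step 1: words -> vectors
# PyTorch's encoder layer with a causal mask is used here as a stand-in for a GPT-style decoder block.
blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True) for _ in range(n_layers)]
)
unembed = nn.Linear(d_model, vocab_size)        # step 3: vectors -> scores over words

tokens = torch.randint(0, vocab_size, (1, seq_len))                   # pretend tokenised input words
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

h = embed(tokens)
for block in blocks:                            # step 2: the decoder stack (96 layers in GPT-3)
    h = block(h, src_mask=causal_mask)          # causal mask: each position only sees the past
logits = unembed(h)
print(logits[0, -1].argmax().item())            # most likely next word (as a token id)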

9
Q

What are Decision Transformers?

A

It’s a model that formulates RL as a conditional sequence modeling problem. The main idea is that instead of training a policy with classical RL methods (e.g. fitting a value function for decision making), we use a sequence modeling algorithm (i.e. a Transformer) that, given a desired return, past states and actions as input, generates the future actions needed to achieve this desired return. What is interesting here is that we no longer maximize the return directly, but instead generate a series of future actions that achieve the desired return.

This uses generative trajectory modeling, i.e. modeling the joint distribution of sequences of states, actions and rewards.
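
A minimal sketch of how such a trajectory is turned into a sequence for the model (toy rewards and integer stand-ins for real states and actions):

import numpy as np

rewards = np.array([1.0, 0.0, 2.0, 1.0])   # made-up per-step rewards
states = np.arange(4)                      # stand-ins for real state observations
actions = np.arange(4)                     # stand-ins for the actions actually taken

# Returns-to-go: at each step, the total reward still to be collected from that step onward.
returns_to_go = np.cumsum(rewards[::-1])[::-1]
print(returns_to_go)                       # [4. 3. 3. 1.]

# The trajectory is fed to the Transformer as one interleaved sequence
# (R_1, s_1, a_1, R_2, s_2, a_2, ...), and the model is trained to predict each action.
sequence = [tok for triplet in zip(returns_to_go, states, actions) for tok in triplet]
print(sequence)

# At test time, R_1 is set to the *desired* return and the model generates the actions
# that should achieve it.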
