DL-07 - Sequence models Flashcards

1
Q

DL-07 - Sequence models

What is a sequence model?

A

A model that handles sequential data, where the order of the data matters.

2
Q

DL-07 - Sequence models

What is another name for sequence models?

A

seq2seq

3
Q

DL-07 - Sequence models

What is another name for seq2seq?

A

Sequence models.

4
Q

DL-07 - Sequence models

What is the definition of a sequence model?

A

An ML model where the input or output is a sequence of data (e.g., text, audio, time series).

5
Q

DL-07 - Sequence models

What are the different types of sequence models called (abstract)? (4)

A
  • One to one
  • One to many
  • Many to one
  • Many to many
6
Q

DL-07 - Sequence models

Describe what a one to one model looks like.

A

(See image)

7
Q

DL-07 - Sequence models

Describe what a one to many model looks like.

A

(See image)

8
Q

DL-07 - Sequence models

Describe what a many to one model looks like.

A

(See image)

9
Q

DL-07 - Sequence models

Describe what a many to many model looks like.

A

(See image)

10
Q

DL-07 - Sequence models

What is a named entity recognition task?

A

Determine which words in a sentence are entities, generally names of things (e.g., people, places, organizations).

(See image)

11
Q

DL-07 - Sequence models

What task is this an example of? (See image)

A

Named entity recognition.

12
Q

DL-07 - Sequence models

What is sentiment analysis?

A

Predict the sentiment of some input, e.g. positive or negative. (See image)

13
Q

DL-07 - Sequence models

What task is this an example of? (See image)

A

Sentiment analysis.

14
Q

DL-07 - Sequence models

What is activity recognition?

A

A task where you label the activity in e.g. an image or a video. (See image)

15
Q

DL-07 - Sequence models

What task is this? (See image)

A

Activity recognition.

16
Q

DL-07 - Sequence models

What are some popular sequence models? (3)

A
  • RNN
  • LSTM
  • Transformers
17
Q

DL-07 - Sequence models

What is the main idea behind RNNs?

A

RNNs process sequential data by maintaining an internal state and iteratively updating it with each input in the sequence.
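
A minimal sketch of this idea in NumPy, assuming a single tanh RNN layer (the names W_xh, W_hh, and b_h are illustrative, not from the lecture):

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    # Run one RNN layer over a sequence of input vectors xs.
    h = np.zeros(W_hh.shape[0])                 # initial hidden state h_0
    hidden_states = []
    for x in xs:                                # one step per sequence element
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # update the internal state
        hidden_states.append(h)
    return hidden_states                        # one hidden state per time step
```

The same weights are reused at every time step; only the hidden state h changes as the sequence is consumed.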

18
Q

DL-07 - Sequence models

What model can you think of as a sequence of copies of the same neural network, applied one after another?

A

RNN (and LSTM)

19
Q

DL-07 - Sequence models

Describe how we typically draw RNNs.

A

(See image)

20
Q

DL-07 - Sequence models

What model is depicted?

A

RNN

21
Q

DL-07 - Sequence models

Describe what x, t, h and y are in the image. (See image)

A
  • t is the time step
  • x are the inputs
  • h are the hidden states
  • y are the predicted outputs
22
Q

DL-07 - Sequence models

What parameters does an RNN layer have?

A
  • Weights
  • Biases
  • Recurrent (hidden-state) weights, applied to the hidden state from the previous time step
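
In the common formulation, these parameters combine as follows (a sketch of the standard RNN equations, with an output projection $W_{hy}$, $b_y$ added for completeness):

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad \hat{y}_t = W_{hy} h_t + b_y$$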
23
Q

DL-07 - Sequence models

In an RNN, what are T_x and T_y?

A

The number of input time steps and the number of output time steps, respectively.

24
Q

DL-07 - Sequence models

What is BPTT short for?

A

Backpropagation through time

25
# DL-07 - Sequence models How does backpropagation through time work?
By unrolling the recurrent neural network through time and applying standard backpropagation to compute gradients for updating weights.
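
As a sketch in formula form, assuming the total loss is a sum of per-time-step losses $\mathcal{L}_t$: because the unrolled copies share weights, the gradient for a shared weight matrix $W$ accumulates contributions from every time step:

$$\frac{\partial \mathcal{L}}{\partial W} = \sum_{t=1}^{T} \frac{\partial \mathcal{L}_t}{\partial W}$$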
26
# DL-07 - Sequence models When is loss backpropagated in BPTT?
The loss is backpropagated from the last time step back to the first, which allows the weights to be updated.
27
# DL-07 - Sequence models What is NLP short for?
Natural language processing.
28
# DL-07 - Sequence models What are the two steps of text sequence representation in Natural Language Processing (NLP)?
The two steps are:
- Creation of a vocabulary
- Numeric representation of text/words
29
# DL-07 - Sequence models In NLP, what is a vocabulary?
A dictionary of unique words of interest (e.g. Norwegian or English words).
30
# DL-07 - Sequence models In NLP, what do we call a dictionary of unique words of interest (e.g. Norwegian or English words)?
A vocabulary.
31
# DL-07 - Sequence models How are sentences tokenized?
Generally word by word. (Advanced: See subword tokenization)
32
# DL-07 - Sequence models How do you create a vocabulary? (5)
Take your input data and:
- Remove punctuation
- Remove stop words
- Stem words (transporting -> transport)
- Add start/end-of-sentence tokens
- Add other identifiers as necessary
Then select the unique words.
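
A minimal sketch of these steps in Python; the stop-word list, the crude `ing`-stripping stemmer, and the `<SOS>`/`<EOS>` token names are toy assumptions for illustration:

```python
import re

STOP_WORDS = {"i", "the", "a", "is"}  # toy stop-word list

def build_vocabulary(sentences):
    vocab = set()
    for s in sentences:
        s = re.sub(r"[^\w\s]", "", s.lower())  # remove punctuation
        for word in s.split():
            if word in STOP_WORDS:             # remove stop words
                continue
            word = re.sub(r"ing$", "", word)   # crude stemming (transporting -> transport)
            vocab.add(word)
    return ["<SOS>", "<EOS>"] + sorted(vocab)  # add start/end-of-sentence tokens

print(build_vocabulary(["I love transporting AI!", "AI is cool."]))
# ['<SOS>', '<EOS>', 'ai', 'cool', 'love', 'transport']
```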
33
# DL-07 - Sequence models What are the commonly used techniques for text representation? (3)
- One-hot encoding
- Bag-of-words
- Word embeddings
34
# DL-07 - Sequence models What are these examples of in NLP? - One-hot encoding - Bag-of-words - Word embeddings
Text representation techniques.
35
# DL-07 - Sequence models How do you use one-hot encoding to represent text in NLP?
Convert each unique character or word into a binary vector with a 1 at the position corresponding to that character or word and 0s elsewhere.
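
A minimal sketch over a toy three-word vocabulary:

```python
vocab = ["ai", "cool", "love"]  # toy vocabulary

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1  # 1 at the word's position, 0s elsewhere
    return vec

print(one_hot("cool"))  # [0, 1, 0]
```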
36
# DL-07 - Sequence models What is the issue with one-hot encoding in NLP?
The curse of dimensionality: each vector is as long as the vocabulary, so the representations become huge and sparse.
37
# DL-07 - Sequence models How can you solve the problem of curse of dimensionality with one-hot encoded data?
One-hot encoded vectors can be transformed to a lower dimensional space using an embedding technique.
38
# DL-07 - Sequence models In NLP, what is bag-of-words?
Bag-of-words is a representation technique where a text is described by the frequency of its words, disregarding grammar and word order but maintaining multiplicity.
39
# DL-07 - Sequence models What is BOW short for?
Bag-of-words representation.
40
# DL-07 - Sequence models What does the bag-of-words (BOW) representation do with words in a text?
The bag-of-words representation puts words in a "bag" and scores them based on their counts or frequencies in the text.
41
# DL-07 - Sequence models What could the BOW representation for this sentence look like? input text: “I love AI. AI is cool”
BoW representation: [2, 1, 1, 1, 1], corresponding to the vocabulary [AI, cool, I, is, love]. The representation is a vector where each index corresponds to a vocabulary word, and the value is the number of times that word occurs in the text.
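
A minimal sketch reproducing the card's example with collections.Counter:

```python
from collections import Counter
import re

text = "I love AI. AI is cool"
words = re.findall(r"\w+", text)       # ['I', 'love', 'AI', 'AI', 'is', 'cool']
counts = Counter(words)                # word -> occurrence count
vocab = sorted(counts, key=str.lower)  # ['AI', 'cool', 'I', 'is', 'love']
bow = [counts[w] for w in vocab]
print(vocab, bow)                      # ['AI', 'cool', 'I', 'is', 'love'] [2, 1, 1, 1, 1]
```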
42
# DL-07 - Sequence models What are some problems with BOW word frequency?
Highly frequent words in the document dominate (larger score), even if they do not carry as much “informational content” (e.g., words like “I”, “the”, “a”).
43
# DL-07 - Sequence models What is TF-IDF short for?
Term Frequency-Inverse Document Frequency
44
# DL-07 - Sequence models What does TF-IDF do?
It rescales word frequencies by how often the words appear across all documents, down-weighting words that are frequent everywhere.
45
# DL-07 - Sequence models What is the formula for TF-IDF?
A common form: tf-idf(t, d) = tf(t, d) × log(N / df(t)), where tf(t, d) is the frequency of term t in document d, N is the total number of documents, and df(t) is the number of documents containing t. (See image)
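
A minimal sketch of that form in Python; real libraries (e.g. scikit-learn's TfidfVectorizer) use smoothed variants:

```python
import math

docs = [["ai", "is", "cool"], ["ai", "loves", "ai"]]  # toy corpus

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)    # term frequency in this document
    df = sum(term in d for d in docs)  # number of documents containing the term
    return tf * math.log(len(docs) / df)

print(tf_idf("ai", docs[1], docs))    # 0.0: "ai" appears in every document
print(tf_idf("cool", docs[0], docs))  # > 0: "cool" is rare across documents
```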
46
# DL-07 - Sequence models What are word embeddings in NLP?
Word embedding is a technique that maps words or phrases to dense vectors of numerical values of a given size.
47
# DL-07 - Sequence models What does the word embedding technique do? (2)
- Maps words or phrases to dense vectors of numerical values of a given size
- Performs dimensionality reduction of words/sentences
48
# DL-07 - Sequence models What are some popular word embedding techniques? (3)
- GloVe
- Word2Vec
- NN embedding layer
49
# DL-07 - Sequence models What are GloVe and Word2Vec examples of?
Word embedding techniques (pretrained embedding models).
50
# DL-07 - Sequence models How does an NN embedding layer work?
(See image)
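
A minimal sketch of the general idea in PyTorch (sizes illustrative): an embedding layer is a trainable lookup table from integer word indices to dense vectors, learned jointly with the rest of the network.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 1000, 16                 # illustrative sizes
embedding = nn.Embedding(vocab_size, embedding_dim)  # trainable lookup table

word_ids = torch.tensor([4, 7, 7, 2])  # a sentence as word indices
vectors = embedding(word_ids)          # one dense vector per word
print(vectors.shape)                   # torch.Size([4, 16])
```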
51
# DL-07 - Sequence models What can happen to gradients in RNNs?
Vanishing or exploding gradients can occur, causing the model to stop learning or to train far too slowly.
52
# DL-07 - Sequence models What do traditional sequence models struggle with, in terms of relating to past information?
They cannot relate to past information beyond the immediately preceding input.
53
# DL-07 - Sequence models What is a solution for improving sequence models to better remember distant inputs?
Add memory and make efficient use of it, possibly by forgetting less relevant information.
54
# DL-07 - Sequence models What are two improved Seq2Seq models that incorporate memory?
- GRU (Gated Recurrent Unit)
- LSTM (Long Short-Term Memory)
55
# DL-07 - Sequence models What is GRU short for?
Gated Recurrent Unit
56
# DL-07 - Sequence models What is LSTM short for?
Long Short-Term Memory
57
# DL-07 - Sequence models Label the parts that are masked out. (See image)
- Forget
- Update
- Input
- Output (Result)
58
# DL-07 - Sequence models What is the main purpose of LSTM networks in deep learning?
LSTM networks extend the memory of RNNs to learn from important experiences with long time steps in between.
59
# DL-07 - Sequence models What is one advantage of using LSTM networks over traditional RNNs?
LSTM networks enable short-term memory to last for a longer time.
60
# DL-07 - Sequence models What issue with sequence model training do LSTMs help mitigate?
LSTM networks help mitigate the problematic issue of vanishing gradients.
61
# DL-07 - Sequence models What are the gates in an LSTM called? (4)
- Input
- Output
- Update
- Forget
62
# DL-07 - Sequence models What is the purpose of the input gate in an LSTM?
The input gate determines how much of the new input should be added to the cell state.
63
# DL-07 - Sequence models What is the purpose of the forget gate in an LSTM?
The forget gate decides what information to discard from the cell state.
64
# DL-07 - Sequence models What is the purpose of the output gate in an LSTM?
The output gate selects which values from the updated cell state will be passed to the next hidden state.
65
# DL-07 - Sequence models What is the purpose of the update gate in an LSTM?
The update gate computes candidate values to be added to the cell state, based on the current input and previous hidden state.
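
A compact sketch of the standard LSTM equations tying the four gates together (here the "update gate" corresponds to the candidate values $\tilde{c}_t$; $\sigma$ is the sigmoid, $\odot$ element-wise multiplication):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) &&\text{(forget)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) &&\text{(input)} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) &&\text{(update/candidate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{(new cell state)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) &&\text{(output)} \\
h_t &= o_t \odot \tanh(c_t) &&\text{(new hidden state)}
\end{aligned}
$$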
66
# DL-07 - Sequence models Describe the LSTM model's architecture.
(See image)
67
# DL-07 - Sequence models What are the inputs of the LSTM cell called? (3)
- Input
- Hidden state
- Cell state
68
# DL-07 - Sequence models What are the outputs of the LSTM cell called? (3)
- Hidden state
- Cell state
- Output
69
# DL-07 - Sequence models What optimizers have worked well with LSTMs for text data? (2)
- Adam
- Adagrad
70
# DL-07 - Sequence models What activation function and loss should you use for LSTM with text data?
- Softmax (predicts a probability for each word)
- Cross-entropy loss
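
A minimal sketch in PyTorch; note that nn.CrossEntropyLoss applies (log-)softmax internally, so the model outputs raw scores (logits) over the vocabulary:

```python
import torch
import torch.nn as nn

vocab_size = 5                       # toy vocabulary size
logits = torch.randn(1, vocab_size)  # unnormalized scores for the next word
target = torch.tensor([2])           # index of the correct word
loss = nn.CrossEntropyLoss()(logits, target)  # softmax + cross-entropy in one
print(loss.item())
```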
71
# DL-07 - Sequence models What metrics would you use for LSTM with text data?
Accuracy, precision, recall. Think of the outputs as the probability of predicting the correct word.
72
# DL-07 - Sequence models What is a bidirectional RNN?
A bidirectional RNN is a type of recurrent neural network that processes input data in both forward and backward directions, capturing information from both past and future contexts.
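
A minimal PyTorch sketch (sizes illustrative): with bidirectional=True, the LSTM runs the sequence forward and backward and concatenates the two hidden states at each time step.

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=8, hidden_size=16,
              batch_first=True, bidirectional=True)

x = torch.randn(1, 5, 8)  # (batch, time steps, features)
out, (h_n, c_n) = rnn(x)
print(out.shape)          # torch.Size([1, 5, 32]) = 2 * hidden_size
```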