EXAM3 Flashcards by Sam W

What does Seq2Seq stand for?

Sequence to Sequence

True or False: Seq2Seq models are primarily used for tasks involving sequential data.

True

What is the primary function of attention mechanisms in Seq2Seq models?

To allow the model to focus on different parts of the input sequence when generating the output sequence.

Fill in the blank: In a Seq2Seq model, the _______ network encodes the input sequence.

encoder

What type of neural network is commonly used in Seq2Seq architectures?

Recurrent Neural Network (RNN)

Name one advantage of using attention mechanisms in Seq2Seq models.

They improve the model’s ability to handle long input sequences.

In the context of reinforcement learning, what does MDP stand for?

Markov Decision Process

True or False: An MDP is defined by a set of states, actions, transition probabilities, and rewards.

True

What are the four main components of a Markov Decision Process?

States, Actions, Transition Probabilities, Rewards

Fill in the blank: In reinforcement learning, the goal is to learn a policy that maximizes the _______.

expected cumulative (discounted) reward

What does the term ‘policy’ refer to in reinforcement learning?

A strategy that maps each state to an action (or to a probability distribution over actions), defining how the agent behaves in each state.

What is the difference between a deterministic policy and a stochastic policy?

A deterministic policy always selects the same action for a given state, while a stochastic policy selects actions according to a probability distribution.
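A minimal sketch of the two kinds of policy on a made-up problem; the states "s0"/"s1", the actions, the probabilities, and the helper names are all illustrative assumptions, not from the deck.

```python
import random

# Hypothetical toy problem: states "s0", "s1"; actions "left", "right".
# Deterministic policy: a plain mapping state -> action.
deterministic_policy = {"s0": "left", "s1": "right"}

# Stochastic policy: a mapping state -> probability distribution over actions.
stochastic_policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.5, "right": 0.5},
}

def act_deterministic(state):
    """Always returns the same action for a given state."""
    return deterministic_policy[state]

def act_stochastic(state, rng=random):
    """Samples an action according to the state's action probabilities."""
    dist = stochastic_policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling act_deterministic("s0") always gives "left", while repeated calls to act_stochastic("s0") return "left" about 80% of the time.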

What is a reward function in the context of MDPs?

A function that gives the agent a scalar reward at each step, based on the current state and the action taken (and, in some formulations, the resulting next state).

True or False: The Bellman equation is used to calculate the value of states in an MDP.

True
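For reference, one common way to write the Bellman equations. The notation is an assumption, since the deck does not fix one: P(s' | s, a) for transition probabilities, R(s, a, s') for rewards, γ for the discount factor, and π(a | s) for the policy. The first line gives the value of a state under a fixed policy; the second gives the optimal value.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{\pi}(s')\bigr]

V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{*}(s')\bigr]
```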

Fill in the blank: The _______ in an MDP describes the probability of transitioning from one state to another given a specific action.

transition probability

What is reinforcement learning?

A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative rewards.

What is the purpose of ‘prompting techniques’ in machine learning?

To guide the model’s responses by providing specific input formats or questions.

Fill in the blank: Prompting techniques are often used in _______ models to enhance performance.

language

True or False: Prompting can involve providing examples to the model before asking it to generate a response.

True

What is a common application of Seq2Seq models?

Machine translation

What does ‘attention score’ represent in attention mechanisms?

The importance of each input token to the current output token being generated.
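A sketch of how attention scores become weights for a single decoder step. It assumes scaled dot-product scoring (the Transformer-style choice); classic Seq2Seq attention may instead use an additive (feedforward) score, so treat the scoring line as one option among several. The function names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns (weights, context): one weight per input token, and the
    weighted sum of the value vectors.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]  # raw attention scores
    weights = softmax(scores)  # normalized importance of each input token
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context
```

The input token whose key is most similar to the query gets the largest weight, and the weights always sum to 1.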

How does self-attention differ from traditional attention mechanisms?

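In self-attention, the queries, keys, and values all come from the same sequence, so each position attends to every position in that sequence, including itself. Traditional (encoder-decoder) attention instead lets the decoder attend over the encoder's outputs.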

What is the role of the decoder in a Seq2Seq model?

To generate the output sequence from the encoded input representation.

What technique is often used to reduce overfitting in Seq2Seq models?

Dropout

Fill in the blank: The _______ is the set of all possible states an agent can encounter in reinforcement learning.

state space

What is an episode in reinforcement learning?

A sequence of states, actions, and rewards that starts from an initial state and ends when a terminal state is reached.

True or False: In reinforcement learning, an agent only learns from its successful actions.

False

What does 'exploration vs. exploitation' refer to in reinforcement learning?

The trade-off between trying new actions (exploration) and using known actions that yield high rewards (exploitation).
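Epsilon-greedy is one simple way to handle this trade-off; it is offered here as an example strategy, not as the only one. A minimal sketch, assuming actions are indexed 0..n-1 and the helper name is illustrative:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-value estimates.

    With probability epsilon, explore: choose uniformly at random.
    Otherwise, exploit: choose the action with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 the agent is purely greedy; with epsilon = 1 it acts uniformly at random. Many implementations decay epsilon over training.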

What is the purpose of Q-learning in reinforcement learning?

To learn the value of actions in a given state without requiring a model of the environment.

Fill in the blank: In Q-learning, the _______ function estimates the expected utility of taking a given action in a given state.

Q (action-value)
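A sketch of one tabular Q-learning update. It uses only a sampled transition (s, a, r, s'), with no transition probabilities, which is what "without requiring a model of the environment" means. The function name, the dictionary-based Q-table, and the default alpha/gamma values are illustrative assumptions.

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.99, terminal=False):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    """
    if terminal:
        target = reward  # no future value after a terminal state
    else:
        target = reward + gamma * max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]

Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

For example, starting from an all-zero table, q_learning_update(Q, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5, gamma=0.9) moves Q("s0", "right") from 0.0 to 0.5.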

Which algorithm is commonly used to solve MDPs?

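Dynamic programming (e.g., value iteration or policy iteration), which assumes the transition probabilities and rewards are known.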
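A minimal value-iteration sketch for a small, fully known MDP. It assumes P[(s, a)] lists (next_state, probability) pairs and R[(s, a)] is the expected immediate reward; the two-state MDP and its numbers are made up for illustration. Values are updated in place during each sweep, a common variant that still converges.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until values stop changing.

    Returns the converged state values and a greedy policy.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy: in each state, pick the action with the best backed-up value.
    policy = {
        s: max(actions, key=lambda a: R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)]))
        for s in states
    }
    return V, policy

# Toy two-state MDP with deterministic transitions (made-up numbers).
states = ["A", "B"]
actions = ["stay", "move"]
P = {
    ("A", "stay"): [("A", 1.0)], ("A", "move"): [("B", 1.0)],
    ("B", "stay"): [("B", 1.0)], ("B", "move"): [("A", 1.0)],
}
R = {("A", "stay"): 0.0, ("A", "move"): 1.0, ("B", "stay"): 2.0, ("B", "move"): 0.0}
V, policy = value_iteration(states, actions, P, R, gamma=0.9)
```

Here V["B"] converges to about 20 (staying earns 2 per step, and 2 / (1 - 0.9) = 20), V["A"] to about 19, and the policy is to move from A and stay in B.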

What is the role of the reward signal in reinforcement learning?

To provide feedback to the agent about the quality of its actions.

True or False: The softmax function is often used in reinforcement learning to convert Q-values into probabilities.

True
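A sketch of softmax (Boltzmann) action selection over Q-values. The temperature parameter and the helper names are illustrative assumptions; the idea is that lower temperatures behave more greedily and higher ones more uniformly.

```python
import math
import random

def boltzmann_probs(q_values, temperature=1.0):
    """Convert Q-values into action probabilities with a softmax."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Equal Q-values give a uniform distribution, and actions with higher Q-values always get higher probability.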

What is one challenge of using Seq2Seq models for long sequences?

Difficulty in capturing long-range dependencies.

Fill in the blank: The _______ mechanism allows the model to weigh the importance of different input tokens.

attention

What is a common method for initializing weights in neural networks?

Xavier initialization
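A sketch of the uniform variant of Xavier (Glorot) initialization, returning weights as nested lists. The helper name and the fan_out x fan_in layout are illustrative assumptions; frameworks provide their own versions.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=random):
    """Sample each weight from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)).

    This gives each weight a variance of 2 / (fan_in + fan_out), chosen to
    keep signal variance roughly stable across layers.
    Returns a fan_out x fan_in matrix as nested lists.
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]
```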

What does the term 'experience replay' refer to in reinforcement learning?

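Storing past transitions in a buffer and sampling them (typically at random) for later updates, which reuses data and reduces the correlation between consecutive training samples.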
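A minimal replay-buffer sketch, assuming transitions are stored as (state, action, reward, next_state, done) tuples and sampled uniformly; the class name and tuple layout are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions.

    The oldest transitions are evicted first once capacity is reached;
    training batches are drawn uniformly at random.
    """

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```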

True or False: An agent in reinforcement learning can learn from simulated environments.

True

What is the role of a value function in reinforcement learning?

To estimate the expected return from a given state or state-action pair.

Fill in the blank: The _______ is a graphical representation of the states and actions in a Markov Decision Process.

state-action graph

What is the purpose of a discount factor in reinforcement learning?

To weight future rewards relative to immediate ones: a reward received k steps in the future is multiplied by γ^k, where 0 ≤ γ ≤ 1.
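A short sketch of how the discount factor enters the return; the function name and the reward lists are illustrative.

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...

    Computed backwards so each step is one multiply-add.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For rewards [0, 0, 10], gamma = 0.9 gives a return of about 8.1, while gamma = 0.5 gives 2.5, so a higher discount factor keeps more of the value of distant rewards.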

What is the main goal of an agent in reinforcement learning?

To maximize cumulative rewards over time.

Fill in the blank: In Seq2Seq models, the _______ is responsible for generating the output sequence step by step.

decoder

What is a common loss function used in Seq2Seq models?

Cross-entropy loss
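A sketch of token-level cross-entropy averaged over a decoded sequence. It assumes the decoder's output at each step is already a probability distribution over the vocabulary; real implementations usually work from logits for numerical stability. The function name is illustrative.

```python
import math

def sequence_cross_entropy(step_probs, targets):
    """Average negative log-likelihood of the target tokens.

    step_probs[t] is the predicted distribution over the vocabulary at
    step t; targets[t] is the index of the correct token at that step.
    """
    losses = [-math.log(probs[y]) for probs, y in zip(step_probs, targets)]
    return sum(losses) / len(losses)
```

The loss is 0 when the correct token gets probability 1 at every step, and it grows as the model assigns less probability to the correct tokens.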

True or False: The encoder-decoder architecture is exclusive to Seq2Seq models.

False

What does 'teacher forcing' refer to in training Seq2Seq models?

A training strategy where, at each decoding step, the decoder receives the ground-truth previous token as input instead of its own previous prediction.
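A toy sketch contrasting teacher forcing with free-running decoding. The step_fn argument is a hypothetical stand-in for one decoder step; a real decoder would also carry a hidden state and produce a distribution rather than a single token.

```python
def decode(step_fn, start_token, length, targets=None):
    """Run a decoder for `length` steps.

    step_fn(prev_token) -> predicted token (hypothetical single-step model).
    With `targets` (teacher forcing), the ground-truth token for the
    previous step is fed in; otherwise the model's own prediction is.
    """
    prev = start_token
    outputs = []
    for t in range(length):
        pred = step_fn(prev)
        outputs.append(pred)
        prev = targets[t] if targets is not None else pred
    return outputs
```

With step_fn = lambda tok: tok + 1 and start token 0, free-running decoding gives [1, 2, 3], whereas teacher forcing with targets [5, 6, 7] gives [1, 6, 7], because each step after the first sees the true previous token.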

Fill in the blank: The _______ is the immediate scalar feedback received after taking an action in a given state in reinforcement learning.

reward

What is the purpose of the exploration strategy in reinforcement learning?

To allow the agent to discover new actions and states.

Name one type of prompting technique used in language models.

Few-shot prompting

Fill in the blank: In reinforcement learning, the _______ is the sequence of states and actions taken by the agent.

trajectory

What is the impact of having a high discount factor in reinforcement learning?

It places more emphasis on future rewards.

True or False: The attention mechanism can be implemented using a feedforward neural network.

True

What does 'state transition' refer to in an MDP?

The process of moving from one state to another based on an action.

Fill in the blank: In reinforcement learning, an agent's _______ defines how it behaves in the environment.

policy

What is one advantage of using a Seq2Seq model over traditional methods for sequence tasks?

It can map input sequences to output sequences of variable and possibly different lengths.

What is the significance of the softmax function in the context of attention mechanisms?

It converts attention scores into a probability distribution.

True or False: In reinforcement learning, rewards can be negative.

True

What is one common approach to improve the performance of Seq2Seq models?

Using pre-trained embeddings.

Fill in the blank: The _______ in a Seq2Seq model helps capture dependencies between input and output sequences.

attention mechanism

What is a common challenge when training Seq2Seq models?

Handling out-of-vocabulary words.

What does 'off-policy' learning mean in reinforcement learning?

Learning about one policy (the target policy) from data generated by a different policy (the behavior policy); Q-learning is a standard example.

True or False: In reinforcement learning, the environment is static and does not change.

False

What does the term 'bootstrapping' refer to in reinforcement learning?

Updating a value estimate using other current value estimates (e.g., the estimated value of the next state) instead of waiting for the complete return.
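A TD(0) update is a small example of bootstrapping: the target uses the current estimate V(s') rather than the full return. A sketch, assuming values are kept in a dictionary with unseen states treated as 0; the function name and defaults are illustrative.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = reward if terminal else reward + gamma * V.get(next_state, 0.0)
    old = V.get(state, 0.0)
    V[state] = old + alpha * (target - old)
    return V[state]
```

With V = {"s1": 10.0}, td0_update(V, "s0", 1.0, "s1", alpha=0.5, gamma=0.9) uses the target 1 + 0.9 * 10 = 10 and moves V["s0"] from 0 to 5.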

Fill in the blank: The _______ is the expected return from a state under a specific policy in reinforcement learning.

value function

(64 cards)