Week 5 Flashcards

1
Q

What is supervised learning in machine learning?

A

Supervised learning is a machine learning technique where models learn from labeled data (training set) and apply this knowledge to new, unseen data (test set) with the goal of function approximation and classification.

2
Q

What types of methods are commonly used in supervised learning?

A

Common methods in supervised learning include linear regression, logistic regression, and Support Vector Machine (SVM).

3
Q

What is unsupervised learning in machine learning?

A

Unsupervised learning involves learning from unlabeled data to find hidden structures within the dataset.

4
Q

What is the goal of unsupervised learning?

A

The goal of unsupervised learning is data description and pattern recognition.

5
Q

What methods are utilized in unsupervised learning?

A

Methods used in unsupervised learning include K-Means clustering, neural networks, and principal component analysis (PCA).

6
Q

How does feedback differ between supervised and unsupervised learning?

A

In supervised learning, instructive feedback is used to guide the learning process, whereas unsupervised learning typically does not use feedback.

7
Q

What does reinforcement learning focus on?

A

Reinforcement learning focuses on the interaction between an agent and its environment.

8
Q

How does an agent interact with the environment in RL?

A

The agent selects actions, and the environment provides evaluative feedback based on those actions.

9
Q

What is the sequence of events in a reinforcement learning process?

A

The sequence in RL involves state, action, reward, new state, and so on, as exemplified by the SARSA algorithm.

10
Q

What does SARSA stand for in reinforcement learning?

A

SARSA stands for State-Action-Reward-State-Action, which is a sequence that describes how an agent learns from the consequences of its actions in an environment.
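A standard way to write the SARSA update that this sequence produces, with α as the learning rate and γ as the discount factor (standard textbook notation, assumed rather than taken from the slides):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$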

11
Q

What is the Markov property in the context of Markov Decision Processes?

A

The Markov property states that the conditional probability distribution for the system at the next time step depends only on the current state, not on the sequence of events that preceded it.
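In symbols, with s_t the current state and a_t the current action (standard notation, assumed):

$$P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_t, a_t)$$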

12
Q

How do the effects of actions in a Markov Decision Process relate to the state and history?

A

In an MDP, the effects of actions depend only on the current state of the system and not on the prior history of how the agent arrived at that state.

13
Q

What do deterministic transitions mean in a Markov Decision Process?

A

Deterministic transitions in an MDP mean that the next state of the system is determined with certainty given the current state and action taken.

14
Q

What are stochastic transitions in a Markov Decision Process?

A

Stochastic transitions in an MDP involve probability, indicating that the next state is not determined with certainty and can be one of several possible states.
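As a small illustration of the two transition types (the probabilities are made-up example numbers):

$$\text{deterministic: } P(s' \mid s, a) = 1 \text{ for exactly one next state } s'; \qquad \text{stochastic: } P(s_1 \mid s, a) = 0.8,\; P(s_2 \mid s, a) = 0.2$$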

15
Q

What are the elements of the agent and the environment in reinforcement learning?

A

The agent selects actions based on the current state; the environment responds to each action with a reward and a new state. Together the states, actions, and rewards make up the agent-environment interaction loop.

16
Q

What is the Law of Effect according to Thorndike (1944)?

A

The Law of Effect states that actions followed by satisfaction will be strengthened, while those that produce discomfort will be weakened.

17
Q

What is learning defined as in the provided material?

A

Learning is defined as updating your expectations based on new information or experiences.

18
Q

What happens if you can perfectly predict the future according to the learning model?

A

If the actual outcome matches the expected outcome, no adjustment of expectations is necessary.

19
Q

How should expectations be adjusted when predictions are not perfect?

A

When the actual outcome does not match the expected outcome (prediction error exists), expectations should be adjusted accordingly to improve future predictions.
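This adjustment is often written as a prediction-error (delta-rule) update, with α as the learning rate (standard notation, assumed); the term in brackets is the prediction error, which is zero when the prediction was perfect:

$$V_{\text{new}} = V_{\text{old}} + \alpha \, (\text{outcome} - V_{\text{old}})$$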

20
Q

Explain why the right-hand calculation is more efficient.

A

The left-hand calculation requires more and more memory capacity over time, while the right-hand calculation needs only a minimal, constant amount of memory.
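A minimal Python sketch of this contrast, assuming the left-hand calculation stores every outcome and re-averages, while the right-hand calculation keeps only a running estimate (the example rewards are made up):

```python
rewards = [1.0, 0.0, 1.0, 1.0]   # made-up outcomes

# "Left": keep the whole history and re-average -> memory grows with every trial
history = []
for r in rewards:
    history.append(r)
    estimate_full = sum(history) / len(history)

# "Right": incremental update -> constant memory (one estimate, one counter)
estimate, n = 0.0, 0
for r in rewards:
    n += 1
    estimate += (1.0 / n) * (r - estimate)   # old estimate + step size * prediction error

print(estimate_full, estimate)   # both give the same sample average
```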

21
Q

Why would you use a **weighted average** (learning rate) instead of a normal average?

A

A weighted average with a learning rate gives more weight to recent outcomes than to older ones, so the estimate can keep adapting when the environment changes, and it can be computed incrementally without storing all past outcomes. A normal average weights every outcome equally, so after many trials it barely responds to new information.

22
Q

What is the exploration-exploitation dilemma?

A

The exploration-exploitation dilemma is a decision-making problem where one must choose between taking the best known action (exploitation) or trying new actions to discover better alternatives (exploration).

23
Q

What is exploitation in the context of decision-making?

A

Exploitation involves always taking the action with the highest estimated value (greedy), which is considered optimal in stable environments that are fully known.

24
Q

What is the risk associated with exploitation?

A

The risk with exploitation is the possibility of missing out on better alternatives that have not been explored yet.

25
Q

What does exploration mean in decision-making strategies?

A

Exploration refers to taking different actions to gather more information about the environment, which can potentially lead to discovering more valuable actions.

26
Q

What is the problem with an exploration strategy?

A

The issue with exploration is that it does not necessarily maximize immediate reward, which is the goal of reinforcement learning (RL).

27
Q

Why is a balance between exploitation and exploration necessary?

A

A balance between exploitation and exploration is necessary to achieve optimal decision-making, ensuring both the use of the best-known options and the discovery of potentially better choices.

28
Q

Using the greedy option doesn’t always lead to the optimal outcome. How do we use the function in the picture to create a better chance of reaching an optimal outcome?

A

Every so many trials, with probability ε (epsilon), we choose a random action instead of the greedy one (exploration). This is the ε-greedy strategy.
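A minimal sketch of such an ε-greedy rule in Python (the function name and example values are illustrative assumptions):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest estimated value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit (greedy)

action = epsilon_greedy([0.2, 0.5, 0.1], epsilon=0.1)   # usually action 1, sometimes random
```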

29
Q

What is the difference between random exploration and directed exploration?

A

Random exploration = completely random action choices.

Directed exploration tries to pick actions that have not been chosen yet, or have not been chosen much recently.

30
Q

Explain the Bellman optimality equation.

A

The Bellman optimality equation is a fundamental concept in reinforcement learning. It is used to find the optimal policy, which tells an agent the best action to take in every state.
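For deterministic transitions it can be written as below, where r is the immediate reward, γ the discount factor, and s' the next state (an expectation over s' is added for stochastic transitions):

$$V^*(s) = \max_{a} \left[ r(s, a) + \gamma \, V^*(s') \right]$$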

31
Q

Value Iteration (learning from knowledge) is an algorithm used in reinforcement learning. Explain the algorithm.

A

Value iteration is used to find the optimal policy by iteratively improving the value function of each state. It is a form of dynamic programming that solves the Bellman optimality equation.

  1. Initialization: initialize the value V(s) for all states s in the state space to arbitrary values, except for the terminal states, which might be initialized to the final reward or zero.
  2. Iteration: for each state s, update the value V(s) using the Bellman optimality equation, and repeat these sweeps until the values stop changing (see the sketch below).
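A minimal value-iteration sketch in Python for a tiny made-up deterministic MDP; the transitions, rewards, and discount factor are illustrative assumptions, not the grid world from the slides:

```python
transitions = {          # state -> {action: next state}
    0: {"right": 1},
    1: {"right": 2},
    2: {},               # terminal state
}
rewards = {(1, "right"): 1.0}    # reward for stepping into the terminal state
gamma = 0.9                      # discount factor

V = {s: 0.0 for s in transitions}      # 1. initialization
for _ in range(100):                   # 2. iteration: repeated sweeps
    for s, actions in transitions.items():
        if actions:                    # skip terminal states
            V[s] = max(rewards.get((s, a), 0.0) + gamma * V[nxt]
                       for a, nxt in actions.items())

print(V)   # converges to the optimal state values, e.g. V[1] = 1.0, V[0] = 0.9
```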
32
Q

Simulate a sweep for state 14 using the Bellman equation.

The discount rate follows the idea that a reward that comes sooner is worth more than one that comes later.

r = reward
γ = discount factor
V(s') = value of the subsequent state under the optimal policy.

A
33
Q

What are the shortcomings of Dynamic Programming?

A

DP is computationally expensive, especially as the size of the state space grows, which can make it impractical for large or complex environments.

34
Q

What assumptions does Dynamic Programming make?

A

DP assumes a perfect model of the environment, meaning it requires perfect knowledge of all state transitions and rewards.

35
Q

Why can DP not be used without a perfect model of the environment?

A

Without perfect knowledge, DP cannot accurately predict the state transitions and rewards, which are necessary for finding the optimal policy.

36
Q

What characterizes Monte Carlo (MC) Reinforcement Learning?

A

Monte Carlo RL is characterized by learning directly from experience, without requiring a model of the environment’s dynamics.

Episodes have to finish (run to completion) before values can be updated.
Updates do not bootstrap from (compare with) the values of other states.
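Concretely, once an episode has finished, each visited state is updated towards the complete discounted return observed from that point onwards (standard notation, assumed):

$$G_t = r_{t+1} + \gamma \, r_{t+2} + \gamma^2 \, r_{t+3} + \ldots$$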

37
Q

Using the same grid as before, what would the following trajectory look like using the MC method?

A

It is important to notice that you start at the reward (state 15) and work backwards to the first state (state 0).

38
Q

What are some problems or shortcomings of Monte Carlo methods?

A

The convergence to the optimum can be slow, requiring a lot of sampling, and sufficient exploration must be maintained throughout the learning process.

39
Q

Within SARSA, what is the difference between on-policy and off-policy?

A

SARSA is on-policy: the learning process follows the policy that is currently being improved upon. In other words, the agent learns about the policy it is using to make decisions, as opposed to “off-policy” methods like Q-learning, where the agent learns about a potentially different policy from the one it follows.
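Written side by side in standard textbook form (α is the learning rate, γ the discount factor), the difference sits in the update target:

$$\text{SARSA (on-policy): } Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$

$$\text{Q-learning (off-policy): } Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

SARSA uses the action a' the agent actually takes next; Q-learning uses the best action in the next state, regardless of what the agent actually does.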

40
Q

What is a pro of on-policy methods?

A

On-policy methods, such as SARSA, ensure that the policy being evaluated and improved is the same as the policy being used to make decisions, which often leads to more stable and consistent learning.

41
Q

What is a con of on-policy methods?

A

On-policy methods can be less efficient than off-policy methods because they can only learn from the current policy, potentially leading to slower convergence to the optimal policy.

42
Q

What is a pro of off-policy methods?

A

Off-policy methods, like Q-learning, can learn from data generated by any policy (exploratory or even suboptimal), making them more flexible and often faster at finding the optimal policy.

43
Q

What is a con of off-policy methods?

A

Off-policy methods can be less stable and more complex to implement because they must correctly account for the difference between the policy being evaluated and the policy used to generate the data.

44
Q

What is the curse of dimensionality?

A

The curse of dimensionality refers to the phenomenon where the volume of the state space increases exponentially with each additional dimension, making computational problems much more complex.

45
Q

What is the credit assignment problem in the context of the curse of dimensionality?

A

The credit assignment problem is the challenge of determining which actions or decisions led to a particular outcome, especially when many decisions are involved over time.

46
Q

What is the first solution to temporal credit assignment?

A

The first solution is Standard Temporal Difference (TD) Learning, which updates the value of a state-action pair based on the difference between the expected future rewards and the actual rewards received.
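A standard form of this TD update, written here for state values (notation assumed): the value of the current state is moved towards the immediate reward plus the discounted value of the next state:

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma \, V(s_{t+1}) - V(s_t) \right]$$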

47
Q

What is the second solution to temporal credit assignment?

A

The second solution is the Monte Carlo method, which uses the “long-term” memory of actions and rewards to update values, relying on complete episodes to make updates.

48
Q

What is the third solution to temporal credit assignment?

A

The third solution is Eligibility Traces, which provide a “short-term” memory of actions to bridge the temporal gap, allowing for credit to be assigned more accurately to actions that lead to a reward.
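In a standard TD(λ) formulation (notation assumed), every state keeps a trace that decays each step; the TD error δ is then applied to all states in proportion to their traces:

$$e(s) \leftarrow \gamma \lambda \, e(s) \text{ for all } s, \qquad e(s_t) \leftarrow e(s_t) + 1, \qquad V(s) \leftarrow V(s) + \alpha \, \delta \, e(s)$$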

49
Q

What are the advantages of model-free RL?

A

Model-free RL is simple and efficient, making it accessible and straightforward to implement without the need for a model of the environment.

50
Q

What are the disadvantages of model-free RL?

A

Model-free RL can be slow and rigid. It often leads to outcome insensitivity, where the learning process doesn’t adjust adequately to changes in the environment.

51
Q

What are the advantages of model-based RL?

A

Model-based RL is fast and flexible. It allows for behavioral adjustments through planning, as it involves a model of the environment which can simulate future states.

52
Q

What are the disadvantages of model-based RL?

A

Model-based RL is complex and costly. It requires a significant amount of computational resources to model the environment and update the model based on new information.

53
Q

How is human behavior described in terms of RL strategies?

A

Human behavior is often seen as a mixture of both model-free and model-based RL, utilizing the simplicity of model-free methods and the strategic planning of model-based methods.