Week 5 Flashcards

1
Q

What is supervised learning in machine learning?

A

Supervised learning is a machine learning technique where models learn from labeled data (training set) and apply this knowledge to new, unseen data (test set) with the goal of function approximation and classification.

2
Q

What types of methods are commonly used in supervised learning?

A

Common methods in supervised learning include linear regression, logistic regression, and Support Vector Machine (SVM).

3
Q

What is unsupervised learning in machine learning?

A

Unsupervised learning involves learning from unlabeled data to find hidden structures within the dataset.

4
Q

What is the goal of unsupervised learning?

A

The goal of unsupervised learning is data description and pattern recognition.

5
Q

What methods are utilized in unsupervised learning?

A

Methods used in unsupervised learning include K-Means clustering, neural networks, and principal component analysis (PCA).

6
Q

How does feedback differ between supervised and unsupervised learning?

A

In supervised learning, instructive feedback is used to guide the learning process, whereas unsupervised learning typically does not use feedback.

7
Q

What does reinforcement learning focus on?

A

Reinforcement learning focuses on the interaction between an agent and its environment.

8
Q

How does an agent interact with the environment in RL?

A

The agent selects actions, and the environment provides evaluative feedback based on those actions.

9
Q

What is the sequence of events in a reinforcement learning process?

A

The sequence in RL involves state, action, reward, new state, and so on, as exemplified by the SARSA algorithm.

10
Q

What does SARSA stand for in reinforcement learning?

A

SARSA stands for State-Action-Reward-State-Action, which is a sequence that describes how an agent learns from the consequences of its actions in an environment.
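A standard way to write the SARSA update that this sequence produces, with α as the learning rate and γ as the discount factor (standard textbook notation, assumed rather than taken from the slides):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$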

11
Q

What is the Markov property in the context of Markov Decision Processes?

A

The Markov property states that the conditional probability distribution for the system at the next time step depends only on the current state, not on the sequence of events that preceded it.
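In symbols, with s_t the current state and a_t the current action (standard notation, assumed):

$$P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_t, a_t)$$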

12
Q

How do the effects of actions in a Markov Decision Process relate to the state and history?

A

In an MDP, the effects of actions depend only on the current state of the system and not on the prior history of how the agent arrived at that state.

13
Q

What do deterministic transitions mean in a Markov Decision Process?

A

Deterministic transitions in an MDP mean that the next state of the system is determined with certainty given the current state and action taken.

14
Q

What are stochastic transitions in a Markov Decision Process?

A

Stochastic transitions in an MDP involve probability, indicating that the next state is not determined with certainty and can be one of several possible states.
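As a small illustration of the two transition types (the probabilities are made-up example numbers):

$$\text{deterministic: } P(s' \mid s, a) = 1 \text{ for exactly one next state } s'; \qquad \text{stochastic: } P(s_1 \mid s, a) = 0.8,\; P(s_2 \mid s, a) = 0.2$$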

15
Q

What are the elements of the agent and the environment in reinforcement learning?

A

The agent selects actions based on the current state; the environment responds to each action with a reward and a new state. Together the states, actions, and rewards make up the agent-environment interaction loop.

16
Q

What is the Law of Effect according to Thorndike (1944)?

A

The Law of Effect states that actions followed by satisfaction will be strengthened, while those that produce discomfort will be weakened.

17
Q

What is learning defined as in the provided material?

A

Learning is defined as updating your expectations based on new information or experiences.

18
Q

What happens if you can perfectly predict the future according to the learning model?

A

If the actual outcome matches the expected outcome, no adjustment of expectations is necessary.

19
Q

How should expectations be adjusted when predictions are not perfect?

A

When the actual outcome does not match the expected outcome (prediction error exists), expectations should be adjusted accordingly to improve future predictions.
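This adjustment is often written as a prediction-error (delta-rule) update, with α as the learning rate (standard notation, assumed); the term in brackets is the prediction error, which is zero when the prediction was perfect:

$$V_{\text{new}} = V_{\text{old}} + \alpha \, (\text{outcome} - V_{\text{old}})$$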

20
Q

Explain why the right-hand calculation is more efficient.

A

The left-hand calculation requires more and more memory capacity over time, while the right-hand calculation needs only a minimal, constant amount of memory.
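A minimal Python sketch of this contrast, assuming the left-hand calculation stores every outcome and re-averages, while the right-hand calculation keeps only a running estimate (the example rewards are made up):

```python
rewards = [1.0, 0.0, 1.0, 1.0]   # made-up outcomes

# "Left": keep the whole history and re-average -> memory grows with every trial
history = []
for r in rewards:
    history.append(r)
    estimate_full = sum(history) / len(history)

# "Right": incremental update -> constant memory (one estimate, one counter)
estimate, n = 0.0, 0
for r in rewards:
    n += 1
    estimate += (1.0 / n) * (r - estimate)   # old estimate + step size * prediction error

print(estimate_full, estimate)   # both give the same sample average
```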

21
Q

Why would you use a **weighted average** (learning rate) instead of a normal average?

A

A weighted average with a learning rate gives more weight to recent outcomes than to older ones, so the estimate can keep adapting when the environment changes, and it can be computed incrementally without storing all past outcomes. A normal average weights every outcome equally, so after many trials it barely responds to new information.

22
Q

What is the exploration-exploitation dilemma?

A

The exploration-exploitation dilemma is a decision-making problem where one must choose between taking the best known action (exploitation) or trying new actions to discover better alternatives (exploration).

23
Q

What is exploitation in the context of decision-making?

A

Exploitation involves always taking the action with the highest estimated value (greedy), which is considered optimal in stable environments that are fully known.

24
Q

What is the risk associated with exploitation?

A

The risk with exploitation is the possibility of missing out on better alternatives that have not been explored yet.

25
Q

What does exploration mean in decision-making strategies?

A

Exploration refers to taking different actions to gather more information about the environment, which can potentially lead to discovering more valuable actions.

26
Q

What is the problem with an exploration strategy?

A

The issue with exploration is that it does not necessarily maximize immediate reward, which is the goal of reinforcement learning (RL).

27
Q

Why is a balance between exploitation and exploration necessary?

A

A balance between exploitation and exploration is necessary to achieve optimal decision-making, ensuring both the use of the best-known options and the discovery of potentially better choices.

28
Q

Using the greedy option doesn’t always lead to the optimal outcome. How do we use the function in the picture to create a better chance of reaching an optimal outcome?

A

Every so many trials, with probability ε (epsilon), we choose a random action instead of the greedy one (exploration). This is the ε-greedy strategy.
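A minimal sketch of such an ε-greedy rule in Python (the function name and example values are illustrative assumptions):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest estimated value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit (greedy)

action = epsilon_greedy([0.2, 0.5, 0.1], epsilon=0.1)   # usually action 1, sometimes random
```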

29
Q

What is the difference between random exploration and directed exploration?

A

Random exploration = completely random action choices.

Directed exploration tries to pick actions that have not been chosen yet, or have not been chosen much recently.

30
Q

Explain the Bellman optimality equation.

A

The Bellman optimality equation is a fundamental concept in reinforcement learning. It is used to find the optimal policy, which tells an agent the best action to take in every state.
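For deterministic transitions it can be written as below, where r is the immediate reward, γ the discount factor, and s' the next state (an expectation over s' is added for stochastic transitions):

$$V^*(s) = \max_{a} \left[ r(s, a) + \gamma \, V^*(s') \right]$$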

31
Q

Value Iteration (learning from knowledge) is an algorithm used in reinforcement learning. Explain the algorithm.

A

Value iteration is used to find the optimal policy by iteratively improving the value function of each state. It is a form of dynamic programming that solves the Bellman optimality equation.

  1. Initialization: initialize the value V(s) for all states s in the state space to arbitrary values, except for the terminal states, which might be initialized to the final reward or zero.
  2. Iteration: for each state s, update the value V(s) using the Bellman optimality equation, and repeat these sweeps until the values stop changing (see the sketch below).
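A minimal value-iteration sketch in Python for a tiny made-up deterministic MDP; the transitions, rewards, and discount factor are illustrative assumptions, not the grid world from the slides:

```python
transitions = {          # state -> {action: next state}
    0: {"right": 1},
    1: {"right": 2},
    2: {},               # terminal state
}
rewards = {(1, "right"): 1.0}    # reward for stepping into the terminal state
gamma = 0.9                      # discount factor

V = {s: 0.0 for s in transitions}      # 1. initialization
for _ in range(100):                   # 2. iteration: repeated sweeps
    for s, actions in transitions.items():
        if actions:                    # skip terminal states
            V[s] = max(rewards.get((s, a), 0.0) + gamma * V[nxt]
                       for a, nxt in actions.items())

print(V)   # converges to the optimal state values, e.g. V[1] = 1.0, V[0] = 0.9
```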
32
Q

Simulate a sweep for state 14 using the Bellman equation.

The discount rate follows the idea that a reward that comes sooner is worth more than one that comes later.

r = reward
γ = discount factor
V(s') = value of the subsequent state under the optimal policy.

A
33
Q

What are the shortcomings of Dynamic Programming?

A

DP is computationally expensive, especially as the size of the state space grows, which can make it impractical for large or complex environments.

34
Q

What assumptions does Dynamic Programming make?

A

DP assumes a perfect model of the environment, meaning it requires perfect knowledge of all state transitions and rewards.

35
Q

Why can DP not be used without a perfect model of the environment?

A

Without perfect knowledge, DP cannot accurately predict the state transitions and rewards, which are necessary for finding the optimal policy.

36
Q

What characterizes Monte Carlo (MC) Reinforcement Learning?

A

Monte Carlo RL is characterized by learning directly from experience, without requiring a model of the environment’s dynamics.

Episodes have to finish (run to completion) before values can be updated.
Updates do not bootstrap from (compare with) the values of other states.
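Concretely, once an episode has finished, each visited state is updated towards the complete discounted return observed from that point onwards (standard notation, assumed):

$$G_t = r_{t+1} + \gamma \, r_{t+2} + \gamma^2 \, r_{t+3} + \ldots$$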

37
Q

Using the same grid as before, what would the following trajectory look like using the MC method?

A

It is important to notice that you start at the reward (state 15) and work backwards to the first state (state 0).

38
Q

What are some problems or shortcomings of Monte Carlo methods?

A

The convergence to the optimum can be slow, requiring a lot of sampling, and sufficient exploration must be maintained throughout the learning process.

39
Q

Within SARSA, what is the difference between on-policy and off-policy?

A

SARSA is on-policy: the learning process follows the policy that is currently being improved upon. In other words, the agent learns about the policy it is using to make decisions, as opposed to “off-policy” methods like Q-learning, where the agent learns about a potentially different policy from the one it follows.
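Written side by side in standard textbook form (α is the learning rate, γ the discount factor), the difference sits in the update target:

$$\text{SARSA (on-policy): } Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$

$$\text{Q-learning (off-policy): } Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

SARSA uses the action a' the agent actually takes next; Q-learning uses the best action in the next state, regardless of what the agent actually does.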

40
Q

What is a pro of on-policy methods?

A

On-policy methods, such as SARSA, ensure that the policy being evaluated and improved is the same as the policy being used to make decisions, which often leads to more stable and consistent learning.

41
Q

What is a con of on-policy methods?

A

On-policy methods can be less efficient than off-policy methods because they can only learn from the current policy, potentially leading to slower convergence to the optimal policy.

42
Q

What is a pro of off-policy methods?

A

Off-policy methods, like Q-learning, can learn from data generated by any policy (exploratory or even suboptimal), making them more flexible and often faster at finding the optimal policy.

43
Q

What is a con of off-policy methods?

A

Off-policy methods can be less stable and more complex to implement because they must correctly account for the difference between the policy being evaluated and the policy used to generate the data.

44
Q

What is the curse of dimensionality?

A

The curse of dimensionality refers to the phenomenon where the volume of the state space increases exponentially with each additional dimension, making computational problems much more complex.

45
Q

What is the credit assignment problem in the context of the curse of dimensionality?

A

The credit assignment problem is the challenge of determining which actions or decisions led to a particular outcome, especially when many decisions are involved over time.

46
Q

What is the first solution to temporal credit assignment?

A

The first solution is Standard Temporal Difference (TD) Learning, which updates the value of a state-action pair based on the difference between the expected future rewards and the actual rewards received.
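A standard form of this TD update, written here for state values (notation assumed): the value of the current state is moved towards the immediate reward plus the discounted value of the next state:

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma \, V(s_{t+1}) - V(s_t) \right]$$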

47
Q

What is the second solution to temporal credit assignment?

A

The second solution is the Monte Carlo method, which uses the “long-term” memory of actions and rewards to update values, relying on complete episodes to make updates.

48
Q

What is the third solution to temporal credit assignment?

A

The third solution is Eligibility Traces, which provide a “short-term” memory of actions to bridge the temporal gap, allowing for credit to be assigned more accurately to actions that lead to a reward.
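In a standard TD(λ) formulation (notation assumed), every state keeps a trace that decays each step; the TD error δ is then applied to all states in proportion to their traces:

$$e(s) \leftarrow \gamma \lambda \, e(s) \text{ for all } s, \qquad e(s_t) \leftarrow e(s_t) + 1, \qquad V(s) \leftarrow V(s) + \alpha \, \delta \, e(s)$$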

49
Q

What are the advantages of model-free RL?

A

Model-free RL is simple and efficient, making it accessible and straightforward to implement without the need for a model of the environment.

50
Q

What are the disadvantages of model-free RL?

A

Model-free RL can be slow and rigid. It often leads to outcome insensitivity, where the learning process doesn’t adjust adequately to changes in the environment.

51
Q

What are the advantages of model-based RL?

A

Model-based RL is fast and flexible. It allows for behavioral adjustments through planning, as it involves a model of the environment which can simulate future states.

52
Q

What are the disadvantages of model-based RL?

A

Model-based RL is complex and costly. It requires a significant amount of computational resources to model the environment and update the model based on new information.

53
Q

How is human behavior described in terms of RL strategies?

A

Human behavior is often seen as a mixture of both model-free and model-based RL, utilizing the simplicity of model-free methods and the strategic planning of model-based methods.