Lecture 17 - Reinforcement Learning Flashcards

1
Q

Why is reinforcement learning an important part of AI?

A

Almost all “natural learning” is done by reinforcement

e.g. learning to read, play chess etc.

2
Q

What are the properties of reinforcement learning?

A

Agent is learning to choose a sequence of actions

Ultimate consequences of an action may not be apparent until the end

When a reward is achieved it may not be due to the most recent action.

No predefined set of training samples/examples

3
Q

What is the credit assignment problem?

A

When a reward is achieved it may not be due to the most recent action, but one performed earlier in the sequence.

4
Q

Describe the components of a Markov Decision Process

A

Agent operates in a domain represented as a set of distinct states, S

Agent has a set of actions it can perform, A

Time advances in discrete steps

At time t the agent knows the current state s_t and must select an action to perform

When action a_t is performed the agent receives a reward r_t, which may be positive, negative or zero. The reward depends on the current state and action, so it can be determined by a reward function R: r_t = R(s_t, a_t)

The new state s_{t+1} depends on the last state and action, so it can be determined by a transition function T: s_{t+1} = T(s_t, a_t)
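The components above can be sketched concretely. This is a minimal illustration only: the states, actions and the contents of the R and T tables below are made-up assumptions, not from the lecture.

```python
# One MDP time step: deterministic reward and transition functions
# represented as dicts keyed by (state, action). Values are illustrative.

R = {("s0", "a0"): 0, ("s0", "a1"): 5}        # reward function: r_t = R(s_t, a_t)
T = {("s0", "a0"): "s0", ("s0", "a1"): "s1"}  # transition function: s_{t+1} = T(s_t, a_t)

def step(state, action):
    """Advance one discrete time step: return (reward, next state)."""
    return R[(state, action)], T[(state, action)]

reward, next_state = step("s0", "a1")  # agent in s0 performs a1
```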

5
Q

What does an agent in a Markov Decision Process acquire?

A

A control policy; i.e. a function that determines the best action given the current state

6
Q

Describe the “immediate reward” strategy for determining the best action in a Markov Decision Process, and why it is/isn’t usually used

A

Choosing the action with the highest immediate reward

Produces a good short term payoff but might not be optimal in the long run

7
Q

Describe the “total payoff” strategy for determining the best action in a Markov Decision Process, and why it is/isn’t usually used

A

Maximise the total payoff by choosing a sequence of states that has a large sum of rewards

Not realistic, because it treats a reward in the very distant future as just as valuable as one received immediately, which is not usually the case

8
Q

Describe the “discounted cumulative reward” strategy for determining the best action in a Markov Decision Process, and why it is/isn’t usually used

A

Same as total payoff, except distant rewards are discounted so that they are worth less than more immediate ones (a reward k steps in the future is weighted by γ^k for some discount factor 0 ≤ γ < 1)
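As a sketch, the discounted sum can be computed directly; the reward sequences used in the comparison below are illustrative assumptions:

```python
# Discounted cumulative reward: a reward k steps in the future is
# weighted by gamma**k, so distant rewards count for less.

def discounted_return(rewards, gamma=0.9):
    return sum(gamma**k * r for k, r in enumerate(rewards))

# With gamma = 0.5, the same reward is worth less the later it arrives:
early = discounted_return([10, 0, 0], gamma=0.5)  # 10.0
late = discounted_return([0, 0, 10], gamma=0.5)   # 2.5
```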

9
Q

What is the learning task in Markov Decision Processes?

A

To discover the optimal control policy, i.e. the best action for each state

10
Q

If the agent in a Markov Decision Process knows the transition function, the reward function and the discounted value of each state then V* can be used as

A

an evaluation function for actions
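A sketch of what this means: with T, R and V* all known, the best action in a state is the one maximising the immediate reward plus the discounted value of the resulting state. All table contents below are illustrative assumptions.

```python
# Using V* as an evaluation function for actions: pick the action
# maximising R(s, a) + gamma * V*(T(s, a)). Tables are made up.

def best_action(state, actions, R, T, V_star, gamma=0.9):
    return max(actions, key=lambda a: R[(state, a)] + gamma * V_star[T[(state, a)]])

R = {("s0", "a0"): 1, ("s0", "a1"): 0}
T = {("s0", "a0"): "s0", ("s0", "a1"): "s1"}
V_star = {"s0": 0.0, "s1": 10.0}

best_action("s0", ["a0", "a1"], R, T, V_star)  # "a1": 0 + 0.9*10 beats 1 + 0.9*0
```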

11
Q

If an agent in a Markov decision process does not know T or R, no form of evaluation function that requires _____________ is possible

A

looking ahead

12
Q

What is the Q function?

A

An evaluation function of both state and action that estimates the total payoff from choosing a particular action in a particular state
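A sketch of how such a function can be learned without knowing T or R in advance: the standard tabular Q-learning update, which needs only observed transitions and no look-ahead. The learning rate alpha and the sample transition are illustrative assumptions.

```python
# Tabular Q-learning update: learn Q(s, a) from observed
# (state, action, reward, next state) transitions.
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)], initially 0 everywhere

def q_update(s, a, r, s_next, actions, gamma=0.9, alpha=0.5):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_update("s0", "a1", 5, "s1", ["a0", "a1"])  # Q[("s0", "a1")] becomes 2.5
```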

13
Q

What are two possible Action Selection strategies in Markov Decision Processes?

A

Uniform Random Selection

Select Highest Expected Cumulative Reward
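The two strategies can be sketched as follows, assuming Q-value estimates are stored in a dict keyed by (state, action); the table contents are made up:

```python
# Two action selection strategies over a Q-value table.
import random

def uniform_random(state, actions):
    # Explores: every action equally likely, regardless of its value
    return random.choice(actions)

def highest_expected(state, actions, Q):
    # Exploits: always pick the action with the best current estimate
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("s0", "a0"): 1.0, ("s0", "a1"): 3.0}
highest_expected("s0", ["a0", "a1"], Q)  # "a1"
```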

14
Q

What is the advantage and disadvantage of using uniform random selection in Markov Decision Processes?

A

Advantage: Will explore the entire state space and hence satisfy the convergence theorem

Disadvantage: May spend a great deal of time learning the value of transitions that are not optimal

15
Q

What is the advantage and disadvantage of selecting the Highest Expected Cumulative Reward as the action selection strategy in Markov Decision Processes?

A

Advantage: Concentrates resources on apparently useful transitions

Disadvantage: May ignore even better pathways which haven't been explored, and does not satisfy the convergence theorem
