week 3 Flashcards

(28 cards)

1
Q

What is reinforcement learning?

A

Reinforcement learning is learning what to do by trial and error: an agent interacts with an environment, takes actions, receives rewards, and learns behavior that maximizes cumulative reward.

2
Q

what are the main points of RL

A
• No supervisor, only a reward signal
• Feedback is delayed
• Sequential samples, not iid (time matters)
• Actions influence future observations

iid: independent and identically distributed.

3
Q

What is a reward, and how does it relate to the RL problem?

A

Rt is a scalar that indicates how well the agent is doing at step t. The agent's job is to maximize the cumulative reward (the sum of the rewards across an episode).
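As a minimal LaTeX sketch of what "cumulative reward" means here (the undiscounted return over an episode of length T; the discounted version appears in the value-function cards later):

```latex
% Cumulative (undiscounted) reward over one episode of T steps:
% the quantity the agent tries to maximize.
G \;=\; \sum_{t=1}^{T} R_t
```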

4
Q

what is sequential decision making, and how does it affect the RL process?

A
  • Goal: select actions to maximize total future reward
  • Actions may have long term consequences
  • Reward may be delayed
  • Immediate vs. long-term reward

This card is about how rewards arrive over time and the trade-offs sequential decision making creates (e.g., sacrificing immediate reward for greater long-term reward).

5
Q

what is the reinforcement learning loop

A

The loop runs over time steps t, t+1, t+2, …, t+n. At each step the agent executes action At and receives back an observation and a scalar reward.

The environment:
• Receives action At
• Emits observation Ot+1 (not necessarily equal to the true state St+1)
• Emits scalar reward Rt+1
• t increments at every step
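A minimal Python sketch of this loop, assuming a hypothetical Environment class with a step(action) method and an agent that picks random actions (all names and the reward rule are illustrative, not from the slides):

```python
import random

class Environment:
    """Toy environment: reward 1 for action 1, episode ends after 10 steps."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1                            # t increments at every step
        observation = self.t                   # Ot+1 (not the full state in general)
        reward = 1.0 if action == 1 else 0.0   # scalar reward Rt+1
        done = self.t >= 10
        return observation, reward, done

env = Environment()
total_reward = 0.0
done = False
while not done:
    action = random.choice([0, 1])             # agent executes action At
    observation, reward, done = env.step(action)
    total_reward += reward                     # accumulate the return
print("episode return:", total_reward)
```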

6
Q

what happens when you have a forward model in RL, and what happens when you don't?

A

If you have a forward model, this loop can happen inside the agent's "brain": the agent simulates actions and their outcomes (planning). Without a forward model, the loop has to run in the real environment, and the agent must learn from actual experience.

7
Q

what is the difference between a forward model and just playing the game?

A

The difference is where the loop happens: with a forward model the agent runs the loop internally as simulation (planning); when just playing the game, the loop runs against the real environment.

8
Q

what is history?

A

History is the sequence of observations, actions, and rewards: Ht = O1, R1, A1, …, At-1, Ot, Rt.

9
Q

what is state?

A

State is the information used to determine what happens next; formally, it is a function of the history: St = f(Ht).

10
Q

what is a Markov state?

A

A Markov state contains all useful information from the history.

• "The future is independent of the past given the present" (if St is known, Ht can be thrown away)
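The formal statement behind this card (the standard definition of the Markov property):

```latex
% A state S_t is Markov iff the next state depends only on the present:
\mathbb{P}[S_{t+1} \mid S_t] \;=\; \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]
```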

11
Q

what is full observability?

A

The agent observes the full environment state directly; this setting is a Markov decision process (MDP). Example: chess.

12
Q

what is partial observability?

A

The agent observes only part of the environment; this setting is a partially observable Markov decision process (POMDP). Example: poker.

13
Q

Look at slides 11-13.

A

7

14
Q

what are a policy, a value function, and a model? How do they differ, and how do they fit into RL?

A

Policy: the agent's behavior, a map from state to action. Value function: a prediction of future reward, used to judge how good each state is. Model: the agent's representation of the environment. Value iteration and policy iteration are two methods for solving MDPs; both produce a policy.

15
Q

https://medium.com/@m.alzantot/deep-reinforcement-learning-demysitifed-episode-2-policy-iteration-value-iteration-and-q-978f9e89ddaa

A

3

16
Q

Need to read over this web page.

17
Q

Markov Decision Processes

18
Q

what is the Markov property?

A

It means that only the present matters: the next state depends only on the current state. But the current state has the information from all the states before it built in.

19
Q

is the Markov property stationary?

A

Yes. This means that the model (the transition and reward rules) and the actions you can take stay the same over time.

20
Q

what are the 4 things that make up the MDP

A

states: S
model/transitions: T(s, a, s') = Pr(s' | s, a)
actions: A
rewards: R(s), R(s, a), or R(s, a, s')
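A minimal Python sketch of these four pieces for a toy two-state MDP (the states, actions, transition probabilities, and rewards are made up for illustration):

```python
# Toy MDP: S, A, T(s, a, s') = Pr(s' | s, a), and R(s, a).
states = ["s0", "s1"]
actions = ["stay", "go"]

# T[(s, a)] maps each next state s' to Pr(s' | s, a).
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.9, "s1": 0.1},
}

# R[(s, a)]: immediate reward for taking action a in state s.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "go"):   -0.1,
    ("s1", "stay"): 1.0,
    ("s1", "go"):   0.0,
}
```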

21
Q

what is the solution of an MDP?

A

It is a policy, π(s) → a.

π* is the optimal policy: the one that maximizes your long-term reward.

22
Q

if you have a maze, what are the 3 steps?

A
  1. The maze defines the S, model, A, R (the MDP itself)
  2. Policy (defines the best thing to do in each state); see the value-iteration sketch after this list
  3. Value function (gives a value to each of the states)
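A minimal value-iteration sketch in Python showing steps 2 and 3 together: it computes a value for each state and reads off a policy. It reuses the illustrative two-state MDP from the earlier card (not the maze from the slides):

```python
# Value iteration on a toy 2-state MDP (same illustrative T and R as above).
states = ["s0", "s1"]
actions = ["stay", "go"]
T = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
     ("s1", "stay"): {"s1": 1.0},
     ("s1", "go"):   {"s0": 0.9, "s1": 0.1}}
R = {("s0", "stay"): 0.0, ("s0", "go"): -0.1,
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}
gamma = 0.9                       # discount factor

V = {s: 0.0 for s in states}      # step 3: value of each state
for _ in range(100):              # repeat Bellman backups until (near) convergence
    V = {s: max(R[(s, a)] + gamma * sum(p * V[s2]
                                        for s2, p in T[(s, a)].items())
                for a in actions)
         for s in states}

# Step 2: the policy picks the action that achieves the max at each state.
policy = {s: max(actions,
                 key=lambda a: R[(s, a)] + gamma * sum(p * V[s2]
                                                       for s2, p in T[(s, a)].items()))
          for s in states}
print(V)        # state values
print(policy)   # e.g. {'s0': 'go', 's1': 'stay'}
```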
23
Q

why do we need a discount in an MDP?

A

Without a discount, the return in an ongoing (infinite-horizon) task can be infinite, so values become meaningless. A discount γ < 1 keeps the sum of rewards finite and encodes that immediate rewards are worth more than distant ones.

24
Q

what is the definition of a value function?

A

The state-value function v(s) of an MRP is the expected return starting from state s: v(s) = E[Gt | St = s].
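The standard formulas behind this definition (the return Gt uses the discount γ from the next cards):

```latex
% Return: total discounted reward from time step t onward.
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

% State-value function of an MRP: expected return starting from state s.
v(s) = \mathbb{E}\left[ G_t \mid S_t = s \right]
```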

25
Q

talk about the gamma (the discount factor)

A

See: https://www.youtube.com/watch?v=ojxurp9BYlg&index=19&list=PLAwxTw4SYaPnidDwo9e2c7ixIsu_pdSNp
26
Q

what is a Markov decision process?

A

An MDP is a Markov reward process where the agent makes decisions according to a policy; in other words, it is an MRP with actions added.
27
Q

what does the discount really define?

A

It defines how much you care about the future: γ close to 0 is myopic (only immediate reward matters), γ close to 1 is far-sighted.
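A small Python sketch of this (the reward sequence is illustrative; it just shows that a low γ barely sees a distant reward while a high γ still values it):

```python
# Discounted return for the same reward sequence under different gammas.
rewards = [0.0, 0.0, 0.0, 10.0]   # a big reward arrives 4 steps in the future

def discounted_return(rewards, gamma):
    # G = r_1 + gamma*r_2 + gamma^2*r_3 + ...
    return sum(gamma**k * r for k, r in enumerate(rewards))

print(discounted_return(rewards, 0.1))  # ~0.01: myopic, barely sees it
print(discounted_return(rewards, 0.9))  # ~7.29: far-sighted, still values it
```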
28
Q

Need to go over the last couple of slides and understand the difference between an MRP and an MDP.

A

Fill in.