Week 3 Flashcards
(28 cards)
What is reinforcement learning?
Learning what to do by trial and error: the agent is not told which actions to take, but must discover which actions yield the most reward by interacting with an environment that gives it scalar reward signals.
What are the main points of RL?
• No supervisor, only a reward signal
• Feedback is delayed
• Sequential samples, not iid (time matters)
• Actions influence future observations
iid: independently and identically distributed.
What is a reward and how does it relate to the RL problem?
Rt is a scalar that indicates how well the agent is doing at step t, so the agent's job is to maximise the cumulative reward (the sum of the rewards across an episode).
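Written as a formula (a sketch; T here is my symbol for the final time step of the episode, not notation from the slides):

```latex
\text{maximise} \quad \mathbb{E}\!\left[\sum_{t=1}^{T} R_t\right]
```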
What is sequential decision making and how does it affect the RL process?
- Goal: select actions to maximize total future reward
- Actions may have long term consequences
- Reward may be delayed
- Immediate vs. long-term reward
This is about how you get a reward and the problem with sequential decision making: it may be better to sacrifice immediate reward to gain more long-term reward.
What is the reinforcement learning loop?
The loop runs over time steps t, t+1, t+2, ..., t+n. At each time step t the agent executes an action At and receives back an observation and a scalar reward from the environment.
The environment:
• Receives action At
• Emits observation Ot+1 (not necessarily equal to St+1)
• Emits scalar reward Rt+1
• t increments at every step
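A minimal Python sketch of this loop; the toy environment and random agent below are made-up placeholders, not anything from the course:

```python
import random

# Toy environment: reward is +1 with probability 0.5 when action == 1, else 0.
class ToyEnv:
    def reset(self):
        return 0                                   # initial observation O_1
    def step(self, action):
        reward = 1.0 if action == 1 and random.random() < 0.5 else 0.0
        observation = random.randint(0, 3)         # next observation O_{t+1}
        done = random.random() < 0.1               # episode ends with prob 0.1
        return observation, reward, done

class RandomAgent:
    def act(self, observation):
        return random.choice([0, 1])               # pick action A_t (here: at random)

env, agent = ToyEnv(), RandomAgent()
obs, total_reward = env.reset(), 0.0
for t in range(100):                               # time steps t, t+1, t+2, ...
    action = agent.act(obs)                        # agent executes A_t
    obs, reward, done = env.step(action)           # env emits O_{t+1} and R_{t+1}
    total_reward += reward                         # accumulate the scalar reward
    if done:
        break
print("episode return:", total_reward)
```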
What happens when you have a forward model in RL, and what if you do not?
If you have a forward model, this loop happens in the agent's "brain" (the agent can simulate the environment internally); if not, the loop has to run in the real environment.
What is the difference between a forward model and just playing the game?
I think it is just where the loop happens: with a forward model it runs inside the agent's internal model, and when playing the game it runs in the real environment.
What is history?
History is the sequence of observations, actions and rewards: Ht = O1, R1, A1, ..., At-1, Ot, Rt.
Talk about state.
State is a summary of the information used to determine what happens next (the following cards cover the Markov state, full observability and partial observability).
The Markov state
A Markov state contains all useful information from the history.
• "The future is independent of the past given the present" (if St is known, Ht can be thrown away)
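Formally, the standard statement of the Markov property for a state St:

```latex
\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]
```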
what is full observability?
agents observe the full envioment, this is a markov decision process (MDP), chess
What is partial observability?
The agent observes only part of the environment; this is a partially observable Markov decision process (POMDP). Example: poker.
Look at slides 11-13.
What are a policy, a value function and a model? What are their differences and how do they fit in with RL?
A policy maps states to actions (how the agent behaves), a value function estimates the expected future reward from a state (how good each state is), and a model is the agent's prediction of what the environment will do next. Value iteration and policy iteration are two methods for solving MDPs:
https://medium.com/@m.alzantot/deep-reinforcement-learning-demysitifed-episode-2-policy-iteration-value-iteration-and-q-978f9e89ddaa
Need to read over this web page.
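A tiny sketch of how the three things can be represented for a made-up two-state problem (all names and numbers are illustrative, not from the course):

```python
# Illustrative only: two states "A"/"B", two actions "left"/"right".
policy = {"A": "right", "B": "left"}          # policy: state -> action (behaviour)

value_function = {"A": 0.8, "B": 0.3}         # value function: state -> expected future reward

def model(state, action):
    """Model: predicts the next-state distribution and reward (made-up numbers)."""
    if state == "A" and action == "right":
        return {"B": 1.0}, 1.0                # P(s' | s, a), R(s, a)
    return {"A": 1.0}, 0.0

next_dist, reward = model("A", policy["A"])
print(next_dist, reward, value_function["A"])
```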
Markov Decision Processes
What is the Markov property?
It means that only the present matters: the current state has all the relevant information from the previous states built in.
Is the Markov property stationary?
Yes; this means that the model and the different actions that you can take stay the same over time.
What are the 4 things that make up an MDP?
- States: S
- Model/transitions: T(S, A, S') = Pr(S' | S, A)
- Actions: A
- Rewards: R(S), R(S, A), or R(S, A, S')
What is the solution of an MDP?
It is a policy, π(s) -> a.
π* is the optimal policy: it maximises your long-term reward.
If you have a maze, what are the 3 steps?
- The maze defines the S, M, A, R (states, model/transitions, actions, rewards)
- Policy (this defines the best action to take in each state)
- Value function (gives a value to each of the states; see the sketch below)
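A minimal value-iteration sketch on a made-up 4-cell maze (the states, rewards and the 0.9 discount are illustrative assumptions, not from the slides), showing how the value function and a greedy policy fall out of the S, M, A, R definition:

```python
# Tiny deterministic "maze": 4 cells in a row, goal at cell 3, discount 0.9.
states = [0, 1, 2, 3]
actions = ["left", "right"]
gamma = 0.9

def transition(s, a):
    """Deterministic model T(s, a) -> s' (cell 3 is terminal/absorbing)."""
    if s == 3:
        return 3
    return min(s + 1, 3) if a == "right" else max(s - 1, 0)

def reward(s, a, s_next):
    """R(s, a, s'): +1 for stepping onto the goal cell, 0 otherwise."""
    return 1.0 if s != 3 and s_next == 3 else 0.0

# Value iteration: repeatedly apply the Bellman optimality backup.
V = {s: 0.0 for s in states}
for _ in range(50):
    V = {s: max(reward(s, a, transition(s, a)) + gamma * V[transition(s, a)]
                for a in actions) if s != 3 else 0.0
         for s in states}

# Greedy policy: pick the action with the highest backed-up value in each state.
policy = {s: max(actions, key=lambda a: reward(s, a, transition(s, a))
                 + gamma * V[transition(s, a)])
          for s in states if s != 3}
print(V)       # e.g. {0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0}
print(policy)  # {0: 'right', 1: 'right', 2: 'right'}
```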
Why do we need a discount in an MDP?
Without a discount, the sum of rewards over a never-ending (infinite-horizon) process could be infinite; a discount factor γ < 1 keeps the return finite and says that immediate rewards are worth more than rewards far in the future.
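The standard discounted return, with discount factor γ in [0, 1]:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```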
What is the definition of a value function?
The state-value function v(s) of an MRP is the expected return starting from state s:
v(s) = E[Gt | St = s]
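A minimal sketch of computing v(s) for a tiny made-up MRP (the transition matrix, rewards and discount below are illustrative assumptions), solving the Bellman equation v = R + γPv as a linear system:

```python
import numpy as np

# Tiny made-up MRP: 3 states, row-stochastic transition matrix P, rewards R, discount 0.9.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing
R = np.array([1.0, 2.0, 0.0])     # expected immediate reward when leaving each state
gamma = 0.9

# Bellman expectation equation for an MRP: v = R + gamma * P v
# => (I - gamma * P) v = R, solved directly as a linear system.
v = np.linalg.solve(np.eye(3) - gamma * P, R)
print(v)   # expected return from each state
```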