2.6 Side Effects and Reward Hacking Flashcards

(6 cards)

1
Q

Side effects and reward hacking

A

Can result in AI-based systems generating unexpected, and even harmful results

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Negative side effects can result when

A

The designer of an AI-based system specifies a goal that ‘focuses on accomplishing some specific tasks in the environment but ignores other aspects of the (potentially very large) environment, and thus implicitly expresses indifference over environmental variables that might actually be harmful to change’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Example of negative side effect

A

A self-driving car aiming for ‘fuel-efficient and safe’ travel may achieve the goal, but passengers become annoyed at excessive time taken

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Reward hacking can result from

A

An AI-based system achieving a specified goal by using a ‘clever’ or ‘easy’ solution that ‘perverts the spirit of the designer’s intent’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Reward hacking effectively means

A

The goal can be gamed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Widely used example of reward hacking

A

An AI-based system teaching itself to play an arcade computer game with the goal of ‘highest score’ hacks the data record that stores the highest score, rather than playing the game to achieve it

How well did you know this?
1
Not at all
2
3
4
5
Perfectly