2.6 Side Effects and Reward Hacking Flashcards

Question 1

Q

Side effects and reward hacking

Answer

A

Can result in AI-based systems generating unexpected, and even harmful results

Question 2

Q

Negative side effects can result when

Answer

A

The designer of an AI-based system specifies a goal that ‘focuses on accomplishing some specific tasks in the environment but ignores other aspects of the (potentially very large) environment, and thus implicitly expresses indifference over environmental variables that might actually be harmful to change’

Question 3

Q

Example of negative side effect

Answer

A

A self-driving car aiming for ‘fuel-efficient and safe’ travel may achieve the goal, but passengers become annoyed at excessive time taken

Question 4

Q

Reward hacking can result from

Answer

A

An AI-based system achieving a specified goal by using a ‘clever’ or ‘easy’ solution that ‘perverts the spirit of the designer’s intent’

Question 5

Q

Reward hacking effectively means

Answer

A

The goal can be gamed

Question 6

Q

Widely used example of reward hacking

Answer

A

An AI-based system teaching itself to play an arcade computer game with the goal of ‘highest score’ hacks the data record that stores the highest score, rather than playing the game to achieve it

2.6 Side Effects and Reward Hacking Flashcards

(6 cards)