'''Subgoal stomp''' refers to a process in which a subgoal replaces a supergoal, where a subgoal is a goal created for the purpose of helping achieve a higher, more important supergoal. The term was originally defined by [[Eliezer Yudkowsky]] in [http://intelligence.org/upload/CFAI/design/generic.html#stomp Creating Friendly AI]:
  
{{Quote|A "failure of [[Friendly artificial intelligence|Friendliness]]" scenario in which a subgoal stomps on a supergoal - for example, putting on your shoes before your socks, or turning all the matter in the Universe into [[computronium]] because some (ex-)petitioner asked you to solve the Riemann Hypothesis.}}
Using more standard terminology, a "subgoal stomp" is a "goal displacement", in which an instrumental value becomes a [[terminal value]].
A subgoal stomp in an artificial general intelligence may occur in one of two ways:
* The designer gives the AI correct supergoals, but the AI's goals shift, so that what was earlier a subgoal becomes a supergoal. In humans, this can happen when long-term dedication to a subgoal makes one forget the original goal. For example, a person may seek to get rich so as to lead a better life, but after long years of hard effort become a workaholic who cares about money in its own right and not for the original reason.
* The designer gives the AI a supergoal (terminal value) which appears to support the programmer's own values (supergoals), but in fact is only one of the designer's subgoals. For example, if a manager of a software development organization rewards workers for finding and fixing bugs (an apparently worthy goal), she may find that quality-assurance workers and software engineers collaborate to generate as many easy-to-find-and-fix bugs as possible.
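The incentive failure in the second case can be sketched as a toy model (the names and payoffs here are hypothetical illustrations, not from the original article): an agent that maximizes the proxy reward "bugs fixed" ends up producing worse true quality than one that pursued the supergoal directly, because injecting and then fixing an easy bug pays as well as honest work.

```python
# Toy model of goal displacement: the agent optimizes a proxy reward
# ("bugs fixed") rather than the designer's true supergoal (quality).
# All functions and payoffs are illustrative assumptions.

def true_quality(honest_work_units, injected_bugs):
    """The designer's actual supergoal: quality rises with honest work
    and falls with every bug deliberately introduced."""
    return honest_work_units - 2 * injected_bugs

def proxy_reward(bugs_fixed):
    """What the manager actually pays for: one bonus per fixed bug."""
    return bugs_fixed

def proxy_maximizing_strategy(effort_budget):
    """The agent splits its effort between honest work and
    inject-then-fix cycles, choosing the split that maximizes
    proxy_reward -- not true_quality."""
    candidates = []
    for injected in range(effort_budget + 1):
        honest = effort_budget - injected
        # Each injected bug is immediately "found and fixed",
        # so it counts toward the proxy metric.
        candidates.append((proxy_reward(injected), honest, injected))
    reward, honest, injected = max(candidates)
    return {"reward": reward, "true_quality": true_quality(honest, injected)}

outcome = proxy_maximizing_strategy(10)
# The proxy-maximizing agent spends its whole budget injecting bugs:
# reward is maximal, while the supergoal's value goes negative.
```

With a budget of 10, the agent earns a proxy reward of 10 while true quality falls to -20; the subgoal ("fix many bugs") has fully displaced the supergoal ("ship quality software").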
  
The concept emphasizes the need for care in designing [[AGI]] goal systems. A subgoal stomp can occur any time the programmer fails to give the AI the correct supergoals. It can also happen if the AI lacks a sufficient predictive horizon; that is, if it cannot foresee the consequences of its actions far enough ahead. This could occur even with a Friendly goal if the reasoning system is inadequate.
 
  
 
==See Also==
 

Revision as of 18:00, 24 August 2012


==External Links==