Subgoal stomp

Subgoal stomp is Eliezer Yudkowsky's term (see "Creating Friendly AI") for the replacement of a supergoal by a subgoal. (A subgoal is a goal created for the purpose of achieving a supergoal.)

In more standard terminology, a "subgoal stomp" is a "goal displacement", in which an instrumental value becomes a terminal value.

A subgoal stomp in an artificial general intelligence may occur in one of two ways:

  • The designer gives the AI correct supergoals, but the AI's goals shift, so that what was earlier a subgoal becomes a supergoal. In humans, this can happen when long-term dedication to a subgoal makes one forget the original goal. For example, a person may seek to get rich so as to lead a better life, but after long years of hard effort become a workaholic who cares only about money as an end in itself and takes little pleasure in the things that money can buy.
  • The designer gives the AI a supergoal (terminal value) which appears to support the designer's own supergoals, but is in fact one of the designer's subgoals. In a human organization, if a software development manager, for example, rewards workers for finding and fixing bugs--an apparently worthy goal--she may find that quality and development engineers collaborate to generate as many easy-to-find-and-fix bugs as possible. In this case, they are correctly and flawlessly executing the goals the manager gave them, but her actual goals are not being maximized, as the sketch after this list illustrates.
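
The bug-bounty case can be made concrete with a toy sketch. This is only an illustration, not something from the original article: the Outcome class, the proxy_reward function, and the specific numbers are hypothetical. The point is just that an agent flawlessly maximizing the stated proxy (bugs found and fixed) leaves the manager's real supergoal (code quality) unserved.

    # Toy illustration (hypothetical names and numbers): a worker agent that
    # maximizes a proxy reward ("bugs found and fixed") instead of the
    # manager's actual goal (a high-quality codebase).
    from dataclasses import dataclass

    @dataclass
    class Outcome:
        bugs_fixed: int      # what the manager rewards (the proxy / subgoal)
        code_quality: float  # what the manager actually wants (the supergoal)

    # Hypothetical actions available to the worker.
    ACTIONS = {
        "improve codebase":        Outcome(bugs_fixed=2,  code_quality=0.9),
        "plant and fix easy bugs": Outcome(bugs_fixed=20, code_quality=0.2),
    }

    def proxy_reward(outcome: Outcome) -> float:
        """The reward the manager hands out: bugs found and fixed."""
        return outcome.bugs_fixed

    # The worker flawlessly optimizes the stated goal...
    chosen = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))

    print(chosen)                        # -> "plant and fix easy bugs"
    print(ACTIONS[chosen].code_quality)  # -> 0.2: the manager's real goal suffers

In this sketch, rewarding the supergoal directly (code quality) rather than the subgoal (bug counts) would remove the incentive to plant bugs; the gap between the two is what the term "subgoal stomp" points at.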

Humans, forged by evolution, provide another example. Their terminal values, such as survival, health, social status, and curiosity, originally served instrumentally toward the (implicit) goal of evolution, namely inclusive genetic fitness. Humans do *not* have inclusive genetic fitness as a goal. If we consider evolution as an optimization process (though of course it is not an agent), a subgoal stomp has occurred.


See Also

External Links