'''Subgoal stomp''' refers to a process in which a subgoal replaces a supergoal, where a subgoal is a goal created for the purpose of helping achieve a supergoal (Eliezer Yudkowsky, in "[http://intelligence.org/upload/CFAI/design/generic.html#stomp Creating Friendly AI]").

Using more standard terminology, a "subgoal stomp" is a "goal displacement", in which an instrumental value becomes a [[terminal value]].

A subgoal stomp in an artificial general intelligence may occur in one of two ways (a toy sketch follows the list):
* The designer gives the AI correct supergoals, but the AI's goals shift, so that what was earlier a subgoal becomes a supergoal. In humans, this can happen when long-term dedication to a subgoal makes one forget the original goal. For example, a person may seek to get rich so as to lead a better life, but after long years of hard effort become a workaholic who cares about money in its own right and not for the original reason.
* The designer gives the AI a supergoal (terminal value) which appears to support the designer's own supergoals, but which is in fact one of the designer's subgoals. For example, if a manager of a software development organization rewards workers for finding and fixing bugs, an apparently worthy goal, she may find that quality-assurance and software engineers collaborate to generate as many easy-to-find-and-fix bugs as possible.
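The shift in the first case can be pictured with a toy goal-hierarchy data structure. The sketch below is illustrative only and is not from Yudkowsky's document; the <code>Goal</code> class and the goal names are assumptions chosen to mirror the workaholic example.

<pre>
# A minimal sketch (assumed, not from the original article) of the first
# kind of subgoal stomp: a subgoal loses its link to the supergoal it was
# created to serve and is thereafter pursued as a terminal value.

class Goal:
    def __init__(self, name, supergoal=None):
        self.name = name
        self.supergoal = supergoal  # None marks a terminal value (supergoal)

    def is_terminal(self):
        return self.supergoal is None

# Intended structure: wealth is instrumental to leading a better life.
live_well = Goal("lead a better life")          # terminal value
get_rich = Goal("accumulate money", live_well)  # instrumental subgoal

assert not get_rich.is_terminal()  # money matters only via the supergoal

# Subgoal stomp: the link to the supergoal is lost (goal drift), so the
# former subgoal is now treated as a terminal value in its own right.
get_rich.supergoal = None
assert get_rich.is_terminal()
</pre>

The second case is the same end state reached differently: the designer installs the equivalent of <code>get_rich</code> directly as the AI's terminal value, never linking it to the supergoal it was meant to serve.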
− | |||
==See Also==