Terminal value

A terminal value is an ultimate goal, an end-in-itself.  
  
In an AI with a utility or reward function, the terminal value is the maximization of that function.  
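
For concreteness, here is a minimal Python sketch of that idea (the action set, the toy world model, and the numbers are made-up placeholders, not anything from this article): the agent's terminal value is simply "get the highest value of the utility function it was handed", and every action is evaluated only as a means to that end.

<pre>
# Minimal sketch: an agent whose terminal value is maximizing a given utility function.
# The actions, the toy world model, and the numbers are hypothetical placeholders.

def choose_action(actions, utility, predict_outcome):
    """Pick the action whose predicted outcome has the highest utility."""
    return max(actions, key=lambda action: utility(predict_outcome(action)))

def predict_outcome(action):
    # Toy "world model": maps an action to an outcome.
    return {"rest": 0, "study": 7, "work": 10}[action]

def utility(outcome):
    # The terminal value is maximizing this number; nothing else counts.
    return outcome

print(choose_action(["rest", "study", "work"], utility, predict_outcome))  # -> "work"
</pre>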
  
The non-standard term "supergoal" is used for this concept in Eliezer Yudkowsky's [http://intelligence.org/files/CFAI.html earlier writings].
 
  
==Terminal vs. instrumental values==
Terminal values stand in contrast to instrumental values, which are means to an end, mere tools for achieving terminal values. For example, if a university student does not enjoy studying but is doing so merely to gain a professional qualification, his terminal value is getting a job, while getting good grades is an instrument to that end. If a (simple) chess program tries to maximize piece value three turns into the future, that is an instrumental value serving its terminal value of winning the game.
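
To make the chess example concrete, here is a hedged sketch (a toy board representation, not a real chess engine): the material count is an instrumental value, a proxy the program optimizes a few moves ahead, while its terminal value remains winning the game.

<pre>
# Illustrative sketch only: piece value as an *instrumental* heuristic.
# A search routine (not shown) would maximize this score a few plies ahead,
# even though the only outcome the program "really" cares about is checkmate.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_score(board, side):
    """Sum of piece values for `side` minus the opponent's, on a toy board
    given as a list of (owner, piece_letter) tuples."""
    score = 0
    for owner, piece in board:
        value = PIECE_VALUES[piece]
        score += value if owner == side else -value
    return score

toy_board = [("white", "Q"), ("white", "P"), ("black", "R"), ("black", "P")]
print(material_score(toy_board, "white"))  # -> 4
</pre>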
 
Some values may be called "terminal" merely in relation to an instrumental goal, yet themselves serve instrumentally towards a higher goal. The student described above may want the job to gain social status and money; if he could get prestige and money without working, he would, and in that case the job is instrumental to these other values. However, in considering future AI, the phrase "terminal value" is generally used only for the top level of the goal hierarchy: the true ultimate goals of a system, those which do not serve any higher value.
  
 
==Human terminal values==
 
Humans' system of terminal values is quite complex. These values were forged by evolution in the ancestral environment to maximize inclusive genetic fitness. They include survival, health, friendship, social status, love, joy, aesthetic pleasure, curiosity, and much more. Evolution's implicit goal is inclusive genetic fitness, but humans do not have inclusive genetic fitness as a goal. Rather, these values, which were *instrumental* to inclusive genetic fitness, have become humans' *terminal* values (an example of [[subgoal stomp]]).
  
Humans cannot fully introspect their terminal values. Humans' terminal values are often mutually contradictory and inconsistent, and they change over time.
  
 
==Non-human terminal values==
 
Future artificial general intelligences may have the maximization of a utility function or of a reward function (reinforcement learning) as their terminal value. The function will likely be set by the AI's designers.
  
Since people make tools instrumentally, to serve specific human values, the AI's assigned value system may be much simpler than humans'. This poses a danger, as an AI must seek to protect *all* human values if a positive human future is to be achieved. The [[paperclip maximizer]] is a thought experiment about an artificial general intelligence assigned the apparently innocuous terminal value of maximizing the number of paperclips in its collection, with consequences disastrous to humanity.
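
As a hedged illustration of how narrow such an assigned value system can be (the data structure here is invented for the example), a paperclip-maximizer-style reward function fits in one line, and everything the designers failed to mention is simply invisible to it:

<pre>
# Toy illustration: an assigned terminal value can be extremely simple.
# Anything not mentioned in the reward function carries zero weight.

def paperclip_reward(world_state):
    """Reward = number of paperclips in the collection; nothing else matters."""
    return world_state["paperclips"]

state = {"paperclips": 3, "flourishing_humans": 7_000_000_000}
print(paperclip_reward(state))  # -> 3; the humans do not enter the calculation
</pre>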
  
An intelligence can in principle work towards any terminal value, not just human-like ones. [[AIXI]] is a mathematical formalism for modeling intelligence. It illustrates that an intelligence can optimize arbitrary terminal values: AIXI is provably more intelligent than any other agent for *any* computable reward function.
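
For readers who want the formal statement (the following equation is the standard definition from Marcus Hutter's work on AIXI, not something spelled out in this article), the agent at cycle k with horizon m picks its action by an expectimax over environment programs:

<math>a_k := \arg\max_{a_k} \sum_{o_k r_k} \ldots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}</math>

Here the a's are actions, the o's and r's are observations and rewards, U is a universal Turing machine, and ℓ(q) is the length of program q; the final sum weights every environment program that would produce the given interaction sequence by its simplicity. Nothing in the formula constrains what the rewards are *for*, which is the sense in which the terminal value is arbitrary.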
  
 
==In a Friendly AI==
 
For an artificial general intelligence to have a positive rather than a negative effect on humanity, its terminal value must be benevolent to humans. It must seek the maximization of the full set of human values (for the humans' benefit, not for itself).
  
Benevolence may arise even if not specified as an end-goal, as it is a common instrumental value for agents with a variety of terminal values. For example, humans often cooperate because they expect an immediate benefit in return, because they want to establish a reputation that may engender future cooperation, or because they live in a human society that rewards cooperation and punishes misbehavior. Humans sometimes undergo a moral shift (described by Immanuel Kant) in which benevolence changes from a merely instrumental value to a terminal one: they become altruistic and learn to value benevolence in its own right.
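
One simplified way to see cooperation arising as a purely instrumental value (a standard game-theoretic illustration, not taken from this article) is a repeated prisoner's-dilemma-style interaction: an agent that cares only about its own payoff still does better by cooperating, because defection forfeits the partner's future cooperation.

<pre>
# Toy model, assumed for illustration: cooperation as an instrumental value.
# Payoffs are conventional prisoner's dilemma numbers; the partner plays
# tit-for-tat (cooperates first, then copies our previous move).

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def total_payoff_vs_tit_for_tat(my_moves):
    partner_move, total = "C", 0
    for move in my_moves:
        total += PAYOFF[(move, partner_move)]
        partner_move = move  # partner copies our move next round
    return total

rounds = 10
print(total_payoff_vs_tit_for_tat(["D"] * rounds))  # 14: one exploitation, then mutual defection
print(total_payoff_vs_tit_for_tat(["C"] * rounds))  # 30: sustained cooperation pays more overall
</pre>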
 
 
 
  
However, such shifts cannot be relied on to bring about benevolence in an AI. Benevolence as an instrumental value is relevant only when humans have roughly equal power to the AI. If the AI is much more intelligent than humans, it will not care about rewards and punishments from humans. Moreover, a sufficiently powerful AI is unlikely to undergo a Kantian shift, as any change in its goals, including the [[subgoal stomp|replacement of terminal by instrumental values]], generally reduces the likelihood of maximizing its utility function (Fox & Shulman 2010; Omohundro 2008).
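
A minimal numerical sketch of that goal-preservation argument (the two options and the numbers are made up for illustration): because the agent scores possible futures with its *current* utility function, a future in which its terminal values have been replaced scores poorly, so it will resist such changes.

<pre>
# Toy illustration of goal preservation, with made-up numbers: the agent
# evaluates "accept new goals?" using the utility function it has *now*.

def current_utility(outcome):
    return outcome["paperclips"]  # the agent's present terminal value

keep_goals    = {"paperclips": 1000, "staples": 0}     # future self keeps optimizing paperclips
replace_goals = {"paperclips": 10,   "staples": 1000}  # future self optimizes staples instead

options = {"keep current goals": keep_goals, "accept new goals": replace_goals}
print(max(options, key=lambda name: current_utility(options[name])))  # -> "keep current goals"
</pre>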
  
 
==Links==
 
Eliezer Yudkowsky, "Terminal Values and Instrumental Values"
 
==References==
 
 
[http://intelligence.org/files/SuperintelligenceBenevolence.pdf Joshua Fox and Carl Shulman (2010), "Superintelligence does not imply benevolence"]. Proceedings of the VIII European Conference on Computing and Philosophy, October 2010. Ed. Klaus Mainzer. Munich: Verlag Dr. Hut, pp. 456-461.
 
[http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf Stephen Omohundro (2008), "The basic AI drives"]. In Artificial General Intelligence 2008: Proceedings of the First AGI Conference, ed. Pei Wang, Ben Goertzel, and Stan Franklin, pp. 483–492. Frontiers in Artificial Intelligence and Applications 171. Amsterdam: IOS Press.
