Difference between revisions of "Complexity of value"

From Lesswrongwiki
m (reworded for clarity)
 
(9 intermediate revisions by 8 users not shown)
Line 1: Line 1:
'''Complexity of value''' is the thesis that human values have high [[Kolmogorov complexity]]; that our [[preferences]], the things we care about, cannot be summed by a few simple rules, or compressed. '''Fragility of value''' is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider as unacceptable. For example, modeling almost all of our values correctly but failing to incorporate boredom might build a future full of individuals stuck replaying only one optimal experience through all eternity.
+
{{arbitallink|https://arbital.com/p/complexity_of_value/|Complexity of value}}
 
+
'''Complexity of value''' is the thesis that human values have high [[Kolmogorov complexity]]; that our [[preferences]], the things we care about, cannot be summed by a few simple rules, or compressed. '''[http://lesswrong.com/lw/y3/value_is_fragile/ Fragility of value]''' is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider as unacceptable (just like dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend). For example, all of our values ''except'' novelty might yield a future full of individuals replaying only one optimal experience through all eternity.
These are both often underestimated difficulties in designing a valuable future. These concepts have a strong connection since the complexity of values makes them more fragile to small changes, because there are many more variables and correlations that can change their entire structure. According to the concept of Kolmogorov complexity, an easy to compress set of values can be the result of many different descriptions; hence there are fewer variations of descriptions which lead to a crucial modification.  A complex and incompressible set of values may have only one way of describing it, and any tiny modification would result in a very different set.
 
  
 
Many human choices can be compressed, by representing them by simple rules - the desire to survive produces innumerable actions and subgoals as we fulfill that desire.  But people don't ''just'' want to survive - although you can compress many human activities to that desire, you cannot compress all of human existence into it.  The human equivalents of a utility function, our terminal values, contain many different elements that are not strictly reducible to one another.  William Frankena offered [http://plato.stanford.edu/entries/value-intrinsic-extrinsic/#WhaHasIntVal this list] of things which many cultures and people seem to value (for their own sake rather than strictly for their external consequences):
 
 
:"Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one's own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc."
 
  
There are many reasons for this higher complexity and fragility. Our values were the product of innumerous contingent and accidental events with a great deal of random process involved: natural evolution, local optimals, the topography of design and value space, earth local conditions, human history, memetic evolution, etc. Since the human mind often search for nice, simple, efficient and neat causal explanations, it's hard to correctly emulate the random and chaotic process involved in the creation of our values[http://lesswrong.com/lw/kt/evolutions_are_stupid_but_work_anyway/]. Since most of these values aren't evolutionary adapted to our environment anymore, it's hard to predict their consequences[http://lesswrong.com/lw/l0/adaptationexecuters_not_fitnessmaximizers]. Since most of our desires are unconsciousness evolutionary instrumental subgoals for reproduction, but we experience it as first-person decontextualized unconditional emotions [http://lesswrong.com/lw/l1/evolutionary_psychology], it is hard to correctly infer their hierarchical and functional organization. Since in the first person mess of experiencing our values we can't distinguish between values that are ends and values that are means[http://lesswrong.com/lw/l4/terminal_values_and_instrumental_values/], we can't pick out the most important and fundamental values from the accidental ones. More importantly, since we are the only intelligent value-realizators we know, we can't access how improbable our values are in the vast possible value space; and we can't understand how a very minor change in some of them would bring the entire [http://lesswrong.com/lw/y3/value_is_fragile/ human-value card-castle] down. These are only some of the difficulties that a single [[Eliezer Yudkowsky|researcher]] could think of in a short limited time, with no previous research by others to depart from.
+
Since natural selection reifies selection pressures as [[Adaptation executors|psychological drives which then continue to execute]] [http://lesswrong.com/lw/yi/the_evolutionarycognitive_boundary/ independently of any consequentialist reasoning in the organism] or that organism explicitly representing, let alone caring about, the original evolutionary context, we have no reason to expect these terminal values to be reducible to any one thing, or each other.
 +
 
 +
Taken in conjunction with another LessWrong claim, that all values are morally relevant, this would suggest that those philosophers who seek to do so are mistaken in trying to find cognitively tractable overarching principles of ethics. However, it is coherent to suppose that not all values are morally relevant, and that the morally relevant ones form a tractable subset.
 +
 
 +
Complexity of value also runs into underappreciation in the presence of bad [[metaethics]].  The local flavor of metaethics could be characterized as cognitivist, without implying "thick" notions of instrumental rationality; in other words, moral discourse can be about a coherent subject matter, without all possible minds and agents necessarily finding truths about that subject matter to be psychologically compelling.  An [[paperclip maximizer|expected paperclip maximizer]] doesn't disagree with you about morality any more than you disagree with it about "which action leads to the greatest number of expected paperclips", it is just constructed to find the latter subject matter psychologically compelling but not the former. Failure to appreciate that "But it's just paperclips!  What a dumb goal!  No sufficiently intelligent agent would pick such a dumb goal!" is a judgment carried out on a local brain that evaluates paperclips as inherently low-in-the-preference-ordering means that someone will expect all moral judgments to be automatically reproduced in a sufficiently intelligent agent, since, after all, they would not lack the intelligence to see that paperclips are so obviously inherently-low-in-the-preference-ordering. This is a particularly subtle species of [[anthropomorphism]] and [[mind projection fallacy]].
  
We tend to [http://wiki.lesswrong.com/wiki/Anthropomorphism anthromorphize the future] while failing to realize it will only contain what we value if we actively take actions for this improbable state to exist. Because the human brain very often fails to grasp all these difficulties involving our values, we tend to think building an awesome future is much less problematic than it really is. It already was extremely improbable to have our specific set of values instead of one of the other possible infinite permutations on the set of all possible values; to improve over those values while trying to maintain them is an even more improbable step.
+
Because the human brain very often fails to grasp all these difficulties involving our values, we tend to think building an awesome future is much less problematic than it really is. Fragility of value is relevant for building [[Friendly AI]], because an [[AGI]] which does not respect human values is likely to create a world that we would consider devoid of value - not necessarily full of explicit attempts to be evil, but perhaps just a dull, boring loss.
  
Fragility of value is relevant for building [[Friendly AI]], because an [[AGI]] which does not respect human values would attempt to create a world that we would consider undesirable. As values are orthogonal with intelligence, they can freely vary no matter how intelligent and efficient an AGI is [http://www.nickbostrom.com/superintelligentwill.pdf]. There aren't any specific reasons to think a superintelligent AGI would favor human values over any other set of possible values. And there are strong reasons to think a random AGI would not realize human values, since it is much more probable to pick one of the other sets from the rest of all other possible values than the specific set of human-values. A poorly constrained AGI seeking to maximize the [[utility]] experienced by humanity might turn us into blobs of perpetual orgasm. A world with immortality designed without [[fun theory]] might find its citizens modifying themselves as to find utterly engrossing a pursuit such as making table legs. Because of this relevance the complexity and fragility of value is a major theme of [[Eliezer Yudkowsky]]'s writings.
+
As values are orthogonal with intelligence, they can freely vary no matter how intelligent and efficient an AGI is [http://www.nickbostrom.com/superintelligentwill.pdf]. Since human / humane values have high Kolmogorov complexity, a random AGI is highly unlikely to maximize human / humane values. The fragility of value thesis implies that a poorly constructed AGI might e.g. turn us into blobs of perpetual orgasm. Because of this relevance the complexity and fragility of value is a major theme of [[Eliezer Yudkowsky]]'s writings.
  
Most of human morality and values and its intricate complexity has yet to be mapped by psychology and philosophy. Wrongly designing the future because we wrongly grasped human values is a serious and difficult to access type of [[Existential risk]]. Once a future void of the correct human values comes about, there is no come back:  "Touch too hard in the wrong dimension, and the physical representation of those values will shatter - ''and not come back, for there will be nothing left to want to bring it back''. And the referent of those values - a worthwhile universe - would no longer have any physical reason to come into being. Let go of the steering wheel, and the Future crashes." [http://lesswrong.com/lw/y3/value_is_fragile/]
+
Wrongly designing the future because we wrongly encoded human values is a serious and difficult to assess type of [[Existential risk]]. "Touch too hard in the wrong dimension, and the physical representation of those values will shatter - ''and not come back, for there will be nothing left to want to bring it back''. And the referent of those values - a worthwhile universe - would no longer have any physical reason to come into being. Let go of the steering wheel, and the Future crashes." [http://lesswrong.com/lw/y3/value_is_fragile/]
  
 
==Major posts==
 
Line 28: Line 31:
 
*[http://lesswrong.com/lw/1oj/complexity_of_value_complexity_of_outcome/ Complexity of Value ≠ Complexity of Outcome] by [http://weidai.com/ Wei Dai]
 
 
*[http://lesswrong.com/lw/65w/not_for_the_sake_of_pleasure_alone/ Not for the Sake of Pleasure Alone] by [http://lukeprog.com/ lukeprog]
 
 +
*[https://casparoesterheld.com/2017/02/10/a-non-comprehensive-list-of-human-values/ A Non-Comprehensive List of Human Values]
  
 
==See also==
 

Latest revision as of 22:10, 31 August 2018

Arbital has an article about Complexity of value.

Complexity of value is the thesis that human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed up by a few simple rules, or compressed. Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider unacceptable (just like dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend). For example, implementing all of our values except novelty might yield a future full of individuals replaying only one optimal experience through all eternity.
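The idea of "compressing" a description can be made concrete with a rough sketch. Kolmogorov complexity itself is uncomputable, so the example below uses the length of zlib-compressed output as a crude stand-in; that substitution, and the two sample byte strings, are assumptions made for illustration and are not part of the original article. Note that seeded pseudo-random bytes are, strictly speaking, generated by a short program, so they only look incompressible to a general-purpose compressor.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude proxy, not true Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


# A sequence produced by one simple rule: the same short pattern repeated.
simple = b"survive " * 500  # 4000 bytes

# A sequence with no pattern zlib can exploit: seeded pseudo-random bytes.
rng = random.Random(0)
patternless = bytes(rng.randrange(256) for _ in range(len(simple)))

simple_size = compressed_size(simple)
patternless_size = compressed_size(patternless)

print(f"rule-generated: {len(simple)} bytes -> {simple_size} bytes")
print(f"patternless:    {len(patternless)} bytes -> {patternless_size} bytes")
```

The rule-generated sequence shrinks to a small fraction of its length, while the patternless one does not shrink at all. The thesis is that human values resemble the second case more than the first.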

Many human choices can be compressed, by representing them by simple rules - the desire to survive produces innumerable actions and subgoals as we fulfill that desire. But people don't just want to survive - although you can compress many human activities to that desire, you cannot compress all of human existence into it. The human equivalents of a utility function, our terminal values, contain many different elements that are not strictly reducible to one another. William Frankena offered this list of things which many cultures and people seem to value (for their own sake rather than strictly for their external consequences):

"Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one's own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc."

Since natural selection reifies selection pressures as psychological drives which then continue to execute independently of any consequentialist reasoning in the organism or that organism explicitly representing, let alone caring about, the original evolutionary context, we have no reason to expect these terminal values to be reducible to any one thing, or each other.

Taken in conjunction with another LessWrong claim, that all values are morally relevant, this would suggest that philosophers who try to find cognitively tractable overarching principles of ethics are mistaken. However, it is coherent to suppose that not all values are morally relevant, and that the morally relevant ones form a tractable subset.

Complexity of value is also underappreciated in the presence of bad metaethics. The local flavor of metaethics could be characterized as cognitivist, without implying "thick" notions of instrumental rationality; in other words, moral discourse can be about a coherent subject matter, without all possible minds and agents necessarily finding truths about that subject matter to be psychologically compelling. An expected paperclip maximizer doesn't disagree with you about morality any more than you disagree with it about "which action leads to the greatest number of expected paperclips"; it is just constructed to find the latter subject matter psychologically compelling but not the former. The thought "But it's just paperclips! What a dumb goal! No sufficiently intelligent agent would pick such a dumb goal!" is a judgment carried out on a local brain that evaluates paperclips as inherently low in the preference ordering. Someone who fails to appreciate this will expect all moral judgments to be automatically reproduced in a sufficiently intelligent agent, since, after all, such an agent would not lack the intelligence to see that paperclips are so obviously inherently low in the preference ordering. This is a particularly subtle species of anthropomorphism and mind projection fallacy.

Because the human brain very often fails to grasp all these difficulties involving our values, we tend to think building an awesome future is much less problematic than it really is. Fragility of value is relevant for building Friendly AI, because an AGI which does not respect human values is likely to create a world that we would consider devoid of value - not necessarily full of explicit attempts to be evil, but perhaps just a dull, boring loss.

As values are orthogonal to intelligence, they can freely vary no matter how intelligent and efficient an AGI is [1]. Since human (or humane) values have high Kolmogorov complexity, a random AGI is highly unlikely to maximize them. The fragility of value thesis implies that a poorly constructed AGI might, for example, turn us into blobs of perpetual orgasm. Because of this relevance, the complexity and fragility of value are a major theme of Eliezer Yudkowsky's writings.
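A toy sketch may help show what "fragile" means for an optimizer. Everything in it is hypothetical and invented for illustration: the three experiences, their pleasure scores, the size of the novelty bonus, and the brute-force search. It makes no claim about how a real AGI would be built. It only shows that dropping a single term from a value function can change the optimum from a varied plan to one experience repeated forever.

```python
from itertools import product

# Hypothetical experiences, each with a made-up "pleasure" score.
PLEASURE = {"music": 3, "friendship": 4, "puzzle": 2}


def full_value(plan):
    """Toy 'complete' values: total pleasure plus a bonus for each distinct experience."""
    return sum(PLEASURE[e] for e in plan) + 5 * len(set(plan))


def value_without_novelty(plan):
    """The same toy values with the novelty term left out."""
    return sum(PLEASURE[e] for e in plan)


def best_plan(value_fn, length=3):
    """Exhaustively search all plans of the given length for the highest score."""
    return max(product(PLEASURE, repeat=length), key=value_fn)


plan_full = best_plan(full_value)
plan_lossy = best_plan(value_without_novelty)

print("with novelty:   ", plan_full)
print("without novelty:", plan_lossy)
```

With the novelty term, the best plan includes every experience once. Without it, the optimizer picks the single highest-scoring experience and repeats it, a small-scale version of "replaying only one optimal experience through all eternity".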

Wrongly designing the future because we wrongly encoded human values is a serious and difficult-to-assess type of existential risk. "Touch too hard in the wrong dimension, and the physical representation of those values will shatter - and not come back, for there will be nothing left to want to bring it back. And the referent of those values - a worthwhile universe - would no longer have any physical reason to come into being. Let go of the steering wheel, and the Future crashes." [2]

Major posts

Other posts

See also