Fragility of value

'''Fragility of value''' is the property of human values of having an improbable and sensitive organization which is very difficult to assess. It is an often underestimated difficulty in designing a valuable future: even a very minor change in one aspect of our values might lead to an entirely different result. For example, getting all of our values right but leaving out boredom might produce a future replete with individuals repeating a single optimal pleasurable experience through all eternity.
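
To make the point concrete, here is a minimal toy sketch, not taken from the article: the pleasure ratings and the entropy-based "boredom" term below are illustrative assumptions. A simple utility maximizer allocates time across experiences; with a boredom term in its utility function the optimum is a varied mix of experiences, and deleting that single term collapses the optimum into repeating the one best experience forever.

```python
# A toy model (not from the article): allocate shares of time across a
# handful of experiences to maximize
#   utility = pleasure . allocation + boredom_weight * entropy(allocation)
# The entropy term stands in for valuing novelty / disvaluing repetition.
import numpy as np

pleasure = np.array([10.0, 9.5, 9.0, 6.0, 4.0])  # hypothetical ratings

def optimal_allocation(boredom_weight):
    """Return the time allocation maximizing the utility above.

    For boredom_weight > 0 the maximizer is softmax(pleasure / weight),
    a varied mix of experiences; with the boredom term deleted, the
    optimum degenerates to all time spent on the single best experience.
    """
    if boredom_weight > 0:
        z = np.exp(pleasure / boredom_weight)
        return np.round(z / z.sum(), 2)
    one_hot = np.zeros_like(pleasure)
    one_hot[np.argmax(pleasure)] = 1.0
    return one_hot

print(optimal_allocation(3.0))  # -> [0.34 0.29 0.24 0.09 0.05], a varied life
print(optimal_allocation(0.0))  # -> [1. 0. 0. 0. 0.], one experience forever
```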
  
 
There are many reasons for this fragility of values. They are the product of innumerable contingent and accidental events with a great deal of randomness involved: natural evolution, local optima, the topography of design and value space, Earth's local conditions, human history, memetic evolution, and so on. Since most of the processes that created them [http://lesswrong.com/lw/kt/evolutions_are_stupid_but_work_anyway/ weren't very good optimization processes], it is hard for the human mind, itself an optimizing search machine, to correctly emulate them. Since most of these values [http://lesswrong.com/lw/l0/adaptationexecuters_not_fitnessmaximizers/ no longer serve to maximize fitness], it is hard to predict their consequences. Since we experience some evolutionarily shaped desires only as first-person, decontextualized, unconditional emotions instead of seeing them [http://lesswrong.com/lw/l1/evolutionary_psychology/ as instrumental subgoals of reproduction], it is hard to correctly pinpoint our unconscious motivations. Since in the first-person mess of experiencing our values we can't distinguish between [http://lesswrong.com/lw/l4/terminal_values_and_instrumental_values/ values that are ends and values that are means], we can't pick out the most important and fundamental values from the accidental ones. More importantly, since we are the only intelligent value-realizers we know, we can't assess how improbable our values are in the vast space of possible values, and we can't understand how a very minor change in some of them would bring the entire [http://lesswrong.com/lw/y3/value_is_fragile/ human-value house of cards] down. These are only some of the difficulties that a single [[Eliezer Yudkowsky|researcher]] could think of in a short, limited time, with no previous research by others to build on.
 
We tend to anthropomorphize the future while failing to realize that it will only contain what we value if we actively take action to bring this improbable state about. Because the human brain very often fails to grasp all these difficulties, we tend to think building an awesome future is much less problematic than it really is. It was already extremely improbable that we ended up with our specific set of values rather than one of the countless other possible ones in the space of all value systems; to improve on those values while trying to maintain them is an even more improbable step.
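
As a rough, purely illustrative back-of-the-envelope sketch (the numbers of value dimensions and settings below are assumptions, not figures from the article), the probability of hitting a specific value system by drawing one at random falls off exponentially with the number of independent dimensions that have to match:

```python
# Toy calculation (illustrative assumptions only): if a value system is
# described by n independent dimensions, each with k possible settings,
# a randomly drawn system matches a specific target system on every
# dimension with probability k**-n.
k_settings = 10
for n_dimensions in (5, 20, 100):
    p_match = k_settings ** -n_dimensions
    print(f"{n_dimensions:>3} dimensions: P(random match) = {p_match:.0e}")
```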

Since values are orthogonal to intelligence, they can vary freely no matter how intelligent an AGI is [1]. There is no specific reason to think a superintelligent AGI would favor human values over any other set of possible values, and there are strong reasons to think a randomly chosen AGI would not realize human values, since it is far more probable to land on one of the many other possible value sets than on the specific human one. A poorly constrained AGI seeking to maximize the utility experienced by humanity might turn us into blobs of perpetual orgasm. A world with immortality designed without fun theory might find its citizens modifying themselves so as to find a pursuit such as making table legs utterly engrossing.

Most of human morality and values, with all their intricate complexity, have yet to be mapped by psychology and philosophy. Wrongly designing the future because we wrongly grasped human values is a serious and difficult-to-assess type of existential risk. Once a future devoid of the correct human values comes about, there is no coming back: “Touch too hard in the wrong dimension, and the physical representation of those values will shatter - and not come back, for there will be nothing left to want to bring it back. And the referent of those values - a worthwhile universe - would no longer have any physical reason to come into being. Let go of the steering wheel, and the Future crashes.” [2]
