Before AGI arrives, we need to figure out how to make AI understand, adopt, and retain our goals. -- Max Tegmark (2019) in Brockman (ed.) [1]
Value alignment is not a problem new to, or unique to, the pursuits of artificial intelligence researchers. It has long been recognized as a general and ancient problem: for governance, for education, for all forms of social control. It often goes by the names of values education, moral education, or citizenship education, or, less charitably, indoctrination, brainwashing, or group-think.
Value Alignment? What are values? A profusion of ideas. Writings on AI behavior tend to use the words behavior, goals, wants, and values loosely and interchangeably. In this essay I will use examples, described in English, of commonly and easily observable human actions, focusing on values.
Part of the problem in dealing with values is that the term "value" is used in many different contexts, not necessarily compatible among themselves, and often based on a consensus more hoped for than actual. For example,
1. Promotional (as in marketing): presuming a high degree of consensus on certain values, e.g., "Our new cars have arrived! Don't miss out on the many values we offer."
2. Explanatory (promise of effectiveness): assuming consensus on certain cause-effect relationships, e.g. "Their training will inculcate values of obedience and alacrity."
3. Focus of contention (judgement of desirability): "The contract is so vague as to be of little value in confronting problems."
4. Discursive modification (turn of phrase): creating an abstraction believed to provide a needed de-personalization for general (scientific) investigation, e.g. "A term indicating an item which someone values, valued, or will (might) value, we will call a 'value.'"
5. Theoretical: demarcating limits for analysis, e.g. “A value is a hidden variable, of which behavioral instances are indicators.” (Below, we will use a variation of examples 4 and 5.)
6. The manifestation of value pursuit or maintenance may be impeded by the valuer's lack of rationality, knowledge, or ability, by circumstantial impediment, or by a competing priority.[2]
Whose value is it? What kind of value? Some forty years ago, I was taking a graduate course in mathematics in an evening class during a winter semester. There were eight students in the course: seven who had degrees in mathematics, and I, a philosophy graduate. They were clearly enjoying themselves, but for the late hours. I, interested but not enraptured, was there to accumulate credits for state certification as a mathematics teacher. After copying three chalkboards of a proof, the instructor stopped and, looking at the boards, muttered, "No, that step won't work. Something's missing." After another short pause, he fairly leapt at the board, wrote a formula (I forget which), and continued on with another board of proof.
I was lost. I had never before seen that formula, which was clearly an identity. It equated the number "1" with some logarithm of a function containing an imaginary argument. Up until that point, I thought I had been following the proof very closely. I looked around at my classmates to see whom I might ask later for an explanation. They seemed as perplexed as I was.
One of them actually spoke up and said, "Is that a step in the proof?" The professor asked, "How many of you have taken Math 636?" (or something like that). No one raised a hand. The prof continued, "Well, when you do, you will understand it completely! There are four methods to prove something: deduction, induction, seduction, and intimidation. So, for now, you will have to take it on faith! By the way, which method of proof am I using here?" And he resumed filling the boards.
The discussion among the students at the end of class, after the professor had gone, was somewhat heated. The math majors, for whom, I thought, the proof had intrinsic value, were close to revolt. One said, "I didn't major in math to take things on faith!" I, for whom the proof was mostly of extrinsic value, found it only mildly disappointing.
But, what is a Value? What is Value Alignment? To put it most succinctly,
7. A value, V, is a constellation (a set of prioritizations) of defense and pursuit contingencies relating to V. (The apparent circularity here is intentional. See Concept as Abstraction. A hindrance in developing intelligence?)
It is not "free-floating", i.e. "V is a value" implies "Someone values (valued or might value) V."
8. A value is not a feeling nor an episode of behavior, but a disposition to behave in certain ways under certain conditions, unless stifled by certain standard conditions of impediment. For example, valuing a cup of coffee does not mean one is now imbibing nor ever again will imbibe a cup of coffee. (One might, despite valuing coffee for the taste and for the caffeine, need to give it up for the sake of one's blood pressure.) Similarly, sleeping people have not lost their values; just their ability to manifest those values while unconscious. (See Trait vs. Behavior: the sometimes non-science of learning.)
9. A given behavior-type may, under different conditions, manifest itself in the pursuit, maintenance, or defense of different values.
10. Values Alignment is achieved when the different parties reach a consensus on values and on their conditions of manifestation, as sketched below.[3]
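For the reader who thinks in code, points 6 through 10 can be made concrete in a few lines of Python. This is my own minimal sketch, not part of the essay's apparatus: every name in it (Contingency, Value, manifests, aligned) is hypothetical, and "alignment" is rendered crudely as identity of contingency sets.

```python
from dataclasses import dataclass

# Impediments named in point 6: a value may fail to manifest
# without having been abandoned.
IMPEDIMENTS = {"irrationality", "ignorance", "inability",
               "circumstance", "competing_priority"}

@dataclass(frozen=True)
class Contingency:
    """A pursuit-or-defense disposition: under `condition`, tend toward `behavior`."""
    condition: str   # e.g. "coffee is offered"
    behavior: str    # e.g. "accept and drink it"
    priority: int    # higher overrides competing contingencies (point 7)

@dataclass(frozen=True)
class Value:
    """Point 7: a constellation of prioritized contingencies relating to V.
    Per the remark after point 7, it is not free-floating: it has a valuer."""
    name: str
    valuer: str
    contingencies: frozenset[Contingency]

def manifests(value: Value, condition: str, impediments: set[str]) -> str | None:
    """Point 8: a value is a disposition, not an episode. It yields behavior
    only when its condition obtains and no standard impediment stifles it."""
    if impediments & IMPEDIMENTS:
        return None  # value retained, manifestation blocked (the sleeper)
    live = [c for c in value.contingencies if c.condition == condition]
    return max(live, key=lambda c: c.priority).behavior if live else None

def aligned(a: Value, b: Value) -> bool:
    """Point 10, rendered crudely: consensus on values and their conditions
    of manifestation, here as identity of contingency sets."""
    return a.contingencies == b.contingencies
```

On this rendering, the sleeper of point 8 keeps her values: manifests returns None under the impediment, but the Value itself is unchanged.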
The Intrinsic-Extrinsic Dimensions of Value seem to mark an important boundary between humans and robotic AGI. (Perhaps not with cyborgs built from some animal nervous tissue.) Extrinsic values indicate dispositions which can be chained into a sequence, the final link of which is designated as the "intrinsic" value. This is called by some theorists a "sacred" or "fundamental" value.[4]
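The chaining itself is easy to picture. Here is another small sketch, again my own and purely illustrative (the for_the_sake_of table and intrinsic_terminus are invented names): each extrinsic value is held for the sake of another, and the chain's final link is the intrinsic value.

```python
# Each extrinsic value is held for the sake of another; a value that
# appears as no key is the chain's terminus, the "intrinsic" value.
for_the_sake_of = {
    "earn money": "buy food",    # extrinsic
    "buy food":   "eat",         # extrinsic
    "eat":        "stay alive",  # extrinsic; "stay alive" has no entry
}

def intrinsic_terminus(value: str) -> str:
    """Follow the chain of extrinsic values to its final, intrinsic link."""
    seen = set()
    while value in for_the_sake_of:
        if value in seen:  # a circular chain would have no intrinsic link
            raise ValueError(f"no terminus: cycle at {value!r}")
        seen.add(value)
        value = for_the_sake_of[value]
    return value

print(intrinsic_terminus("earn money"))  # -> stay alive
```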
Humans have intrinsic values to pursue because they have basic needs. They have basic needs because they are organisms that have to find and use sources of air, water, food, shelter, etc. These basic needs have a more or less "natural" priority. Maslow offers an interesting model that he calls a "Hierarchy of Needs."
Acquiring items at the bottom level of the pyramid requires little more than one's animal abilities. But the second level up, with companionship and affiliation, already requires some value alignment, normally inculcated through random encounter or through applications and withdrawals of pain and pleasure. These encounters are often reinforced by verbal communication and by evolutionarily built-in dispositions of acquisition and affiliation, with their naturally accompanying emotions.
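Maslow's "natural priority" can be caricatured in a few lines. The sketch below is my own toy rendering, not Maslow's formalism; the level names are the usual textbook labels, and next_need is an invented helper.

```python
# Lower levels take "natural" priority: pursue the lowest unmet need first.
MASLOW_LEVELS = [
    "physiological",       # air, water, food
    "safety",              # shelter, security
    "belonging",           # companionship, affiliation
    "esteem",
    "self-actualization",
]

def next_need(met: set[str]) -> str | None:
    """Return the lowest level not yet satisfied, or None if all are met."""
    for level in MASLOW_LEVELS:
        if level not in met:
            return level
    return None

print(next_need({"physiological"}))  # -> safety
```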
But what can substitute to establish needs, and consequently values, for an AGI lacking the capability of pain or pleasure? Needs alone won't do it. There is an ambiguity in the use of the word "need" which often produces a confusion between need as merely technical cause and need as approved or promoted cause. For example, a robber may need a gun to terrorize his otherwise blasé victims; but we disapprovingly deny the robber's need for a gun because, more fundamentally, we disapprove of robbery.
Another example: Sam says to his friend Harry, "You really ought to buy a new jacket." Harry replies, "What do I need a new jacket for? This one is just fine for when it's cold." Sam insists, "But it looks kind of worn. Shabby is not stylish anymore." Harry argues that he needs the jacket for its function, an extrinsic value. Stylish clothes do not seem to be much, if at all, valued (intrinsically) by him.[5]
Of course, Maslow's model is one among many. It is likely that, depending on which groups you investigate, you would come up with substantially different priorities. For example, Geert Hofstede identifies nationality groups along such dimensions as Power Distance, Individualism, Assertiveness, Avoidance of Uncertainty, and Long- vs. Short-Term Orientation. It is very likely that the priorities built into Maslow's model would be substantially reorganized depending on how a population is selected along these five of Hofstede's dimensions, as the sketch below suggests.[5b]
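To illustrate, and only to illustrate (the rule and the scores below are entirely my invention, not Hofstede's), a population's dimension scores could reorder the same need levels:

```python
maslow = ["physiological", "safety", "belonging", "esteem", "self-actualization"]
scores = {"individualism": 20, "uncertainty_avoidance": 85}  # invented scores

def reorder(levels: list[str], scores: dict[str, int]) -> list[str]:
    """Toy rule: a strongly collectivist population (low Individualism)
    ranks belonging ahead of individual safety."""
    if scores.get("individualism", 50) >= 40:
        return list(levels)  # keep Maslow's textbook ordering
    out = [lvl for lvl in levels if lvl != "belonging"]
    out.insert(out.index("safety"), "belonging")
    return out

print(reorder(maslow, scores))
# -> ['physiological', 'belonging', 'safety', 'esteem', 'self-actualization']
```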
Whose Values Should AGI Be Aligned With? Who will wait for answers?
Even if we give robots the ability to learn what we want, an important question remains that AI alone won’t be able to answer. We can try to align with a person’s internal values, but there’s more than one person involved here. … How to combine people’s values when they might be in conflict is an important problem we need to solve. — Anca Dragan (2019) [6]
I have neither heard nor read that there is a general consensus that the Trolley Problem [7] has been solved. Nor have I seen, heard, or read that auto-makers have stopped rushing to put "self-driving" vehicles on the market, despite the collateral damage involving them.
Despite the profusion of glad tidings from burgeoning crowds of marketeers, there seems to be little consensus among professional researchers and users of AI as to the safety of its present state of development.[8]
Consequently, I am even less sanguine that anyone with the capability to do so is holding back from constructing AI for military purposes, even if it may still be quite primitive or indiscriminate in its effects. I suspect that the fear of being caught at a military disadvantage offsets any worries about lack of value alignment with their AI adjuncts or about the "collateral casualties" among their targets.
See, also, Rozycki, EG (2010) The Indeterminacy of Consensus: masking ambiguity and vagueness in decision.
For more recent comment, see Mitchell, M (Quanta Magazine, Dec 20, 2022).
Cordially--- EGR
FOOTNOTES
[1] Tegmark, M, "Let's Aspire to More than Making Ourselves Obsolete," p. 85 in John Brockman (ed.), Possible Minds: 25 Ways of Looking at AI. New York: Penguin, 2019.
[2] See The Conditions for Active Valuing.
[3] See Short Articles on "Sacred" Values
[4] See “Sacred Values” in US Public Schools: pretending there is no conflict.
[5] See NEEDS ASSESSMENT: A Fraud?
[5b] Hofstede, G, Hofstede, GJ & Minkov, M (2010) Cultures and Organizations: Intercultural Cooperation and Its Importance for Survival. New York: McGraw-Hill.
[6] Dragan, A, "Putting the Human in the AI Equation," p. 136 in Brockman. See, also, Clabaugh, GK & Rozycki, EG (2007) The Nature of Consensus.
[7] See Self-Driving Cars, Run-Away Trams, & “Unavoidable” Accidents
[8] See John Brockman (2019), throughout. Also, there is a natural tendency for the breadth of consensus on criteria to diminish, proportionally, as research becomes more intense, precise and specialized; especially when choice of criteria bears on economic or political interests.