Value Alignment: "Acculturating" Artificial General Intelligence (AGI).

Before AGI arrives, we need to figure out how to make AI understand, adopt, and retain our goals. -- Max Tegmark (2019) in Brockman (ed.) [1]

Value alignment is not a problem new or unique to the pursuits of artificial intelligence researchers. Value alignment has long been seen as a general and ancient problem: for governance, for education, for all forms of social control. It is often referred to as values, moral, or citizenship education, or even as indoctrination, brainwashing, or group-think.

Value Alignment? What are values? A profusion of ideas. Writings on AI behavior often use the words behavior, goals, wants, and values loosely and interchangeably. In this essay I will use examples, described in English, of common, easily observable human actions, focusing on values.

Part of the problem in dealing with values is that the term "value" is used in many different contexts, not necessarily compatible among themselves, and often based on a consensus more hoped for than actual. For example,
1. Promotional (as in marketing): presuming a high degree of consensus on certain values, e.g., "Our new cars have arrived! Don't miss out on the many values we offer."

2. Explanatory (promise of effectiveness): assuming consensus on certain cause-effect relationships, e.g. "Their training will inculcate values of obedience and alacrity."

3. Focus of contention (judgement of desirability): "The contract is so vague as to be of little value in confronting problems."

4. Discursive modification (turn of phrase): creating an abstraction believed to provide a needed de-personalization for general (scientific) investigation, e.g. “A term indicating an item which someone values, valued or will (might) value we will call a ‘value.’”

5. Theoretical: demarcating limits for analysis, e.g. “A value is a hidden variable, of which behavioral instances are indicators.” (Below, we will use a variation of examples 4 and 5.)

6. The manifestation of value pursuit or maintenance may be impeded by the valuer's lack of rationality, knowledge, or ability, by circumstantial impediment, or by competing priority.[2]

Whose value is it? What kind of value? Forty-some years ago, I was taking a graduate course in mathematics in an evening class during a winter semester. There were eight students in the course: seven who had degrees in mathematics and I, a philosophy graduate. They were there clearly enjoying themselves but for the late hours. I, interested but not enraptured, was there to accumulate credits for state certification as a mathematics teacher. After writing out three chalkboards of a proof, the instructor stopped and, looking at the boards, muttered, “No, that step won’t work. Something’s missing.” After another short pause, he almost jumped at the board, wrote a formula (I forget which), and continued on with another board of proof.

I was lost. I had never before seen that formula, which was clearly an identity. It equated the number “1” with some log of a function containing an imaginary argument. Up until that point, I thought I had been following the proof very closely. I looked around at my classmates to see whom I might ask later for an explanation. They seemed as perplexed as I was.

One of them actually spoke up and said, “Is that a step in the proof?” The professor asked, “How many of you have taken Math 636?” (or something like that). No one raised a hand. The prof continued, “Well, when you do, you will understand it completely! There are four methods to prove something: deduction, induction, seduction and intimidation. So, for now, you will have to take it on faith! By the way, which method of proof am I using here?” And he resumed filling the boards.

The discussion among the students at the end of class, after the professor had gone, was somewhat heated. The math majors, for whom the proof was valued intrinsically, I thought, were close to revolt. One said, “I didn’t major in math to take things on faith!” I, for whom the proof was mostly of extrinsic value, found it only mildly disappointing.

But, what is a Value? What is Value Alignment? To put it most succinctly,
7. A value, V, is a constellation (a set of prioritizations) of defense and pursuit contingencies relating to V.
It is not "free-floating", i.e. "V is a value" implies "Someone values (valued or might value) V."

8. A value is not a feeling nor an episode of behavior, but a disposition to behave in certain ways under certain conditions, unless stifled by certain standard conditions of impediment. For example, valuing a cup of coffee does not mean one is now imbibing nor ever again will imbibe a cup of coffee. (One might, despite valuing coffee for the taste and for the caffeine, need to give it up for the sake of one's blood pressure.) Similarly, sleeping people have not lost their values; just their ability to manifest those values while unconscious. (See Trait vs. Behavior: the sometimes non-science of learning.)

9. A given behavior-type may under different conditions manifest itself in the pursuit, maintenance or defense of different values.

10. Value Alignment is achieved when different parties reach a consensus on values and their conditions of manifestation.[3]

The Intrinsic-Extrinsic Dimensions of Value seem to mark an important boundary between humans and robotic AGI. (Perhaps not with cyborgs built from some animal nervous tissue.) Extrinsic values indicate dispositions which can be chained into a sequence, the final link of which is designated as the "intrinsic" value. This is called by some theorists a "sacred" or "fundamental" value.[4]

Humans have intrinsic values to pursue because they have basic needs. They have basic needs because they are organisms that have to find and use sources of air, water, food, shelter, etc. These basic needs have a more or less "natural" priority. Maslow offers an interesting model that he calls a "Hierarchy of Needs."

Acquiring items at the bottom level of the pyramid requires little more than one's animal abilities. But the second level up, with companionship and affiliation, already requires some value alignment, normally inculcated through random encounter or with applications or withdrawals of pain and pleasure. These encounters are often strengthened by verbal communication and evolutionarily built-in dispositions of acquisition, affiliation and their naturally accompanying emotions.

But what can substitute to establish needs, and consequently values, for AGI, lacking the capability of pain or pleasure? Needs, alone, won't do it. There is an ambiguity in the use of the word "need" which often produces a confusion between need as merely technical cause, and need as approved or promoted cause. For example, a robber may need a gun to terrorize his otherwise blasé victims; but we disapprovingly deny the robber's need for a gun because, more fundamentally, we disapprove of robbery.

Another example is when Sam says to friend Harry, "You really ought to buy a new jacket." Harry replies, "What do I need a new jacket for? This one is just fine for when it's cold." Sam insists, "But it looks kind of worn. Shabby is not stylish anymore." Harry argues he needs the jacket for its function, an extrinsic value. Stylish clothes do not seem to be much, if at all, valued (intrinsically) by him. [5]

Of course, Maslow's model is one among many. It is likely that, depending on which groups you investigate, you would come up with substantially different priorities. For example, Geert Hofstede identifies nationality groups according to such dimensions as Power Distance, Individualism, Assertiveness, Avoidance of Uncertainty, and Long-Short Term Orientation. It is very likely that the priorities built into Maslow's model would be substantially reorganized depending on how a population is selected from among variants of these five of Hofstede's dimensions. [5b]

Whose Values Should AGI Be Aligned With? Who will wait for answers?
Even if we give robots the ability to learn what we want, an important question remains that AI alone won’t be able to answer. We can try to align with a person’s internal values, but there’s more than one person involved here. … How to combine people’s values when they might be in conflict is an important problem we need to solve. — Anca Dragan (2019) [6]

I have not heard nor read that the general consensus is that The Trolley Problem [7] has been solved. Nor have I seen, heard nor read that auto-makers have stopped rushing to put "self-driving" vehicles on the market, despite collateral damages involving them.

Despite the profusion of glad tidings of burgeoning crowds of marketeers, there seems to be little consensus among professional researchers and users of AI as to the safety of its present state of development.[8]

Consequently, I am even less sanguine that anyone with the capability to do so is holding back from constructing AI for military purposes, primitive or indiscriminate though it still is. I suspect that the fear of being caught at a military disadvantage offsets any worries about lack of value alignment with their AI adjuncts or the collateral casualties among their targets.

--- EGR


The Curse of Knowledge vs The Dunning-Kruger Effect: an instructor's dilemma

Damned if you do, damned if you don't. -- George of Trebizond (c. 1433)

“Look around you where you work, and pick out the people who have reached their level of incompetence. You will see that in every hierarchy the cream rises until it sours.” -- Laurence J. Peter, The Peter Principle, 1970

Introduction. In 1960, in the early years of the Cold War, my freshman year, I was placed in ("invited to") a course called "Honors Physics." Three days a week at 8 AM our professor, who, we were told, was a renowned researcher, would come in and fill six boards with notes. Except for the two class "geniuses" who, having built a cyclotron in high school, just sat there occasionally nodding their heads, we spent all our time furiously copying. The prof, an apparently shy person, entertained no questions and was not accessible outside of class.

Help understanding the prof's notes was to be provided in small-group seminars by graduate physics assistants. We, the freshmen -- geniuses excepted and usually absented -- often couldn't understand what the prof was getting at. More than occasionally, a graduate assistant would confess the same problem. The seminars soon evolved into deciphering sessions.

Of course, we worried about our grades, since nobody -- geniuses excepted -- ever made above a 50 on the quizzes. (The geniuses regularly got 100's.) On the final, the geniuses were excused by the prof from taking the exam. The rest of us were graded, we were told, by the prof. I got a thirty-six, which seemed to be about average, so far as an informal consensus determined. No matter, the class average was a 34. So I was awarded a B. After all, we surmised, how could a class of "specially invited honors students" be flunked en masse?

Caught Between The Curse of Knowledge (COK) and The Dunning-Kruger Effect (DKE). Many of us have had a teacher or professor who, we felt, consistently "talked over our heads." This was mostly, I would hope, not done intentionally. What usually happens is that the teacher overestimates how much their students know. This happens as often, if not more often, as you go up the educational ladder (where it is more embarrassing to admit to ignorance). The professor exhibits this psychological bias, the Curse of Knowledge, by not taking into consideration the possibility of class disparities in prior knowledge. (College entrance exams are presumed, I suspect, to have precluded such ignorance down, even, to course-level specifics.)

Students often exhibit a bias complementary to the COK, the DKE. This is a tendency to overestimate the knowledge one has acquired from minor acquaintance with a topic or area of concern. It seems that some people, even in high places, believe that, from TV, or conversation, or reading fiction, or newspaper reports, they know as well as, if not better than, anybody -- excepting "nerds" and "wonks" and "so-called" experts -- about, say, law, espionage, economics, politics, and diplomacy.

Countering the Biases. The instructor can counter both biases to some extent by giving pretests that don't affect the students' grades. The procedure is to
a. create (an often difficult procedure) and administer the pretest (See, e.g., Your Image of University Life: a course prequiz);
b. have students indicate what they think their grade will be (they can write it on the pretest);
c. grade the test and let the students see their graded papers;
d. tell them about the Dunning-Kruger Effect.
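
For instructors who keep their pretest records electronically, steps b and c above can be tallied with a few lines of code. What follows is only a minimal sketch, not a description of how I have actually kept my own records: the names PretestResult, calibration_gap and summarize are hypothetical, as are the sample scores. It assumes each student's predicted grade and actual grade are on the same 0-100 scale.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PretestResult:
    """One student's pretest record (hypothetical structure)."""
    student: str
    predicted: float  # grade the student wrote on the pretest before grading
    actual: float     # grade assigned by the instructor


def calibration_gap(result: PretestResult) -> float:
    """Positive gap: the student expected a higher grade than was earned."""
    return result.predicted - result.actual


def summarize(results: list[PretestResult]) -> dict:
    """Tally how far, and in which direction, predictions missed."""
    gaps = [calibration_gap(r) for r in results]
    return {
        "mean_gap": mean(gaps),
        "overestimators": sum(1 for g in gaps if g > 0),
        "underestimators": sum(1 for g in gaps if g < 0),
    }


# Invented sample data, for illustration only.
results = [
    PretestResult("A", predicted=85, actual=40),
    PretestResult("B", predicted=70, actual=36),
    PretestResult("C", predicted=30, actual=34),
]
summary = summarize(results)
print(summary)
```

A large positive mean gap is the pattern one would expect where the Dunning-Kruger Effect is at work; showing students their own gap alongside the class tally may make step d easier to accept.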

I have used this technique for years with students from high school up through doctoral education. For upper-level research writing, make the students' grades partially dependent upon their writing in critical "editors-subgroups" reviews of each other's papers. (See Examples of Papers and Critiques.)

Institutional Resistances to Trying to Counter COK and DKE biases. Sometimes students react to pre-testing by dropping out of the course. I had one such student -- in a course focussed on policy evaluation -- tell me that he thought his experience as a highly trained laboratory technician would get him through "the BS of the soft-science stuff."

I had a dean worry to me that letting students on to how much they would have to learn might reduce the number of applicants in the long run.

I talked with a professor -- who really worked at accommodating his students' personal weaknesses -- who told me how his college was going to relax admissions requirements so as to increase tuition revenues. He didn't think he could deal with both the increased numbers of students and literacy and math levels much below the already feeble standards that were traditional.

At another university of my long acquaintance, professorial folklore has it that new faculty who win recognition for excellent teaching take it to be an omen of future failure to get tenure. Trustee opinion: "If a professor spends so much time trying to engage students well, how can he or she be participating in committees and doing good research besides?"

An interesting situation is when a governor of a university exhibits DKE with respect to how educational processes proceed. At a meeting of faculty and university trustees, I heard the following comment from a trustee, "I hear that there are members of the faculty who have published books this last year. That's really good! Very productive! What I want now is to see you all publish two books next year."

To pursue these issues, see What is Worth Knowing? A Philosophical Distraction from a Problem in Leadership

Cordially, EGR

A Conundrum(?) about Empirical Knowledge and Empirical Belief.

“In so far as a scientific statement speaks about reality, it must be falsifiable; and in so far as it is not falsifiable, it does not speak about reality.” -- Karl Popper

No amount of experimentation can ever prove me right; a single experiment can prove me wrong. -- Albert Einstein

BEGINNING: The Empirical Conundrum.
Original Assumption: Knowledge and Belief are not the same.
Belief may be a component of Knowledge, but Knowledge need not be a component of belief. Some beliefs are not knowledge.
Proposition A: Empirical Knowledge claims are Defeasible (falsifiable), by virtue of their being empirical; i.e., defeasible means subject to withdrawal by virtue of possible (though maybe as yet unknown) countering evidence, CE. (See Three Human Dimensions of Conceptualization.)

Proposition B: Empirical Knowledge claims are, at best, Empirical beliefs.
So, it follows that
Proposition C: if something is at best an Empirical Belief, it is not Empirical Knowledge,
according to the Original Assumption.

Proposition C applies to anything that is claimed to be empirical evidence.

Thus Empirical Knowledge is not defeasible and consequently, via the contrapositive of Proposition A, not empirical.

Attempted Rebuttal.

But what is this CE? Is it Empirical Knowledge? If so, then by Proposition A, it is defeasible, ergo, merely Belief.

Thus, if counter-evidence CE is merely Empirical Belief, it is not really counter-evidence. Thus, Propositions B and C are gainsaid and consequently, Empirical Knowledge is, indeed, defeasible. (GO BACK TO BEGINNING.)

When you've gotten tired going repeatedly back to the Beginning, in considering the vulnerabilities of the above argument, see both
Pseudo-Science: the reasonable constraints of Empiricism
Knowledge: The Residues of Practical Caution.

Cordially, EGR

The Virtues of Hypocrisy

Indifference and hypocrisy between them keep orthodoxy alive ...
-- Israel Zangwill (1906)[1]

When you're smilin', when you're smilin'
The whole world smiles with you.
Yes, when you're laughin', when you're laughin'
The sun comes shinin' through.
But when you're cryin', you bring on the rain
So stop your sighin', be happy again.
Keep on smilin', 'cause when you're smilin',
The whole world smiles with you.[2]

Hypocrisy is consciously and deliberately saying something which you believe to be false or doing something you believe or feel is not right while maintaining a public demeanor of righteous indifference or neutral sanguinity.

If what you say is a statement, e.g. "I have returned your book to the library," when in fact you haven't, then it would be commonly called a lie. But if your statement is intended to have some sort of moral force or focusses on an act which you would reluctantly, at best, perform yourself, then it would also be considered hypocrisy, e.g. "Stop your sighin', be happy again."

Hypocrisy is generally considered undesirable, especially in theorizing (moralizing?) where possibly mitigating circumstances are not considered. However, many situations arise where disregarding concerns for hypocrisy is reasonable. Human history abounds with examples.

Two common social situations follow. The first is when a person of less power is dealing with someone threatening, for example, when importuned by persons (whether true believers or hypocrites) who are authorized, capable and willing to cause you hurt or harm.[3] Such situations can be a matter of life, death or imprisonment.

Situation I examples. Austrian Lutherans, when offered by Holy Roman Emperor Ferdinand II (reigned 1619-1637) the options of emigration, death, or conversion to Roman Catholicism, pretended to convert.

The Russian Communist Party membership purge under Stalin in 1929-1930 resulted in the execution of over 3,000 people, while tens of thousands lost their positions and privileges for either former membership among, or sympathy with, those who resisted Bolshevik dominance. The "Great Purge" of 1937-39 resulted in the deaths of between 600 thousand and more than one million Communist Party and government officials and wealthy landlords. We can imagine that the victims would have felt no pricks of conscience at vehemently, though hypocritically, declaring their loyalty to the Party.

German resistance to Nazism was generally unable to mobilize political opposition that would lead to a coup against Hitler. However, 77,000 German citizen resisters, presumably non-Jewish, were killed by Special Courts, or People's Courts. "Tens of thousands" more were said to be sent to concentration camps.

During the years 1947-1954 many Americans were pressed by US government officials and police to say whether they were communists or communist sympathizers. Rumors abounded and jail was not an unlikely outcome for many. In order to be eligible for employment as a substitute teacher in 1965, I, myself, at age 22 had to sign a document attesting to the fact that I was not, nor ever had been, a member of or sympathizer with the Communist Party of America.

I signed easily since, having had little political experience at my age, my declarations were true. If they had been false, I would have signed anyway, since I badly needed a job and I considered such documents to be irredeemably stupid and counter-productive. It seemed to me that anyone who was actually a Communist would sign it anyway, rather than draw back and say "Sorry, I can't sign this oath for reasons of conscience." Back then in 1965, such a declaration, divulged as a lie, might just get you, not a job, but a prison sentence.

Situation II, where hypocrisy is a matter of course, occurs with a tactful inquiry, "How are you?" where the conversants lack sufficient intimacy to reveal the truth. Tact is hardly even imaginable as being other than a virtue except, say, when a normally tactful, if rather informal, question is severely out of place. For example, upon being presented to the Queen of England to receive the CBE, one extends one's hand and asks, "How's it goin', Toots?" Whatever fault we might find with this, it is not likely to be that of hypocrisy.[4]

The Values of Hypocrisy. If you are in a position of relative power or authority over some persons, then your own hypocrisy (often reconceptualized as "leadership discretion") may serve you in any of the situations mentioned above. Examples:
To reinforce the legitimacy of organizational or social status. [4A]

To maintain the hierarchy deriving from an ethos.[5]

To intimidate persons of inferior status to forestall defiance or resistance.[6]

As a procedural step in the initiation of military or academic recruit training.[7]

To maintain a veneer of respectability or sanguinity.

These are no small incentives for hypocrisy. Such hypocrisy, if at all held to be morally objectionable, is likely to be considered a "Lesser Evil" than the likely consequences of "Transparency".

James G. March (1976) suggests that we would be better off with less rationality, and that we would be well served by a concept of "sensible foolishness". [8]

Freed from the constraints of pre-existent purposes, the necessity of consistency and the primacy of rationality (formal or algorithmic approaches, e.g. policies, mathematical theories), we could use the act of intelligent choice as a planned occasion for discovering new goals and unpredicted, attractive value consequences. We become intelligently foolish by treating goals as hypotheses, intuition as real, hypocrisy as a transition, memory as an enemy and experience as theory.

For a continuation on these issues, see March, J. G. & Olsen, J. P. (1976) Ambiguity and Choice. Bergen: Universitetsforlaget. Accessible at
Rationales for Intervention: From Test to Treatment to Policy: generalizing the Rationale.


Are You a "Person of Principle"?

Transactional (in Psychology). An interaction of an individual with one or more other persons, especially as influenced by their assumed relational roles of parent, child, or adult. --
Principles? Or Excuses? It's interesting that the term "transactional", particularly in referring to President #45, has become his normal description. His admirers use it to explain his decision-making, finessing the issue of principle. His detractors use it to focus on (what they consider his lack of) principles.

Were the question of the title posed to you, you might respond, "It all depends on the principle," implying that there is a distance between being "principled" and being "unprincipled." I would tend to agree with you.

In the wider world, especially in this our multicultural, multi-religious, multi-political, more-or-less democratic United States of America in 2019, there are scads of principles, many of them contradicting others, or even themselves.

Do Principles Exacerbate Conflict? Despite this multi-dimensional pluralism (or, perhaps, because of it) we find many, many people who think it is important to be a principled person. They tend to believe that people who claim to be such are more trustworthy than those who are reluctant to proclaim such.

Interestingly, it appears that the vaguer, the more-distant-from-everyday-goings-on the principle seems to be, the more likely it will have adherents. Such principles tend to become slogans of camaraderie, rather than moral or practical guides. This development is likely to bolster our democracy, rather than undermine it. It reduces the likelihood of conflict over specifics. Every successful negotiator recognizes this.

But What If You Consider Yourself Principled? (I personally do. I do not preach or practice, I believe, otherwise.) Then, can you answer the following questions about any of your principles?

Here are Ten Questions for Examining a Principle. Can You
1. ... state the principle (clearly)?
2. ... explain the important or critical terms in the principle?
3. ... give an example of an application of the principle?
4. ... contradict the principle?
5. ... give examples of the contradicted principle?
6. ... give reasons, if possible, for accepting the principle?
7. ... give reasons, if possible, for rejecting the principle?
8. ... tell how the principle helps us to think rationally or opens up possibilities for action?
9. ... tell how rejecting the principle restricts possibilities for thought or action?
10. ... find two other principles, one more important, the other less important than the principle being evaluated?
Here are two examples of possible principles to consider:
A) An eye for an eye and a tooth for a tooth.
B) To get ahead, learn how to fake competence and passion.
Use the questions above to decide whether, and under what circumstances, you might accept or reject the principle.

For fuller explanation, and examples to practice on, see
What is it to Know How to Use a Principle?

Cordially, EGR