Sunday, August 9, 2015

Charades of Evaluation: mis-connecting cause and effect

updated 4/23/19
"…passing a failing student is the #1 worst thing a teacher can do. … Changing grades is the most undermining contribution to a student’s failure, but above all else – it invalidates your data. Putting aside creating and submitting inaccurate school data for the moment, entering a 'false grade' will make it virtually impossible to reliably measure any improvement of your skills as a teacher. Your improvement will now be based on unsound and worthless data." -- M. Cubbin (8/2/15) The Business of School

"… Is the Customer Always Right?" -- Farrington, Frank (1915) in Merck Report, Volume 24 pg 134-135
Pseudo-Evaluation. Data are not fundamental. On their face, data cannot be distinguished from outputs of a random number generator. Data are like shell chips on a beach mixed in with sand; or, like foam on the tide. (For an article relating “data” and “objectivity,” see Can Criminal or Immoral Behavior Be Dealt With Objectively?)

Far more important is knowing the processes by which the putative data are collected. Even more critical is knowing which theories connect the data and the collection process to what they supposedly indicate.

Much “data-collection” is like trying to identify proportions of bird-species during a fall migration. If the process were merely to tally varieties of southward flying objects, we might well end up confounding red-winged blackbirds with jet planes and monarch butterflies. (See Is It Really a Test? Or Just Another Task?)

In the above epigraph, Cubbin presumes that a connection between school grades, student failure, and teacher skill is, at least ideally, possible. Using teachers as graders cuts costs, but is begging for inconsistency: not because teachers may not “know their stuff,” but because many institutional processes can overrule even the best of teachers’ judgments, e.g. administrators’ prerogative, special education policies, or political involvement in the grading process. There is little consensus, when interest-group push comes to shove, on either goals or concepts appropriate to education. (See What Does a Consensus Mean, Anyway?)

Requiring teachers to grade, then grading the teachers, is like judging baseball coaches by their players' batting averages. There will likely be only a tenuous causal connection, if any, between the data and coaching (teaching) efforts. (See Power Failure: Losing the Series; Blaming the Bat Boys)

The Diploma-Holder Markets: is the customer always right? An important assumption Cubbin seems to make is that markets for test-passers are composed of persons looking for those who possess certain proven skills. Such persons make up only a minor proportion of the markets for certificate- or diploma-holders.

Consider these other markets, for which actual skill levels are a distant second consideration, if even that, to applicant grade-point average:
a. Colleges (public, private or commercial) that have external funds, e.g. federal or foundation money, available for applicants with a certain grade-point average -- especially if the recipient institutions have tight budgets;

b. Institutions legally required to have certified staff but faced with employee scarcities, e.g. hospitals, clinics, civil-service;

c. School districts needing both adults certified as teachers, and students with birth and health certificates, in order to be run at even some remove from peak efficiency;

d. Government administrations pursuing certain public policy initiatives that depend on items a, b and c above, e.g. special education, affirmative action, STEM (Science, Technology, Engineering & Mathematics); and last but not least,

e. the child applicants ("legacies") to colleges that favor (paying) parents who are past graduates.

If skills really counted, there would be something like standard board examinations to be passed and, normally, retaken at standard intervals. Teacher grades would not be accepted in place of board exam results. (See The Dangers of Diplomas)

But where the sheepskin alone is most important, rarely will the sheep’s diet be.

For examples and to pursue the issues raised in this essay, see

1. “Data-Driven”: a slogan to distract from organizational disagreement?;

2. Classification Error in Evaluation Practice: the impact of the "false positive" on educational practice and policy

--- EGR