Test validity: a matter of taste

Are tests valid? Paul Newton thinks you might as well ask whether a Brussels sprout has integrity.

According to the document that has come to be known simply as The Standards – the closest the field of educational assessment has to a statement of universal principles of good practice – validity is ‘the most fundamental consideration in developing and evaluating tests’ (AERA, APA, & NCME, 1999, p. 9). Considering that validity is such an important concept, and has been for the best part of a century, you might be forgiven for assuming that everyone who works in the field of educational assessment knows what they mean when they use the term. Unfortunately, that’s not necessarily so.

Even those who claim to know what they mean by validity are divided over the definitions they prefer. Admittedly there are a few features that almost everyone agrees upon. Most agree that: validity is a very important concept; it has something to do with measurement and/or assessment; it is a property; and it has something to do with strength or quality. Even at this level of abstraction, though, there is still room for debate. One of the most influential and authoritative treatments of validity of all time (Messick, 1989) defined validity as an ‘evaluative judgement’, an ‘inductive summary’ and an ‘evolving property’, all on the very first page!

So, does validity refer to a judgement, a summary or a property? Although most people assume that it refers to a property, there is far less agreement over what kind of property it is. It is not even clear what kinds of thing can be described as having, or not having, validity. Are tests the kind of thing that can have validity? Or does validity only apply to the uses to which test results are put? Or is validity ultimately a property of a social policy that involves testing?

You may even wonder whether any of this semantic pedantry matters. And maybe it doesn’t. Maybe all we really need is a minimum of consensus that validity is about assessment, that it’s important, and that it has something to do with the property of strength or quality. Maybe that is all we need in order to be able to say what we need to say, and for others to be able to understand what we have said?

But many would disagree with this suggestion. Many experts, over many years, have insisted that we need to be very precise in how we use the term ‘validity’ lest we encourage spurious thinking about the technical quality or social value of our tests and examinations. It is often said that tests (including exam papers) are not the kind of thing that can have validity. This is because results from a single test can be used in lots of different ways, and although they might be great for one use, they might be worthless for another. National curriculum test results might be very good for assessing a general level of attainment, against curriculum objectives, for individual pupils. But when aggregated across pupils and across subjects, they might be quite poor for judging the educational effectiveness of primary schools. From this perspective, it’s wrong to speak of ‘test validity’ because what matters is the particular interpretation that is drawn from test results and the use to which they are put. From this perspective, a test cannot have validity, any more than a Brussels sprout can have integrity; only interpretations and uses can have validity.

Some go further still. For them, validity is not simply a matter of how results are interpreted and used; it is far bigger than this. Validity needs also to embrace any intended impact, or unintended side effect, of testing. In other words, validity is not a property of a test, nor even of an interpretation or a use, but of an entire system within which tests are employed. From this perspective, validity is a property that encapsulates the overall acceptability of an assessment policy.

So, the next time you hear someone talking about validity, ask them exactly what they are talking about. And don’t be surprised if they struggle to tell you!

Paul Newton is Professor of Educational Assessment at the Institute of Education, University of London, and a member of the CERP Advisory Group. He is an author, with Stuart Shaw, of the forthcoming book Validity in Educational and Psychological Assessment, which is to be published in April 2014.

  1. American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
  2. Messick, S. (1989). Validity. In R. Linn (Ed.), Educational Measurement (3rd ed., pp. 13–100). Washington, DC: American Council on Education.

Read CERP research on Contemporary validity theory and the assessment context in England.

