Pearson














“Testing 101”

August 2007 Clinical Café
By Jeff Evans, MS, CCC-SLP

This month’s Clinical Café is a “back to the basics” discussion of common and often-discussed test types as well as the important concepts of reliability and validity. New and veteran test users alike may find easy ways below to describe these ideas to others…and to refresh their own memory!

To begin, a standardized test is a test administered and scored in a standard manner. These tests are designed so that the questions, conditions for administering, scoring procedures, and interpretations are predetermined and consistent from one administration to the next. "Standardized" may also refer to the type of score that a test-taker receives (i.e., a standard score).

Generally, there are two types of standardized tests: norm-referenced tests and criterion-referenced tests, resulting in a norm-referenced score or a criterion-referenced score, respectively. Norm-referenced scores compare test-takers to a group of same-age or same-grade peers. Criterion-referenced scores compare test-takers to a content performance level (i.e., a criterion), and may also be described as standards-based or curriculum-based assessment. Norm-referenced tests measure success by rank ordering students, while standards-based assessments allow all students to score highly if they meet stated standards. Let’s look at each in more depth.

Norm-Referenced Tests (NRTs)

A norm-referenced test (NRT) compares an individual to a sample of his or her peers, referred to as a "normative sample." NRTs are designed to "rank-order" test-takers, that is, to compare their scores to one another. A norm-referenced test does not compare all the students who take the test in a given year. Instead, test developers select a subset of individuals (e.g., 50 ninth graders in 30 different states) from the target population (i.e., all ninth graders in the nation). The test is "normed" on this subset, which is chosen to fairly represent the entire target population, that is, the full range of “normal students.” The scores that you generate from individuals you test (e.g., ninth graders at your local high school) are then reported in relation to the scores of this "norming" group.
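As a rough illustration of reporting a score "in relation to" a norming group, the sketch below counts the share of a norming sample that scored below a given raw score. The function name `percentile_rank` and the raw scores are hypothetical, invented for this example. Published tests rely on far larger samples and carefully built norm tables.

```python
def percentile_rank(raw_score, norming_scores):
    """Return the percentage of norming-group scores that fall below raw_score."""
    if not norming_scores:
        raise ValueError("the norming sample must not be empty")
    below = sum(1 for score in norming_scores if score < raw_score)
    return 100 * below / len(norming_scores)


# Hypothetical raw scores from a tiny norming sample (real samples are far larger).
norming_sample = [12, 15, 17, 18, 20, 21, 23, 24, 26, 30]

# A raw score of 22 is higher than 6 of the 10 norming scores.
print(percentile_rank(22, norming_sample))  # prints 60.0
```

The same raw score could earn a very different percentile rank against a different norming group, which is why the makeup of the normative sample matters.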

To make comparing scores easier, test developers often want results that look somewhat like a bell-shaped curve (i.e., the "normal" curve, shown in the diagram below). Most students will score near the middle, and some will score low (the left side of the curve) or high (the right side of the curve). Scores are usually reported as percentile ranks or standard scores. Percentile ranks range from the 1st to the 99th, with the average student score set at the 50th percentile. For example, if Steve scored at the 63rd percentile, he scored higher than 63% of the test-takers in the norming group. On a standard-score scale with a mean of 100 and a standard deviation of 15 (a scale used by many tests, though not all), the 50th percentile equals a standard score of 100, and Steve’s 63rd percentile rank equals a standard score of about 105. Scores also can be reported as "grade equivalents," "stanines," or "normal curve equivalents." Some scores are derived from raw scores, and others are derived from standard scores.
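The link between Steve's percentile rank and his standard score can be checked with a few lines of Python. This is only a sketch: it assumes scores follow a perfectly normal curve and that the standard-score scale has a mean of 100 and a standard deviation of 15, which is common but not universal. The function names are made up for this example; a test's own norm tables are always the authority.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 15 (not every test uses this scale).
STANDARD_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_standard_score(percentile):
    """Convert a percentile rank (between 0 and 100) to a rounded standard score."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be between 0 and 100, exclusive")
    return round(STANDARD_SCALE.inv_cdf(percentile / 100))


def standard_score_to_percentile(standard_score):
    """Convert a standard score to a rounded percentile rank."""
    return round(STANDARD_SCALE.cdf(standard_score) * 100)


print(percentile_to_standard_score(63))   # prints 105
print(percentile_to_standard_score(50))   # prints 100
print(standard_score_to_percentile(105))  # prints 63
```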

The “bell curve” assumes a normal distribution of scores. A perfect curve never occurs, but if you sample enough people during norms development the whole group may give a result that is very close to this graphical profile.

Source: Dunn, L. M., & Dunn, D. M. (2007). Manual: Peabody Picture Vocabulary Test, fourth edition. Bloomington, MN: Pearson Assessments.
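The idea that a larger sample comes closer to the bell curve can be tried out with a small simulation. The sketch below draws made-up scores from a normal distribution (mean 100, standard deviation 15, both assumed here) and reports the share that land within one standard deviation of the mean. On a perfect normal curve that share is about 68%. The function name and sample sizes are illustrative only.

```python
import random


def share_within_one_sd(sample_size, seed=0):
    """Simulate scores and return the fraction that fall between 85 and 115."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    scores = [rng.gauss(100, 15) for _ in range(sample_size)]
    within = sum(1 for score in scores if 85 <= score <= 115)
    return within / sample_size


# Small samples can stray from the theoretical 68%; large samples tend to settle near it.
for size in (30, 300, 30_000):
    print(size, round(share_within_one_sd(size), 3))
```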

Criterion-Referenced Tests (CRTs)

A criterion-referenced test is intended to measure how well a person has learned a specific body of knowledge and associated skills. The multiple-choice tests most people take to get a driver's license and on-the-road driving tests are both examples of criterion-referenced tests. As on most other CRTs, it is possible for everyone to earn a passing score (e.g., 90% or better) if they know the driving rules and drive reasonably well. Educators are concerned with students achieving passing scores on statewide standards. In these kinds of tests there is an agreed-upon set of criteria, and students are expected to score at a specified minimum level to pass. Curriculum performance goals are another kind of CRT. To advance to the next learning packet, for example, the student must achieve 70% or better on the post test.
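The learning-packet example can be sketched in code to show how a criterion-referenced decision differs from a ranking. Each student is compared only to the cutoff, never to classmates, so everyone can pass. The function name, the student labels, and the 20-item post test are hypothetical. The 70% cutoff comes from the example above.

```python
def meets_criterion(items_correct, items_total, cutoff_percent=70):
    """Return True when the percent correct is at or above the cutoff."""
    if items_total <= 0:
        raise ValueError("items_total must be positive")
    # Integer arithmetic avoids rounding surprises right at the cutoff.
    return items_correct * 100 >= cutoff_percent * items_total


# Hypothetical results on a 20-item post test.
post_test_results = {"Student A": 18, "Student B": 14, "Student C": 20}

for student, correct in post_test_results.items():
    print(student, meets_criterion(correct, 20))  # every student prints True
```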

Testing with Reliability and Validity

Test reliability refers to the degree to which a test is consistent and stable in measuring what it is intended to measure. Reliability is a statistical estimation. Most simply put, a test is reliable if it is consistent within itself and across time. To understand the basics of test reliability, think of a bathroom scale that gave you drastically different readings every time you stepped on it regardless of whether you had gained or lost weight. If such a scale existed, it would not be considered reliable.
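One common way to estimate consistency across time (test-retest reliability) is to correlate two sets of measurements of the same people. The sketch below applies that idea to the bathroom scale, using invented weights for five people weighed on two days. The Pearson correlation is only one of several reliability estimates, and its use here is an assumption for illustration rather than something this article prescribes.

```python
from math import sqrt
from statistics import mean


def pearson_r(first, second):
    """Return the Pearson correlation between two equal-length lists of numbers."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_first, mean_second = mean(first), mean(second)
    covariation = sum((a - mean_first) * (b - mean_second)
                      for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    # Assumes neither list is constant; a constant list would divide by zero here.
    return covariation / sqrt(spread_first * spread_second)


# Hypothetical readings (in pounds) for the same five people on two days.
day_one = [120, 150, 165, 180, 210]
consistent_scale_day_two = [121, 149, 166, 181, 209]
erratic_scale_day_two = [170, 125, 205, 140, 160]

print(pearson_r(day_one, consistent_scale_day_two))  # close to 1: consistent readings
print(pearson_r(day_one, erratic_scale_day_two))     # near 0: readings do not agree
```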

Test validity refers to the degree to which the test actually measures what it claims to measure. Test validity is also the extent to which inferences, conclusions, and decisions made on the basis of test scores are appropriate and meaningful.

See the September 2006 Clinical Café article “The Validities” for some great examples of test validity.

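The relationship of reliability and validity is straightforward, but it runs in only one direction. Reliability is required for a test to be considered valid: if a test is not reliable, it cannot be valid. The converse is not true, however. A test can be reliable without being valid. A bathroom scale that reads exactly five pounds too heavy every time is perfectly consistent, yet it still does not report your true weight.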

All This Science...

Those of us who entered into a profession that favors highly verbal characteristics in people sometimes struggle with the quantitative aspects of research science.  We desire success in the clinical setting and over time we realize that art and science seem to meld together in practice. We are partly scientists! So, boring as it seems sometimes, stats and test theory are good for us to review; hopefully this refresher is useful for your practice.

References

Dunn, L. M., & Dunn, D. M. (2007). Manual: Peabody Picture Vocabulary Test, fourth edition. Bloomington, MN: Pearson Assessments.
 






© 2005-2007, Pearson Education or its affiliates. All rights reserved.