The Use of Tests When Making High Stakes Decisions for Students - Test Measurement Principles

This KnowledgeBase archive includes content and external links that were accurate and relevant as of September 30, 2019.

A Resource Guide for Educators and Policy-Makers
Prepared by the Office for Civil Rights, U.S. Department of Education

CHAPTER 1: Test Measurement Principles

This chapter explains basic test measurement standards and related educational principles for determining whether tests used as part of making high-stakes decisions for students provide accurate and fair information.

In using tests as part of high-stakes decision-making, educational institutions should ensure that the test will provide accurate results that are valid, reliable and fair for all test takers.

The Testing of All Students: Issues of Intervention and Inclusion

All aspects of validity, reliability, fairness and cut scores discussed above are applicable to the measurement of knowledge and skills of all students, including limited English proficient students and students with disabilities. This section addresses additional issues related to accurately measuring the knowledge and skills of these two populations in selected situations. Issues affecting limited English proficient and disabled students are addressed separately below following discussion of general considerations about the selection and use of accommodations.

Whenever tests are intended to evaluate the knowledge and skills of different groups of students, ensuring that test score inferences accurately reflect the intended constructs for all students is a complex task. It involves several aspects of test construction, pilot testing, implementation, analysis and reporting. For limited English proficient students and students with disabilities, the appropriate inclusion of students from these groups in validation and norming samples, and the meaningful inclusion of limited English proficient and disability experts throughout the test development process, are necessary to ensure suitable test quality for these groups of test takers.

The proper inclusion of diverse groups of students in the same academic achievement testing program helps to ensure that high-stakes decisions are made on the basis of test results that are as comparable as possible across all groups of test takers. If different tests are used as part of the testing program, it is important to ensure that they measure the same content standards. The appropriate inclusion of students can also help to ensure that educational benefits attributable to the high-stakes decisions will be available to all. In some cases, it is appropriate to test limited English proficient students and students with disabilities under standardized conditions, as long as the evidence supports the validity of the results in a given situation for these students. In other cases, the conditions may have to be accommodated to ensure that the inferences drawn from the scores validly reflect the students' mastery of the intended constructs. The use of multiple measures generally enhances the accuracy of the educational decisions, and these measures can be used to confirm the validity of the test results. The use of multiple measures is particularly relevant for limited English proficient students and students with disabilities in cases where technical data are still being collected on the proper use of accommodations and the proper interpretation of test results when testing conditions are accommodated.

A. General Considerations about Accommodations

Making similar inferences about scores from academic achievement tests for all test takers, and making appropriate decisions when using these scores, requires accurately measuring the same academic constructs (knowledge and skills in specific subject areas) across groups and contexts. In measuring the knowledge and skills of limited English proficient students and students with disabilities, it is particularly important that the tests actually measure the intended knowledge and skills and not factors that are extraneous to the intended construct. For instance, impaired visual capacity may influence a student's test score in science when the student must sight-read a typical paper-and-pencil science test. In measuring science skills, the student's sight likely is not relevant to the student's knowledge of science. Similarly, how well a limited English proficient student reads English may influence the student's test score in mathematics when the student must read the test. In this case, the student's reading skills likely are not relevant when the skills of mathematics computation are to be measured. The proper selection of accommodations for individual students and the determination of technical quality associated with accommodated test scores are complex and challenging issues that need to be addressed by educators, policy-makers, and test developers.

Typically, accommodations to established conditions are found in three main phases of testing: 1) the administration of tests, 2) how students are allowed to respond to the items and 3) the presentation of the tests (how the items are presented to the students on the test instrument). Administration accommodations involve setting and timing, and can include extended time to counteract the increased literacy demands for English language learners or fatigue for a student with sensory disabilities. Response accommodations allow students to demonstrate what they know in different ways, such as responding on a computer rather than in a test booklet. Presentation accommodations can include format variations such as fewer items per page, large print and plain language editing procedures, which use short sentences, common words and active voice. There is wide variation in the types of accommodations used across states and school districts. (Appendix C lists many of the accommodations used in large-scale testing for limited English proficient students and students with disabilities. The list is not meant to be exhaustive, and its use in this document should not be seen as an endorsement of any specific accommodations. Rather, the Appendix is meant to provide examples of the types of accommodations that are being used with limited English proficient students and students with disabilities.)

Issues regarding the use of accommodations are complex. When the possible use of an accommodation for a student is being considered, two questions should be examined: 1) What is being measured if conditions are accommodated? 2) What is being measured if the conditions remain the same? The decision to use an accommodation or not should be grounded in the ultimate goal of collecting test information that accurately and fairly represents the knowledge and skills of the individual student on the intended constructs. The overarching concern should be that test score inferences accurately reflect the intended constructs rather than factors extraneous to the intent of the measurement.

B. Testing of Limited English Proficient Students

The Joint Standards and several recent measurement publications discuss the population of limited English proficient students and how test publishers and users have handled inclusion in tests to date. This section briefly outlines principles derived from the Joint Standards and these publications. It addresses two types of testing situations especially relevant for limited English proficient students: the assessment of English language proficiency and the assessment of academic educational achievement.

1. Assessing English Language Proficiency

Issues of validity, reliability and fairness apply to tests and other relevant assessments that measure English language proficiency. English language proficiency is typically defined as proficiency in listening, speaking, reading and writing English. Assessments that measure English language proficiency are generally used to make decisions about who should receive English language acquisition services, the type of programs in which these students are placed and the progress of students in the appropriate programs. They are also used to evaluate the English proficiency of students when exiting from a program or services, to ensure that they can successfully participate in the regular school curriculum. In making decisions about which tests are appropriate, it is particularly important to make sure that the tests accurately and completely reflect the intended English language proficiency constructs, so the students are not misclassified. It is generally accepted that a range of communicative abilities will typically need to be assessed when placement decisions are being made.

2. Assessing the Academic Educational Achievement of Limited English Proficient Students

Several factors typically affect how well the educational achievement of limited English proficient students is measured on standardized academic achievement tests. Technical issues associated with developing meaningful achievement tests for limited English proficient students can be complex and challenging. For all test takers, any test that employs written or oral skills in English or in another language is, in part, a measure of those skills in the particular language. Test use with individuals who have not sufficiently acquired the literacy or fluency skills in the language of the test may introduce construct-irrelevant components to the testing process. Further, issues related to differences in the experiences of students may substantially affect how test items are interpreted by different groups of students. In both instances, test scores may not accurately reflect the qualities and competencies that the test intends to measure.

a. Background Factors for Limited English Proficient Students

The background factors particularly salient in ensuring accuracy in testing for students with limited English proficiency tend to relate to language proficiency, culture and schooling.

Limited English proficient students often bring varying levels of English and home-language fluency and literacy skills to the testing situation. These students may be adept in conversing orally in their home language, but unless they have had formal schooling in their home language, they may not have a corresponding level of literacy. Also, while students with limited English proficiency may acquire a degree of fluency in English, literacy in English for many students comes later. To add to the complexity, proficiency in fluency and literacy in either the home language or English involves both social and academic components. Thus, a student may be able to write a well-organized social letter in his or her home language and yet may not be able to orally explain adequately in that language how to solve a mathematics problem that includes the knowledge of concepts and words endemic to the field of mathematics. The same phenomenon may occur in English as well.

b. Including Limited English Proficient Students in Large-Scale Standardized Achievement Tests

The Joint Standards recognizes the complexity of developing educational achievement tests that are appropriate for a range of test takers, including those who are limited English proficient. Overall, "testing practice should be designed to reduce threats to the reliability and validity of test score inferences that may arise from language differences." When credible research evidence reports that scores may differ in meaning across subgroups of linguistically diverse test takers, then, to the extent feasible, the same form of validity evidence should be collected for each relevant subgroup as for the examinee population as a whole. The Joint Standards states, "When a test is recommended for use with linguistically diverse test takers, test developers and publishers should provide the information necessary for appropriate test use and interpretation." Furthermore, "when testing an examinee proficient in two or more languages for which the test is available, the examinee's relative language proficiencies should be determined. The test generally should be administered in the test taker's most proficient language, unless proficiency in the less proficient language is part of the assessment." Recommended accommodations should be used appropriately and described in detail in the test manual; translation methods and interpreter expertise should be clearly described; evidence of test comparability should be reported when multiple language versions of a test are intended to be comparable; and evidence of the score reliability and the validity of the translated test's score inferences should be provided for the intended uses and linguistic groups.

Providing accommodations to established testing conditions for some students with limited English proficiency may be appropriate when their use would yield the most valid scores on the intended academic achievement constructs. Deciding which accommodations to use for which students usually involves an understanding of which construct-irrelevant background factors would substantially influence the measurement of intended knowledge and skills for individual students, and whether the accommodations would enhance the validity of the test score interpretations for these students. In collecting evidence to support the technical quality of a test for limited English proficient students, the accumulation of data may need to occur over several test administrations to ensure sufficient sample sizes. Educators and policy-makers need to understand that the proper use of accommodations for limited English proficient students and the determination of technical quality are complex and challenging endeavors.

Appendix C lists various test presentation, administration, and response accommodations that states and districts generally employ when testing limited English proficient students. Examples of accommodations in the presentation of the test include editing text so the items are in plain language, or providing page formats which minimize confusion by limiting use of columns and the number of items per page. Presenting the test in the student's native language is an accommodation to a test written in English when the same constructs are being measured on both the English- and native-language versions. It is essential that translations accurately convey the meaning of the test items; poor translations can prove more harmful than helpful. Administration accommodations include extending the length of the testing period, permitting breaks, administering tests in small groups or in separate rooms, and allowing English or native-language glossaries or dictionaries as appropriate. Response accommodations include oral response and permitting students to respond in their native language.

Factors Related to Accurately Testing Limited English Proficient Students

Language Proficiency

  • The student's level of oral and written proficiency in English
  • The student's proficiency in his or her home language
  • The language of instruction

Cultural Issues

  • Background experiences
  • Perceptions of prior experiences
  • Value systems

Schooling Issues

  • The amount of formal elementary and secondary schooling in the student's home country, if applicable, and in U.S. schools
  • Consistency of schooling
  • Instructional practices in the classroom

Therefore, in determining how to effectively measure the academic knowledge and skills of limited English proficient students, educators and policy-makers should consider how to minimize the influence of literacy issues, except when these constructs are explicitly being measured. The levels of proficiency of limited English proficient students in their home language and in English, as well as the language of instruction, are important in determining in which language an achievement test should be administered, and which accommodations to standardized testing conditions, if any, might be most useful for which students.

Additionally, diverse cultural and other background experiences, including variations in amount, type and location (home country and United States) of formal elementary and secondary schooling, as well as interrupted and multi-location schooling of students (of the type frequently experienced by children of migrant workers), affect language literacy, the contextual content of items and the academic foundational knowledge base that can be assumed in appropriately interpreting the results of educational achievement tests. The format and procedures involved in testing can also affect accuracy in test scores, particularly if the test practices differ substantially from ongoing instructional practices in classrooms, including which accommodations are used in the classroom and how they are used.

Source: Office for Civil Rights, U.S. Department of Education

The complete text of this document can be found at: http://www.ed.gov/offices/OCR/archives/pdf/TestingResource.pdf.

The contents of this website were developed under a grant from the U.S. Department of Education and are intended for general reference purposes only. However, those contents do not necessarily represent the policy of the U.S. Department of Education or the Center, and you should not assume endorsement by the Federal Government.