
Evidence-centered design (ECD) of tests

Update (07/18/2014): An excellent reason to become familiar with ECD is that the long-awaited revision of the 1999 Standards for Educational and Psychological Testing has been published by the American Educational Research Association and is built largely around the validation model. The revision continues as the “gold standard” of guidance for testing in the United States and around the world and often drives revisions and updates in other standards (e.g., Principles for Validation and Use of Personnel Selection Procedures, NCCA’s Standards for the Accreditation of Certification Programs). There will naturally be a lag while those other documents are revised, but updates can be expected within three to five years.

Original Article: Validity in testing and assessment is the most important standard for evaluating quality. Both the 1999 and 2014 versions of the Standards for Educational and Psychological Testing converge on this assertion. Validation refers to activities designed to provide conceptual and empirical evidence for stated score interpretations, which should be a collaborative effort between test developers and test score users. A typical score interpretation is that a student test taker has either demonstrated or has failed to demonstrate mastery of a tested domain. Perkins legislation that governs career-technical education (CTE) funding specifies that tests should be reliable and valid (e.g., tests intended to measure technical skill attainment [Indicator 2S1]). The purpose of this brief, aimed at career-technical educators and other trainers, is to present specifics, implications, and selected applications of evidence-centered design (ECD) in large-scale testing. ECD refers to a model for developing tests in which validity is co-developed as an argument to support score interpretations.

A substantial body of work on ECD exists. Two chapters in the 2012 Handbook on Measurement, Assessment, and Evaluation in Higher Education, one by Yarnall and Ostrander and the other by Haertel, Wentland, Yarnall, and Mislevy, provide recent statements on ECD that are pertinent to occupational-technical testing. Yarnall and Ostrander provide an example of scenario-based learning for postsecondary technician education. This instructional model is being implemented as part of the National Science Foundation’s Advanced Technological Education (ATE) program. The authors detail how ECD was applied across multiple information technology and engineering disciplines, using assessment reflection to assist instructors in creating various types of formative and summative assessments. Haertel and colleagues discuss ECD in relation to project work with a national association of trainers of commercial vehicle drivers. One of their extended examples is test development for the commercial driver’s license (CDL) as a credential. These two projects show the promise of ECD in education/training systems that use testing and assessment to monitor and evaluate individuals and programs.

Other articles by Tannenbaum, Robustelli, and Baron (2008) and by Williamson, Mislevy, and Almond (2004) help to demonstrate the application of ECD in credentialing examinations (licensure/certification). Tannenbaum et al. (2008) presented side-by-side processes for job analysis and ECD in credentialing, incorporating the traditional sequence of an expert committee, verification surveys, committee processing of survey results, and translation into test specifications. Williamson et al. (2004) made a persuasive argument for the application of ECD in credentialing. The argument that is developed and supported in credentialing, according to Williamson and colleagues as well as others, is that persons scoring below a cutoff are unlikely to provide safe services to the public while those scoring at or above the cutoff are qualified to enter the profession or trade. The model inherent in defining cutoffs is a contrast of knowledge, skill, and judgment between those judged to be proficient and those judged non-proficient. There is no presumption of job or career success in testing for credentials. Lastly, there is ample evidence of application of ECD in large-scale testing, such as in the redesign of the College Board’s Advanced Placement testing system (Bejar, 2010).
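
The cutoff-based score interpretation described above can be illustrated with a minimal sketch. The cut score, the candidate records, and the function name are hypothetical and are not drawn from any of the cited programs; in practice a cutoff would come from a standard-setting study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    scaled_score: int


def classify(candidate: Candidate, cut_score: int) -> str:
    """Return the score interpretation that the credentialing argument supports."""
    # At or above the cutoff: judged qualified to enter the profession or trade.
    # Below the cutoff: judged not yet ready to provide safe services.
    if candidate.scaled_score >= cut_score:
        return "proficient"
    return "non-proficient"


CUT_SCORE = 70  # hypothetical value for illustration only

candidates = [Candidate("A", 82), Candidate("B", 70), Candidate("C", 69)]
decisions = {c.candidate_id: classify(c, CUT_SCORE) for c in candidates}
print(decisions)
```

The classification makes no claim about later job or career success; it only encodes the proficient versus non-proficient contrast at the cutoff.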

ECD Layers and Emphases

Mislevy and colleagues proposed ECD by building on the work of Messick (1992, 1995). Messick, well known for adding a consequential basis of validity, had argued that validation as argument is always based on imperfect evidence. Mislevy and colleagues proposed sequential and layered methods for developing the validity argument alongside test creation (Mislevy, Steinberg, & Almond, 2003). Developing the test and its validity argument in parallel is therefore one innovation of evidence-centered design. Layers, another feature of the ECD model, were drawn from architectural and computer systems logic. The table below shows the five layers of ECD together with definitions and procedures aligned to relevant aspects of CTE testing.

Layer | Definition/Steps/Tools | CTE Representations

Domain Analysis

A first step is acquiring/generating information about the domain of interest: concepts, terms, tools, models and representations, key situations of use, and patterns of interaction with clients/peers.

Consider, for instance, an IT career field with a network systems pathway (set of related occupations). There are postsecondary and non-postsecondary trajectories within careers. The Cisco Networking Academy represents a paradigm case of innovative instructional delivery that uses ECD extensively.

Represent career fields, pathways, or occupations (the domain), technical content standards or skill standards (state or national sources) as well as artifacts, situations, and interaction patterns of the particular domain

Implement using the workflow in Tannenbaum et al. (2008), spanning input from broadly based committees, surveys of additional stakeholders, and drilldown behavioral and cognitive task analyses.

Domain Modeling

Following domain analysis, modeling creates broader representations of a universe of content. These represent a transition from qualitative and quantitative data to conceptual models that are made practical through industry input and participation.

Tools for domain modeling might be diagrams of assessment arguments (Kane, 1992), templates for assessment arguments based on big ideas or crosscutting concepts of domains (as in the Next Generation Science Standards), or design patterns.

Continuing with the Cisco Networking Academy, Behrens and colleagues (2010) presented an extensive write-up of applications of ECD to learning/testing in a virtual world. Blended employability concepts, resurrected as 21st-century skills and synthesized by a group convened by the National Research Council, can be assessed in virtual testing environments using ECD by adding social and socio-technical constructs.

Create statements — test purposes (to lay out score interpretations) and blueprints — at a broad level for refinement in another layer. (See Conceptual Assessment Foundation below)

Align content and tests using systems proposed by Bloom, Marzano, Webb, or Hess (Cognitive Rigor Model). Begin to consider the non-proficient compared to the proficient test taker.

Workflow procedures might include the Global Skills Xchange (GSX) approach to alignment, various types of test plans/purposes/blueprints developed collaboratively by stakeholders, and item specifications or exemplars to guide item writing as part of assessment implementation.

Conceptual Assessment Foundation

Craft the assessment argument in structures/specifications for tasks and tests, evaluation procedures, and measurement models.

Models include: task, student, evidence (which connects task and student models).

The Cisco Networking Academy program, a philanthropic outreach program, uses CNS (Computer Network System).

Add specificity to CTE-related domain models from the previous layer, because they were stated very generally but must be translated across layers.

An example might be competitive events for a state/national integrated plan by career and technical student organizations (CTSOs). An assessment reflection model such as that presented by Yarnall and Ostrander could be conducted at a distance, using web conferencing or collaboration websites/software.

Products include specifications for:

  • proficient test takers in terms of knowledge and skill (Performance level descriptions are one way to represent the test takers.)
  • task features, fixed or variable (Yarnall and Ostrander discuss how variable features can be used to dial up or down the difficulty of a task.)
  • evidence models that connect student and task models

Assessment Implementation

Deploy tests or assessments, including presentation-ready tasks and calibrated measurement models for obtaining evidence from test-taker responses.

Item banks are created in line with the specifications discussed above, including instructor or subject matter expert (SME) review and field testing.

Online delivery of test forms for selected response items features quick scoring; constructed or performance response items can be handled via such platforms as Taskstream (Western Governors University is an institution that uses this website to handle student work). ePortfolios are another way to conceptualize constructed responses within CTE.

Assessment Delivery

Integrate operational components: interactions of students and tasks, task- and test-level scoring, and reports for stakeholders ranging from test takers to test users to test developers.

This layer refers to operation and maintenance of the system — together with monitoring of metrics designed to summarize system operation.

In CTE, this could include:

  • regular maintenance (conducted each testing cycle)
  • revisions of essential premises (converting from program to course basis)
  • user reactions (surveys, focus groups)
  • such metrics as tests delivered, breakdowns by month or time of day, or time to score a performance task (e.g., Western Governors University has an expectation of a three-day turnaround for assessors)
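
The delivery metrics listed above could be computed along the following lines. This is only a sketch: the delivery log, its field layout, and the variable names are hypothetical, and the three-day target simply mirrors the turnaround expectation mentioned in the last bullet.

```python
from collections import Counter
from datetime import datetime

# Hypothetical delivery log: (test delivered at, performance task scored at)
records = [
    (datetime(2014, 3, 3, 9, 0), datetime(2014, 3, 5, 9, 0)),
    (datetime(2014, 3, 17, 13, 30), datetime(2014, 3, 18, 13, 30)),
    (datetime(2014, 4, 7, 10, 15), datetime(2014, 4, 11, 10, 15)),
]

TURNAROUND_TARGET_DAYS = 3  # assumed target, taken from the three-day example

# Count of tests delivered and breakdown by month.
tests_delivered = len(records)
by_month = Counter(delivered.strftime("%Y-%m") for delivered, _ in records)

# Time to score each performance task, in days.
days_to_score = [
    (scored - delivered).total_seconds() / 86400 for delivered, scored in records
]
late = sum(1 for days in days_to_score if days > TURNAROUND_TARGET_DAYS)

print(tests_delivered, dict(by_month), days_to_score, late)
```

Reviewing such figures each testing cycle is one way to support the regular maintenance described in this layer.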

ECD Applications and Implications

A very general implication of research by Yarnall and Ostrander is the importance of three categories of knowledge/skills for learning and assessment: technical, social, and socio-technical. While CTE advocates understand the importance of technical skills, social and socio-technical skills are increasingly desired by employers and educators and must be incorporated in technical content standards, instruction, and testing. Social skills at work, for example, refer to a broad interpersonal category covering peer and client situations and patterns of interaction that can be addressed in domain analysis and modeling layers. Groups and other forms of distributed teamwork are typical modes of project performance. Socio-technical skills pertain to meta-cognitive control in scenario- and project-based learning. The work described by Yarnall and Ostrander triggers projects using a series of emails from hypothetical individuals (prototypes are the client, the supervisor, or both). Such emails are building blocks that can provide project initiation requests, end-user requirements, and occasionally an unexpected twist to a project (e.g., a sudden shift in requirements, or more or fewer resources to be deployed). Such constructs are also highlighted in a recent synthesis of 21st-century skills issued by the National Research Council (2012). Many are probably best assessed using constructed or performance response items woven into project-based work, such as that described by Yarnall and Ostrander.

One application that could demonstrate the scope and power of ECD in a vital area of CTE is in the competitive design events held annually by career and technical student organizations (CTSOs). These events, whether hosted by Distributive Education Clubs of America (DECA), Health Occupations Students of America (HOSA), or SkillsUSA, begin locally and then proceed through state to national and sometimes international events in which individuals and teams compete. Domain analysis and modeling are completed, under the direction of the national or international body, in a systematic fashion that allows for repositories and updates over time. Then the results are applied to create the models in the conceptual assessment foundation: proficiency (student/person), task, and evidence. This layer translates into assessment implementation and, consequently, operational usage. Each subsequent year provides an additional cycle for use and enhancement.
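
The three models of the conceptual assessment foundation can be pictured as linked data structures. The sketch below is an illustration under stated assumptions, not the representation used in any cited project: the class names, fields, and the networking example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudentModel:
    # Proficiencies about which the test is meant to support claims.
    proficiencies: List[str]


@dataclass
class TaskModel:
    name: str
    fixed_features: Dict[str, str]
    # Variable features can be adjusted to raise or lower task difficulty.
    variable_features: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class EvidenceModel:
    # Connects an observable from a task to the proficiency it informs.
    task: str
    observable: str
    proficiency: str


def unlinked_proficiencies(
    student: StudentModel, evidence: List[EvidenceModel]
) -> List[str]:
    """List proficiencies that no evidence rule informs (a gap in the argument)."""
    covered = {rule.proficiency for rule in evidence}
    return [p for p in student.proficiencies if p not in covered]


student = StudentModel(
    ["configure router", "troubleshoot connectivity", "communicate with client"]
)
task = TaskModel(
    name="small-office network setup",
    fixed_features={"format": "performance task"},
    variable_features={"number_of_subnets": ["1", "2", "4"]},
)
evidence = [
    EvidenceModel(task.name, "routing table is correct", "configure router"),
    EvidenceModel(task.name, "fault located and fixed", "troubleshoot connectivity"),
]

gaps = unlinked_proficiencies(student, evidence)
print(gaps)
```

A check like this makes visible where the assessment argument has a claim (here, a social skill) with no task evidence behind it, which is the kind of gap that the yearly cycle of use and enhancement can close.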

A second application lies in the area of articulation agreements between secondary and postsecondary governing bodies. The central objective is to develop credible procedures for granting college credit (usually three to six credits) to high school students based on their completion of rigorous coursework and appropriate assessments of knowledge/skill. Domain analysis in this context is collaborative and proceeds by laying out the content standards from the secondary side and considering various materials (e.g., syllabi, learning experiences, certifications) from the postsecondary side. The collaborative process identifies “learning outcomes” that are fed into the next layer of domain modeling. Processing of the learning outcomes involves creating assessment arguments and specifications for test item banks. Although the testing system is secondary-focused, the availability of learning outcomes before item writing allows for innovation. CETE assessment staff worked with the Ohio Department of Education (the state education agency) and the Ohio Board of Regents (the governing body for postsecondary education) to develop a system for incorporating postsecondary instructors and learning outcomes into the item bank development process. This involved displaying the postsecondary learning outcomes in an item writing software tool as well as training instructors (high school and postsecondary) in possible levels of rigor for multiple choice items using Webb’s Depth of Knowledge (mostly levels 1 and 2). Some of the items feature workplace scenarios and various graphics to increase the rigor.
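
One way to picture how learning outcomes and Depth of Knowledge levels might be tracked during item bank development is a simple coverage tally. This is a hedged sketch only: the item records, outcome codes, and variable names are hypothetical and do not describe the item writing software tool mentioned above.

```python
from collections import Counter

# Hypothetical item records: (item id, learning outcome code, Webb DOK level)
items = [
    ("item-001", "LO-1", 1),
    ("item-002", "LO-1", 2),
    ("item-003", "LO-2", 1),
    ("item-004", "LO-2", 1),
]
learning_outcomes = ["LO-1", "LO-2", "LO-3"]

# Tally items per (learning outcome, DOK level) pair.
counts = Counter((outcome, dok) for _item_id, outcome, dok in items)

# Outcomes with no items at all, and outcomes with no DOK level 2 items.
covered = {outcome for outcome, _dok in counts}
uncovered = [lo for lo in learning_outcomes if lo not in covered]
lacking_dok2 = [lo for lo in learning_outcomes if counts[(lo, 2)] == 0]

print(uncovered, lacking_dok2)
```

A tally of this kind shows item writers which learning outcomes still need items and where additional rigor (for example, scenario-based level 2 items) might be targeted.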

In conclusion, the ECD paradigm is an important way to conceptualize test creation and validation simultaneously rather than sequentially. CTE policymakers should realize that ECD is already present in a prominent and successful model: Cisco Networking Academies. Additional applications can be translated from the robust work that is ongoing in assessment consortia for the Race to the Top Initiative.


  • Behrens, J. T., Mislevy, R. J., DiCerbo, K. E., & Levy, R. (2010, December). Evidence-centered design for learning and assessment in the digital world. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
  • Bejar, I. (2010). Application of evidence centered assessment design to the Advanced Placement redesign: A graphic restatement. Applied Measurement in Education, 23, 378–391.
  • Haertel, G. D., Wentland, E., Yarnall, L., & Mislevy, R. J. (2012). Evidence-centered design in assessment development. Handbook on measurement, assessment, and evaluation in higher education, 257–276. New York, NY: Routledge.
  • Messick, S. (1992). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23, 13–23.
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
  • Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
  • Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.
  • National Research Council. (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century. Washington, DC: National Academies Press.
  • Tannenbaum, R. J., Robustelli, S. L., & Baron, P. A. (2008). Evidence-centered design: A lens through which the process of job analysis may be focused to guide the development of knowledge-based test content specifications. CLEAR Exam Review, 19(2), 26–35.
  • Williamson, D. M., Mislevy, R. J., & Almond, R. G. (2004). Evidence-centered design for certification and licensure. CLEAR Exam Review, 15(2), 14–18.
  • Yarnall, L., & Ostrander, J. (2012). The assessment of 21st century skills in community college: Career and technician education programs. Handbook on measurement, assessment, and evaluation in higher education, 277–295. New York, NY: Routledge.
  • Yarnall, L., Toyama, Y., Gong, B., Ayers, C., & Ostrander, J. (2007). Adapting scenario-based curriculum materials to community college technical courses. Community College Journal of Research and Practice, 31, 583–601.

James T. Austin, PhD