Assessment methodology
This is a brief explanation of some of the technical issues referred to in:
Wiliam, D. (1992). Some technical issues in assessment: A user's guide. British Journal for Curriculum and Assessment 2(3), 11-20.
Reliability
The traditional ways of estimating the reliability of an assessment are:
- test-retest reliability (if the same student were assessed twice on the same test, would the marks be the same?)
- mark-remark reliability (if the same student's work were marked by two different teachers, would their marks agree?)
- parallel-forms reliability and split-half reliability (if a different but similar test were used, or the test were split into two comparable halves, would the student get the same marks?)
These methods are statistical. Wiliam shows that, because reliability is only a small part of a test's dependability, these traditional techniques for estimating reliability are not relevant to the typical use made of tests in schools.
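To make the statistical character of these methods concrete, here is a minimal Python sketch, assuming marks are held as plain lists of numbers. It estimates test-retest reliability as the Pearson correlation between two sittings (the same correlation would serve for mark-remark reliability, with two markers' marks in place of two sittings) and gives a split-half estimate. The odd/even split of items and the Spearman-Brown correction are standard choices that the source does not itself discuss, and all function names are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of marks."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def retest_reliability(first_sitting, second_sitting):
    """Same students, same test, two occasions: how closely do the marks agree?"""
    return pearson(first_sitting, second_sitting)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_marks):
    """item_marks: one list of per-item marks for each student.

    Splits each student's items into odd- and even-numbered halves
    (items 1, 3, 5, ... versus 2, 4, 6, ...), correlates the two half
    totals, then applies the Spearman-Brown correction.
    """
    odd_totals = [sum(marks[0::2]) for marks in item_marks]
    even_totals = [sum(marks[1::2]) for marks in item_marks]
    return spearman_brown(pearson(odd_totals, even_totals))
```

For example, with hypothetical marks, `retest_reliability([12, 15, 9, 18], [13, 14, 10, 17])` gives about 0.98. A coefficient like this says only that two sets of marks agree; it says nothing about whether the test is dependable for the use a school makes of it, which is Wiliam's point.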
Instead of examining reliability from a statistical point of view, Wiliam believes it is more relevant (although also more fallible) to focus on:
- The processes involved in getting information about what students have attained (disclosure).
- What can go wrong (fidelity).
Dependability
For an assessment to be dependable, it must be accurate and reliable, and the assessor must be confident both that the results disclose the student's attainment in the area being assessed and that the process of recording the assessment result is faithful.
Validity
The traditional definition of validity is that an assessment measures what it purports to measure. This has led to many different ways of defining validity (face, content, descriptive, predictive, concurrent, criterion-related, intrinsic, convergent, discriminant, curricular, instructional, construct, and backwash validity). Each emphasises a different facet of validity.
Wiliam proposes a global definition of validity that encompasses all of these facets: a test is valid (as far as the assessor is concerned) to the extent that the assessor is happy for teachers to teach towards it.
Norm, criterion, and other referencing
Any assessment scheme works by comparing an individual's performance with something else: a referent. Three referents are in common use, and Wiliam describes a fourth:
- Norm-referenced assessment compares individuals with the norm of a group. It does not tell us what an individual can or cannot do, only whether they do better or worse than others, so an individual's final outcome depends on the performance of the whole group.
- Criterion-referenced assessment relates a student's performance to a well-defined objective, although the outcome can depend on how that objective is interpreted.
- Ipsative assessment relates a student's performance to their own previous performance. This is the purest form of assessment and involves only the teacher and the student.
- Construct-referenced assessment is Wiliam's term for assessment that takes place entirely through coursework marked by teachers at the school. It is based on an idea or construct, and can be problematic because it involves assessing complex skills that are often based on creative concepts.
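The difference between the first three referents can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the norm-referenced result is taken to be a percentile rank (the percentage of the group scoring strictly below the student, one of several conventions), the pass mark of 50 and all marks are hypothetical, and construct-referenced assessment is left out because, as described above, it rests on teachers' judgement of complex skills rather than on a simple formula.

```python
def norm_referenced(score, group_scores):
    """Percentile rank: percentage of the group scoring strictly below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


def criterion_referenced(score, threshold):
    """Has the student met a fixed, pre-defined criterion (here, a pass mark)?"""
    return score >= threshold


def ipsative(score, previous_score):
    """Change relative to the same student's own previous performance."""
    return score - previous_score


# Hypothetical example: the same mark of 60 in two different groups.
group_a = [40, 45, 50, 55, 60]
group_b = [60, 65, 70, 75, 80]
print(norm_referenced(60, group_a))   # 80.0: better than most of group A
print(norm_referenced(60, group_b))   # 0.0: bottom of group B
print(criterion_referenced(60, 50))   # True, whichever group the student is in
print(ipsative(60, 52))               # 8: improvement on a previous mark of 52
```

The same mark yields very different norm-referenced outcomes depending on who else sat the assessment, while the criterion-referenced and ipsative results do not depend on the group at all.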
Combination, aggregation, and reconciliation
Combination and aggregation are terms used to describe any process in which several assessment scores are combined to produce a single score. Problems arise if the assessor is unaware of differences in how the individual scores are weighted.
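One way weighting can differ from what the assessor intends is shown in the following minimal Python sketch. All marks are hypothetical, and the remedy shown (rescaling each component to z-scores before adding) is one common technique, not one the source prescribes. Two components are added with nominally equal weight, but because the exam marks are far more spread out than the coursework marks, the exam alone decides the order of the totals.

```python
from statistics import mean, pstdev

# Hypothetical marks for three students on two components.
coursework = {"A": 70, "B": 65, "C": 60}  # narrow spread
exam = {"A": 20, "B": 50, "C": 90}        # wide spread


def raw_total(*components):
    """Add components with nominally equal weight (a simple sum of raw marks)."""
    return {s: sum(c[s] for c in components) for s in components[0]}


def standardise(component):
    """Rescale a component to mean 0 and standard deviation 1 (z-scores)."""
    values = list(component.values())
    m, sd = mean(values), pstdev(values)
    return {s: (x - m) / sd for s, x in component.items()}


def ranking(totals):
    """Student labels ordered from highest to lowest total."""
    return sorted(totals, key=totals.get, reverse=True)


print(ranking(raw_total(coursework, exam)))
# ['C', 'B', 'A']: exactly the exam order; coursework has almost no effect
print(ranking(raw_total(standardise(coursework), standardise(exam))))
# ['A', 'C', 'B']: with equal spreads, both components now influence the order
```

Neither ordering is "correct"; the point is that the effective weight of each component depends on its spread as well as on its nominal weight, so an assessor who ignores this may not be combining scores in the way they believe.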
Reconciliation describes the process needed when two or more assessments purport to measure the same thing, such as teacher assessment and scores from national examinations.
Standardisation and moderation
These terms are usually used interchangeably to describe the processes that ensure that assessments made by different assessors are in some way comparable. Wiliam proposes that the term moderation be used for a quality control approach to comparability, in which teachers assess the students and the teachers' assessments are in turn checked by a moderator.
In contrast, he proposes that the term standardisation be used for a quality assurance approach to comparability. Rather than changing the marks awarded to students, this approach directs effort towards aligning the standards that teachers use, so that they produce comparable assessments (the assessments are standardised).
According to Wiliam, the quality assurance approach (standardisation) is preferable to the quality control approach (moderation). First, standardisation is "forward-looking": if effort is spent aligning teachers' standards, the time needed in future years should steadily decrease. Second, it avoids a problem with moderation, which requires the moderator to have the same evidence as the original assessor. This is possible when the assessment is a written test, but not when assessing a performance that leaves no permanent evidence.
Wiliam's summary
The single most important idea is the recognition that all assessment takes place in a social context, done by, on, and for real people, and that any attempt to understand the effects of the assessment without addressing the social consequences of the assessment is bound to fail.
There are no "off-the-peg" methodologies, no step-by-step procedures that guarantee valid and dependable assessment. Instead, there is a continuing tension between assessment procedures and the use that is made of the information they yield.