Accuracy & reliability
Maximising assessment validity
Hey
Wassup. This week, we're extending our assessment theory series with a quick look at accuracy and reliability…
Big idea
Validity refers to the extent to which the inferences we draw from an assessment are a true reflection of reality. If I weigh 70kg and my scales always show 70kg, then we might say that they are valid.
Reliability is one component of validity. It refers to the ability of a measure to produce a similar result under similar conditions. If my scales showed that I was 70kg in the bathroom but 75kg in the kitchen, then they wouldn't be very reliable. And as a result, the inferences we could draw from them wouldn't be very valid either.
Reliability contributes to validity. However, reliability by itself is insufficient. Our weighing scales could be consistent, but they might not be properly calibrated. Even though I weigh 70kg, they might always show me as weighing 75kg (regardless of the room I use). Validity requires accuracy as well as reliability.
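To make the scales example concrete, here is a minimal Python sketch. It assumes we can summarise accuracy as the average error against the true weight and reliability as the spread of repeated readings. The function name `describe_scales` and all of the readings are hypothetical, made up purely for illustration.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_KG = 70.0  # the true weight from the example above


def describe_scales(readings, true_value=TRUE_WEIGHT_KG):
    """Return (bias, spread) in kg for a list of repeated readings.

    bias   -> average error against the true value (a rough proxy for accuracy)
    spread -> standard deviation of the readings (a rough proxy for reliability)
    """
    bias = mean(readings) - true_value
    spread = pstdev(readings)
    return bias, spread


# Hypothetical readings, invented for illustration only.
miscalibrated = [75.0, 75.0, 75.0, 75.0]  # consistent, but always 5kg too high
room_dependent = [70.0, 75.0, 70.0, 75.0]  # bathroom vs kitchen
well_behaved = [70.0, 70.0, 70.0, 70.0]    # consistent and correct

for name, readings in [
    ("miscalibrated", miscalibrated),
    ("room_dependent", room_dependent),
    ("well_behaved", well_behaved),
]:
    bias, spread = describe_scales(readings)
    print(f"{name}: bias={bias:+.1f}kg, spread={spread:.1f}kg")
```

Under these assumptions, the miscalibrated scales have zero spread but a 5kg bias (reliable, not accurate), the room-dependent scales have a non-zero spread (not reliable), and only the well-behaved scales score zero on both.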
There are various things that influence the reliability of school assessments:
The questions we use across different assessments which try to measure the same thing.
The conditions in which the assessments take place.
The consistency of marking, between different people or even by the same person at different times.
For greatest reliability (and so validity), we want to get the same result regardless of the questions that were used, the time or place the assessment was conducted, or the person who marked it.
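One rough way to check marking consistency is to compare the marks two people give to the same set of scripts. The sketch below is one simple option, assuming an "agreement rate" (the share of scripts where the two marks fall within a chosen tolerance) is good enough for a quick look. The function name `agreement_rate` and the marks are hypothetical, and more robust statistics exist for this job.

```python
def agreement_rate(marks_a, marks_b, tolerance=0):
    """Share of scripts where two markers' marks differ by at most `tolerance`."""
    if len(marks_a) != len(marks_b):
        raise ValueError("Both markers must have marked the same scripts")
    if not marks_a:
        raise ValueError("Need at least one script to compare")
    agreed = sum(abs(a - b) <= tolerance for a, b in zip(marks_a, marks_b))
    return agreed / len(marks_a)


# Hypothetical marks for the same four scripts, invented for illustration.
essay_marker_1 = [12, 15, 9, 18]
essay_marker_2 = [14, 15, 7, 18]

exact = agreement_rate(essay_marker_1, essay_marker_2)
within_two = agreement_rate(essay_marker_1, essay_marker_2, tolerance=2)
print(f"Exact agreement: {exact:.0%}, within 2 marks: {within_two:.0%}")
```

The same comparison works for one person marking the same scripts at two different times: pass in the first and second set of marks instead.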
Note 1: Different subjects and question types lend themselves better to more reliable assessment. For example, math(s) and multiple-choice questions tend to have more definitive answers than literature and essay questions, which increases the chances that multiple markers will award similar results.
Note 2: There are often trade-offs between accuracy and reliability. For example, we could increase the reliability of a history assessment by using only multiple-choice questions, but in doing this we would reduce the accuracy (and so overall validity) of the inferences we could make as a result.
For more, check out this article on validity, reliability, and all that jazz by Dylan Wiliam.
Summary
Reliability refers to the ability of a measure to produce a similar result under similar conditions.
Reliability is a component of validity, along with accuracy.
We should consider trade-offs between accuracy and reliability when seeking to maximise validity.
Little updates
Study examining the impact of a four-day school week in Arkansas: finds it can improve teacher retention by reducing job transfers but has inconclusive effects on teacher quality.
Trial evaluating the effects of board games in primary education: suggests that games might improve executive function and academic skills more than explicit teaching (Important: this finding conflicts with well-established evidence around the value of explicit teaching, so please don't make any drastic changes based on this one study).
Paper examining the role of habit in real-world behaviour change: provides a useful update on the state of our understanding around habits. Bonus: summary thread by the lead author.
Evidence-Based Education are running a free webinar on the new EEF Implementation Guide with lead author Jonathan Sharples.
For double the links and more, sign up to Snacks PRO: join here
Boo-ya.
Peps