Evidence Snacks

Accuracy & reliability

Maximising assessment validity

Hey 👋

Wassup. This week, we’re extending our assessment theory series with a quick look at accuracy and reliability…

Big idea 🍉

Validity refers to the extent to which the inferences we draw from an assessment are a true reflection of reality. If I weigh 70kg and my scales always show 70kg, then we might say that they are valid.

Reliability is one component of validity. It refers to the ability of a measure to produce a similar result under similar conditions. If my scales showed that I was 70kg in the bathroom but 75kg in the kitchen, then they wouldn’t be very reliable. And as a result, the inferences we could draw from them wouldn’t be very valid either.

Reliability contributes to validity. However, reliability by itself is insufficient. Our weighing scales could be consistent, but they might not be properly calibrated. Even though I weigh 70kg, they might always show me as weighing 75kg (regardless of the room I use). Validity requires accuracy as well as reliability.
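To make the scales example concrete, here is a minimal Python sketch. It treats accuracy as the average error against the true weight (bias) and reliability as how much repeated readings vary (spread). The readings, the three scale labels, and the `describe` helper are all made up for illustration; this is one simple way to picture the distinction, not a formal definition of either term.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_KG = 70.0  # the "reality" we are trying to measure

# Hypothetical readings from three sets of scales (invented numbers)
readings = {
    "reliable and accurate": [70.0, 70.1, 69.9, 70.0],
    "reliable but miscalibrated": [75.0, 75.1, 74.9, 75.0],
    "unreliable": [70.0, 75.0, 66.0, 69.0],
}

def describe(values, true_value=TRUE_WEIGHT_KG):
    """Return (bias, spread) for a list of readings.

    bias   = average reading minus the true value (a rough proxy for accuracy)
    spread = standard deviation of the readings (a rough proxy for reliability)
    """
    bias = mean(values) - true_value
    spread = pstdev(values)
    return bias, spread

for name, values in readings.items():
    bias, spread = describe(values)
    print(f"{name}: bias = {bias:+.2f} kg, spread = {spread:.2f} kg")
```

In this sketch the miscalibrated scales have a tiny spread but a bias of about 5kg, while the unreliable scales average out close to 70kg yet swing by several kilograms from one reading to the next. Only the first set would support valid inferences from a single weigh-in.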

There are various things that influence the reliability of school assessments:

The questions we use across different assessments which try to measure the same thing.
The conditions in which the assessments take place.
The consistency of marking, between different people or even by the same person at different times.

For greatest reliability (and so validity), we want to get the same result regardless of the questions that were used, the time or place the assessment was conducted, or the person who marked it.
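Marking consistency is the easiest of these three to check with numbers. The sketch below is a rough illustration, assuming two markers have each marked the same six scripts out of 10. The marks and the helper names `agreement` and `mean_gap` are hypothetical, and these are deliberately simple measures rather than a standard reliability statistic.

```python
# Hypothetical marks (out of 10) given by two markers to the same six scripts
marker_a = [7, 5, 9, 4, 6, 8]
marker_b = [7, 6, 9, 2, 6, 7]

def agreement(a, b, tolerance=0):
    """Share of scripts where the two marks differ by at most `tolerance`."""
    if len(a) != len(b):
        raise ValueError("both markers must mark the same scripts")
    matches = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)
    return matches / len(a)

def mean_gap(a, b):
    """Average absolute difference between the two markers' marks."""
    if len(a) != len(b):
        raise ValueError("both markers must mark the same scripts")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(f"Exact agreement: {agreement(marker_a, marker_b):.0%}")
print(f"Agreement within 1 mark: {agreement(marker_a, marker_b, tolerance=1):.0%}")
print(f"Average gap: {mean_gap(marker_a, marker_b):.2f} marks")
```

The same two functions could compare one marker's scores on the same scripts at two different times, which covers the "same person at different times" case in the list above.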

Note 1 → Some subjects and question types lend themselves better to reliable assessment than others. For example, math(s) and multiple-choice questions tend to have more definitive answers than literature and essay questions, which increases the chances that multiple markers will award similar marks.

Note 2 → There are often trade-offs between accuracy and reliability. For example, we could increase the reliability of a history assessment by using only multiple-choice questions, but in doing this we would reduce the accuracy (and so overall validity) of the inferences we could make as a result.

🎓 For more, check out this article on validity, reliability, and all that jazz by Dylan Wiliam.

Summary

Reliability refers to the ability of a measure to produce a similar result under similar conditions.
Reliability is a component of validity, along with accuracy.
We should consider trade-offs between accuracy and reliability when seeking to maximise validity.

Little updates 🥕

Study examining the impact of a four-day school week in Arkansas → finds it can improve teacher retention by reducing job transfers but has inconclusive effects on teacher quality.
Trial evaluating the effects of board games in primary education → suggests that games might improve executive function and academic skills more than explicit teaching (Important: this finding conflicts with well-established evidence around the value of explicit teaching, so please don’t make any drastic changes based on this one study).
Paper examining the role of habit in real-world behaviour change → provides a useful update on the state of our understanding around habits & bonus: summary thread by the lead author.
Evidence-Based Education are running a free webinar on the new EEF Implementation Guide with lead author Jonathan Sharples.

For double the links and more, sign up to Snacks PRO → join here

Boo-ya.

Peps 👊