In medicine, doctors constantly strive to improve outcomes for patients by finding new treatments, drugs and combinations of therapies, and by establishing what works and for whom. When new treatments are designed or invented, they undergo a series of trials before they ever reach the general public. Imagine if a drug company designed a new drug, released it onto the market unlicensed for children to use, and those children became ill or died as a result. There is a reason this does not happen.
Unlike healthcare, education is compulsory for all children. Government policies, new teaching trends and technology interventions are unleashed on millions of children, yet very few people stop to ask “are these effective?” and “what evidence do we have to decide if these interventions are effective?”
As a teacher, before starting an academic career in research, I admit I had very little understanding of evidence. I have numerous examples of interventions deployed in my classrooms, initiated either at government level or through school improvement plans. A few notable examples include learning style questionnaires used to design VAK (visual, auditory and kinaesthetic) tasks for individual lesson planning, brain gym activities, national teaching frameworks and booster materials, whole-school deployment of interactive whiteboards and, more recently, class purchases of tablets. As a teacher I often trialled new strategies, such as using webinars for revision outside school hours and flipping the classroom by creating online video tutorials. In a few instances I used a pre- and post-assessment to measure the impact of the interventions in my classroom, but evaluations of interventions were rare in the schools I worked in.
In education, we seem to have good intentions and if we think something will work, we deploy it whole school for all children. Yet the notion of good intentions would not be used in medicine, so why is it used in education?
Fitz-Gibbon (2004) describes two examples where good intentions actually caused harm. Scared Straight was a programme based on the theory that if children showing signs of delinquency were taken to prisons and told by inmates how horrible prison was, they would be deterred from a life of crime. Many subjective responses, such as “it kept me away from crime”, were given in support of the programme’s effectiveness. However, when the treatment group was compared with a randomly allocated, untreated control group, the programme was found to lead to more serious crimes and increased recidivism in the treatment group (McCord, 1978, 2001). The second example is the well-intentioned practice of providing counsellors to deliver psychological debriefing immediately after traumas and accidents. This seems a logical way to support patients’ wellbeing. However, a randomised trial and a systematic review concluded that debriefing can do more harm than good (Mayou et al., 2000; Wessely et al., 1998).
In medicine, numerous examples exist where good intentions led to disastrous consequences. In the 1940s and 1950s, premature babies were given pure oxygen (again, with good intentions), yet during this time an ‘epidemic’ of blindness was noted among premature infants. It was also common for premature infants to be given prophylactic antibiotics, and an increase in brain damage and death was recorded during this period. It was only through the use of randomised controlled trials that these consequences were identified and medical practice stopped delivering these interventions.
More recently, systematic reviews and meta-analyses (combining results of trials) have identified ineffective and harmful medical interventions, leading to doctors using evidence to inform prescribing of medications for patients.
Yet in education many seem happy to rely on ‘good intentions’, and even when evidence is sought on the effectiveness of an intervention, inferior ways of determining effectiveness are often used.
What do I mean by inferior ways to determine effectiveness?
As I mentioned earlier, as a teacher I tried to evaluate a few interventions using a pre-test, intervention and post-test evaluation with any improvements attributed to the intervention. This design is commonly used in education, along with case studies with subjective responses from the participants of the intervention. However, after starting an MA in research methods prior to my PhD, I soon became aware of the serious limitations this design has IF the research questions were focused on effectiveness. I will briefly summarise a few key issues but this will be covered in greater detail in the next blog.
Let’s look at an example using technology, where a technology company claims that their new app increases pupils’ attainment in mathematics by 30%. The company used a pre-test, implemented the application on a class set of tablets and gave a post-test at the end of the term. The mean result for the class improved by 30%, which the company presents as evidence of the impact of the intervention. The weakness of single-group designs (i.e. designs with no comparator group) is that alternative explanations can be offered for the increase in scores. Here are a few:
- Temporal change – pupils learn and mature over time irrespective of the intervention. The effect of any intervention or treatment is mixed up with these temporal changes and is difficult to disentangle if no comparator (control) group is used.
- Regression to the mean – pupils with extreme pre-test scores (very high or very low) are likely to score closer to the average on the post-test regardless of the intervention.
- Testing threats – if the same assessment is used for the pre-test and post-test, then familiarity with the test can result in an increase in the score.
- History threats – during the time the application was implemented, the head of maths provided additional support to this class as it was a key examination class (extra in-class support and a homework booklet).
- Mortality / Attrition – a number of students dropped out or missed the assessment, so they were not included in the analysis. If students who struggled with maths missed the assessments, this could bias the results.
- Sample size – the sample was a single class of 30 pupils, which is too small to detect effects reliably or to support generalisations.
I could go on, but I hope you can see that the claimed 30% improvement in attainment now has a number of alternative explanations. In education, the counterfactual is what would have happened had the intervention not been delivered, and the only way we can estimate it in order to determine effectiveness is through the use of trials, specifically randomised trials.
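To make this concrete, here is a minimal sketch in Python of why a control group matters. It simulates pupils for whom the app has no real effect at all, yet whose scores still rise through temporal change and test familiarity. Everything in it is an assumption made for illustration: the function names (`simulate_pupil`, `run_trial`), the score distribution, the 8-point natural gain, the 3-point practice gain and the 2,000 simulated pupils are invented and are not taken from any real study.

```python
import random
import statistics


def simulate_pupil(rng, gets_app, app_effect=0.0):
    """Return (pre, post) scores for one hypothetical pupil."""
    ability = rng.gauss(50, 10)       # underlying attainment (assumed)
    pre = ability + rng.gauss(0, 5)   # pre-test = ability + measurement noise
    natural_gain = 8                  # temporal change: learning that happens anyway
    practice_gain = 3                 # familiarity with the repeated test
    post = ability + natural_gain + practice_gain + rng.gauss(0, 5)
    if gets_app:
        post += app_effect            # the only part caused by the intervention
    return pre, post


def run_trial(n_pupils=2000, app_effect=0.0, seed=1):
    """Randomly allocate pupils to app or control and compare mean gains.

    Returns (gain seen in the app group alone, app gain minus control gain).
    """
    rng = random.Random(seed)
    pupils = list(range(n_pupils))
    rng.shuffle(pupils)                        # random allocation
    treated = set(pupils[: n_pupils // 2])     # first half of shuffled list gets the app

    gains = {True: [], False: []}
    for pupil in range(n_pupils):
        in_app_group = pupil in treated
        pre, post = simulate_pupil(rng, in_app_group, app_effect)
        gains[in_app_group].append(post - pre)

    app_gain = statistics.mean(gains[True])
    control_gain = statistics.mean(gains[False])
    return app_gain, app_gain - control_gain


if __name__ == "__main__":
    app_gain, effect = run_trial(app_effect=0.0)
    print(f"Gain in app group alone (single-group design): {app_gain:.1f}")
    print(f"Gain relative to randomised control group:     {effect:.1f}")
```

In this sketch the single-group design reports a sizeable gain even though the app does nothing, because the natural and practice gains are counted as if they were caused by the app. Only the comparison with the randomly allocated control group brings the estimate back towards the true effect, which here is zero. Setting `app_effect` to a non-zero value shows the comparison recovering a real effect when there is one.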
The following series of blogs will be written as part of my role as a Microsoft Innovative Educator Expert, and is designed to help teachers and educators become more critical of evidence and to support teachers in designing trials for the interventions they run. As part of my research at Durham University, I am using randomised controlled trials to determine the effectiveness of online cross-age peer tuition with group sizes of 1:1, 1:2 and 1:4. I hope that through these blogs and webcasts (I am happy to deliver online training on various aspects of research methods in education via Microsoft educator network), I can help teachers ask the questions I failed to ask when I was classroom based.
My hope for 2016 is that education starts to move towards evidence-based practice, with teachers able to understand the strengths and limitations of different designs and to question why trials are not used when the research question is about WHAT works or WHETHER something works.
ESRC Funded PhD Student
Fitz-Gibbon, C. T. (2004) The need for randomised trials in social research. Journal of the Royal Statistical Society, 167(1), 1–4.
Mayou, R. A., Ehlers, A. and Hobbs, M. (2000) Psychological debriefing for road traffic accident victims: three year follow-up of a randomised controlled trial. Br. J. Psychiatr., 176, 589–593.
McCord, J. (1978) A thirty-year follow-up of treatment effects. Am. Psychol., 33, 284–289.
McCord, J. (2001) Crime prevention: a cautionary tale. In Proc. 3rd Evidence-based Policies and Indicator Systems Conf., Durham. Durham: University of Durham.
Wessely, S., Rose, S. and Bisson, J. (1998) A systematic review of brief psychological interventions (“debriefing”) for the treatment of immediate trauma related symptoms and the prevention of post traumatic stress disorder. In Cochrane Library, vol. 4. Oxford: Update Software.