This blog was written by Dr Ricardo Sabates from the Faculty of Education and a member of the Research for Equitable Access and Learning (REAL) Centre, University of Cambridge. Ricardo is also co-convenor of one of six 2019 UKFIET conference themes, ‘Future directions in inclusive education systems’.

Payment for performance is an increasingly popular proposal aimed at improving the efficiency of education service delivery. The case for it is commonly based on experimental or quasi-experimental research. Whether these approaches are likely to be effectively adopted and implemented needs to be interrogated at the level of the education system, a point which is central to the 2019 UKFIET conference on Inclusive Education Systems: Futures, Fallacies and Finance. To start this debate, I would like to put forward four arguments on the potential limitations of a recently published randomised controlled trial (RCT) (by Clare Leaver et al.) which examines whether payment for performance in Rwanda could “change the composition of the teaching profession” and “induce higher effort” in order to achieve higher test scores. I examine whether the evidence is sufficient to inform system-level changes.

Perhaps the most cited argument against RCTs being able to inform systemic change is (1) the lack of external validity of small-scale trials. Whether results obtained from an RCT would hold when implemented in different areas, under different conditions or by different implementing partners (or by the government itself) has been a matter of concern among numerous researchers. There are plenty of examples indicating that what works under RCT conditions does not necessarily work when implemented by other agencies or when replicated in other places.

Beyond the lack of external validity, (2) small-scale RCTs tend not to be embedded within the system, and hence ignore the way in which governments operate, as well as their priorities. For instance, the paper by Leaver and colleagues concludes: “We note that the Government of Rwanda has recently decided to require standardized assessments at the end of each primary grade level”. Based on this assertion, the authors suggest that the RCT could be scaled up. What the authors do not take into account is the fact that the Government of Rwanda has other priorities as well, for instance, the use of formative assessments or the implementation of competence-based curricula. It is unknown whether payment for performance at the system level is a coherent policy instrument across all other government priorities, and results from the RCT alone cannot answer this question.

Another important argument, also heavily cited in current educational literature, is (3) the ability of researchers to understand the potential causal pathways leading to results under RCTs. For example, Glewwe and Muralidharan (2015) reported on four RCTs of the impact of providing books and materials to students. They found zero effects on learning outcomes and offered four different explanations for this result. In-depth knowledge of the causal pathways requires mixed methods research designs. On the positive side, the research by Leaver and colleagues utilises a variety of quantitative measures and lab-in-the-field instruments to capture different aspects of the causal pathway that may be induced by the incentive of payment for performance. On the negative side, there is no in-depth qualitative work accompanying the design, and no explanation of what these field instruments may mean for the Rwandan context. Decolonising methodologies can only deepen understanding of the context, thus maximising the likelihood that policies are planned and understood, as well as delivered and implemented, by those within the system.

The next argument is, in my view, at the heart of the problem of payment for performance at the system level: (4) can measures of value-added in student test scores capture performance in a fair and unbiased way? Can the system generate value-added examinations which are comparable across grades and languages, as well as subjects, in a way that teachers feel treats them fairly? Can teachers also be rewarded by measures of value-added which capture other important qualities of teaching, such as promoting citizenship, social skills, innovation, and social cohesion among students? How does the system deal with part-time teachers, teachers who move into the profession during the academic year, or the reallocation of teachers to support failing schools? Are there any negative consequences arising out of payment for performance, such as teaching to the test, or withholding low performers from taking the test? Even this well-designed RCT is unable to answer many of the questions above, which are important for system change.

In a recent blog, based on the same RCT, Markus Goldstein, a leading economist at the World Bank, raised the question: Would it be a good idea to introduce payment for performance as a new form of contract for teachers in Rwanda? I hope not, as there is no evidence from the research presented that this may be the case at the system level. In fact, I argue that the RCT lacks external validity, that payment for performance reduces teaching to a measurable outcome, and that it is potentially inconsistent with other key priorities of the Government of Rwanda. This is very much a matter of debate, so I invite you to engage with us, and with the topics, so that we may provide future directions to systems research in education during the UKFIET Conference.