This blog was written by Paulina Valenzuela, Oxford MeasurEd.

How do you stand out at a conference when everyone there is doing interesting work and some of them are also offering freebies? This question kept our team busy in the weeks leading up to the September 2023 UKFIET conference. We considered a couple of options, but there was not enough time to learn how to juggle and – (un)fortunately – none of us can sing. We decided on the next best thing, and something which we are actually good at: quizzes.

The ‘contest of the century’

Many of our projects involve assessing students on skills ranging from numeracy to creativity. We thought to ourselves: why not use that expertise to test UKFIET attendees? As philosophy suggests and Buzzfeed’s success shows, people are always eager to learn more about themselves. So we took some of the assessments we have developed with partners across participating countries[i], as part of our work with Education Cannot Wait (in collaboration with Cambridge Education) and Schools2030, and adapted them to make them more age and context appropriate. The goal was to figure out who had better social and emotional learning (SEL) skills: students, or experts in education for international development. The ‘contest of the century’ was getting more and more real.

When developing the idea, one of our worries was that the tests would be ‘too easy’ for UKFIET conference attendees. After all, most of the items we used were originally meant for 10-year-olds. What if everyone got every single item right? Turns out we didn’t need to worry!

At the end of the conference, there was a clear winner. As you can see in the chart below, students beat UKFIET attendees in four out of six tests: emotional management, problem-solving, empathy and relationship building. We did get some wins though, performing better in leadership and respect for the environment, so there are things to be proud of!

Graph showing responses from UKFIET conference attendees in relation to responses from students in Burkina Faso, Tanzania, Kenya, Portugal, India and Uganda. UKFIET attendees scored lower than students in four out of six tests: emotional management, problem-solving, empathy and relationship building. However, UKFIET attendees did score higher in leadership and respect for the environment.

Caveats caveats caveats…

As tempting as it might be to do otherwise, it is best to view these results as a fun exploration rather than something fully scientific. A lot more would have to happen for the tools we adapted and administered at UKFIET to be valid and reliable. There are at least four reasons for this:

  1. While the tests we used at the UKFIET conference were multiple choice, most of the tools they were based on are open ended and administered in a one-to-one scenario, with an enumerator who listens to a child’s response and scores it according to a series of criteria. This allows for a more nuanced response to a situation than picking one of four possible responses.
  2. We did not study the psychometric properties of the items we adapted for UKFIET as we would normally do. That is, we didn’t select the best performing items; check whether they were all measuring the same thing; score results taking into consideration the relative difficulties of each item; or check for biases on account of variables such as gender or language.
  3. Assessments, especially SEL assessments, must consider the context in which they will be applied. Empathy does not manifest the same way everywhere; for instance, ignoring someone’s tears might be considered cold-hearted in one country but respectful in another. The one-size-fits-all approach that we took is unlikely to measure people fairly in such a diverse environment as UKFIET.
  4. Participants answered the questions on their phones within a busy environment full of distractions. It’s very likely that completing the assessment on a phone while catching up with colleagues on the way to those fantastic cookies is not an ideal atmosphere for being assessed.
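
For readers curious about what the checks in point 2 can look like in practice, here is a minimal sketch of two of them: item difficulty (the share of respondents who get each item right) and Cronbach’s alpha, a common indicator of whether a set of items hangs together. It is an illustration only. The response data is made up rather than taken from UKFIET or any of our projects, the function names are our own, and this simple classical approach is not necessarily what a full psychometric analysis would use. Checking for bias by gender or language is a separate step that is not shown here.

```python
from statistics import pvariance


def item_difficulty(responses):
    """Proportion of respondents who answered each item correctly."""
    n_respondents = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[j] for row in responses) / n_respondents
        for j in range(n_items)
    ]


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent).

    Values closer to 1 suggest the items vary together, which is one
    piece of evidence that they measure the same thing.
    Note: undefined (division by zero) if every respondent has the
    same total score.
    """
    n_items = len(responses[0])
    item_variances = [
        pvariance([row[j] for row in responses]) for j in range(n_items)
    ]
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 6 respondents x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

print("Item difficulty:", item_difficulty(responses))
print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
```

In a real study, items that almost everyone gets right (or wrong), or that pull alpha down, would be candidates for revision or removal, which is precisely the step we skipped for the conference quiz.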

Complexity complexity complexity…

The caveats outlined above are what is most interesting about our exercise, as they are precisely what makes quality assessments so important, and so complex.

Deciding what to measure reveals what we are aiming for, and thus what our priorities are. A learner’s empathy is certainly harder to measure than their capacity to add or multiply, but if they are only graded for math, we are inadvertently saying that is what students and teachers should focus on. Hence it is worth figuring out ways to capture their empathy too, even if it is complex.

The big trick in measuring these traits is that our tools, by design, need to be short enough that they do not overwhelm those being tested and can feasibly be administered to a large number of students. This means reducing phenomena down to simpler, quantifiable constructs in order to measure human attributes for comparative purposes. However, these remain samples of behaviours representing complex, unobservable traits. Good quality tools will manage to capture examples that are representative enough of a trait that they allow us to say something meaningful about it, but these limitations should inform how we interpret results.

Added difficulty comes from the fact that the more complex a trait is, the more context dependent it will be. It is likely that even between two people there will be different definitions for what it means to be an empathetic person, a good leader, or an effective relationship builder. By the same logic, we expect to find relevant differences in the ways different cultures approach these traits. Qualitative methods are needed to probe the kinds of behaviours that are representative of the construct in particular contexts and cultures within the social ecologies of children’s lives. This includes behaviours in relation to their peers and within their families and communities. Instrumentation (cognitive and social and emotional) that does not reflect the experiences of children in the locale will be unlikely to capture the intended attribute in a meaningful manner.

This means that, whether developing new tools or adapting existing ones, measurement precision is fundamentally linked to context, and therefore item and tool development (or adaptation) must be context driven. This is why the process of developing the tools we applied in Burkina Faso was different from the process in Portugal or India. While many technical aspects of our work can be standardised, the context that makes tools relevant will always vary. Conversations with stakeholders, reviews of relevant instructional material in the country and openness to feedback from national teams are just as important as the most sophisticated psychometric calculations.

As we approach the 2030 deadline for the Sustainable Development Goals, the need for readily available information on learning outcomes becomes more pressing. There might be a temptation to trade off complexity in favour of comparability, aiming for global socio-emotional assessment tools that can be applied everywhere with little to no adaptation. We should resist it, though: by sacrificing complexity we would be sacrificing relevance too. Quality information is paramount in the challenge of improving education, but its first requirement is for it to represent and make sense to the communities that are being assessed. One-size-fits-all approaches will most likely end up fitting no one.


[i] We are very grateful to our partners involved in the projects where we worked collaboratively to develop the assessments we adapted, and to the students who took part in the original data collection. This includes the General Directorate of Planning and Statistics (DGESS), Burkina Faso, Cambridge Education and Education Cannot Wait for emotional understanding and management in Burkina Faso (5th grade students from host and displaced populations); Aga Khan Foundation and Emily Tusiime for leadership in Kenya (10-year-old students), problem-solving in Tanzania (10-year-old students) and relationship building in Uganda (out-of-school learners over 15); Aga Khan Foundation and University of Porto for empathy in Portugal (10-year-old students); and Aga Khan Foundation and Eklavya for respect for the environment in India (10-year-old students).