Measuring what matters: How do you measure systems change?

This blog was written by Shraddha Iyer and Srushti Joshi from the British Asian Trust. It shares learnings from the LiftEd Development Impact Bond, India’s largest Outcomes-Based Financing initiative that takes a systemic approach to improving foundational learning at scale.

Sometimes the most pertinent shifts in the education ecosystem are the hardest to see. A government official who starts asking questions about students’ learning levels in meetings, or a teacher who uses learning data from her classroom to consult her mentor on classroom strategies: neither would show up in a standard assessment. Despite being intangible or hard to measure, these may be precisely the changes that determine whether improvements in foundational learning take root and sustain themselves well beyond any time-bound programme. Therefore, if we are to measure systems strengthening, these systemic changes need to be identified, measured and quantified.

Put this challenge in the context of an outcomes-based financing (OBF) instrument, with measurement and third-party verification at the heart of it, and the challenge may seem even bigger – how do you tie funding to outcomes that are in essence about change at the systemic level?

This is precisely the challenge the LiftEd Development Impact Bond (DIB) was grappling with in its early days. The LiftEd DIB is a first-of-its-kind OBF initiative that takes a systemic approach to improving Foundational Literacy and Numeracy (FLN) outcomes for approximately 1.5 million children across five states in India. The question at hand was: how do we design an evaluation framework to measure systems change that is technically defensible, programmatically grounded and commercially viable within an OBF structure? This blog outlines a few lessons that speak to this question.

Why does systems change matter for improving education outcomes?

There is a growing body of evidence suggesting that even well-designed, well-implemented education programmes struggle to sustain their gains once programmatic support is withdrawn. This is because the underlying institutional routines, incentive structures and capacities that shape day-to-day practice in the system remain largely unchanged. Not working to strengthen systems risks reverting to square one at the end of the intervention period.

The launch of the national FLN mission (National Initiative for Proficiency in Reading with Understanding and Numeracy, or NIPUN Bharat) by the Government of India in 2021 signalled a strong policy and funding push to achieve universal FLN by 2027, while COVID-19 disruptions highlighted the need for more resilient education systems. The LiftEd DIB was designed to build on these shifts and on lessons from the Quality Education India impact bond about working intentionally with the government for unprecedented scale and long-lasting change. The programme moved beyond a traditional grant model and adopted an impact bond structure (an OBF instrument) to improve accountability by tying funding to verified outcomes rather than pre-defined activities. Under this model, an impact investor provides upfront capital to implementers and bears the performance risk, with repayment contingent on outcomes rather than on rigid activities. By aligning partner incentives, enabling flexible and data-driven implementation, and ensuring that funding follows evidence of impact, the OBF model created the conditions to move the needle on FLN at scale.

Diagram depicting LiftEd’s intervention pathways. The DIB aims to build the capacity of system stakeholders across the government cascade to accelerate foundational learning and improve student learning outcomes. It does so through three key pillars: data-driven governance and decision-making; behaviour and capacity shift of middle managers; and improved FLN practices in classrooms. The diagram shows the range of stakeholders for each of these pillars.

Figure 1: The LiftEd DIB’s intervention pathways

The challenges of defining, quantifying and measuring systemic shifts

The focus on systems change raised an immediate and uncomfortable question: how do you measure it in the first place, and then how do you link payments to outcomes that are notoriously difficult to define and quantify, and that take a long time to show impact? Several factors make this hard.

Systemic change is non-linear: inputs do not produce predictable outputs because multiple factors interact in ways that are hard to anticipate. Progress is often slow and invisible before it becomes visible, which poses a problem for evaluation. Attribution is genuinely complex, particularly in the Indian context, where national programmes, state government initiatives and multiple civil society actors are all operating simultaneously in the same geographies. Lastly, systems change is hard to measure because it involves multiple actors and outcomes at multiple levels, all situated in shifting political, economic and social contexts.

The LiftEd DIB encountered similar challenges, along with the demands that come with an OBF initiative. Since the intervention is indirect and works through the system, the gestation period needed to see learning improvements is longer, which sits awkwardly against the payment cycles and durations that OBF instruments typically operate on. Also, measuring learning alone does not capture shifts and improvements in the behaviours, processes and other elements that constitute a system. Lastly, per-child improvements from systems-level work can be modest in the short term, even while they are meaningful at scale across a large number of children over the long term.

Furthermore, selecting interim outcome indicators and setting targets for such indirect interventions was complex due to a lack of comparable studies, benchmarks and precedents.

It was clear that the DIB required a customised and standardised measurement and evaluation framework that could do three things: maintain technical fidelity to on-the-ground realities, remain credible to government stakeholders for uptake, and ensure commercial viability through annual repayments rather than end-of-programme outcome payments. In a systems-strengthening DIB operating across five states with multiple education partners, developing such a framework proved to be uniquely perplexing.

Constructing a measurement and evaluation framework for systems change

Precedents for measuring outcomes in an OBF structure focused on systems change were scarce. LiftEd therefore embarked on a ‘learning year’, which served as a laboratory prior to the programme’s launch to test assumptions, establish baseline conditions and understand what could effectively be measured before fixing targets and payment structures. This learning year led to the development of a fit-for-purpose, two-pronged evaluation framework.

Figure 2 shows the LiftEd DIB’s evaluation framework. There are two main elements: 1) Systemic shift indicators looking at whether interventions have brought about the required systemic shifts; and 2) Student learning outcomes focusing on whether interventions have ultimately resulted in improved learning outcomes. The hypothesis is built on three main elements: system, principal/teacher and student.

Figure 2: The LiftEd DIB’s evaluation framework

The first and more innovative component comprises the Systemic Shift Indicators (SSIs). Designed in collaboration with education partners working on the ground as part of the DIB, SSIs function as high-impact, low-cost levers across three levels of the government education system, and capture changes in how the system functions. When these SSIs are strengthened and monitored consistently, the resulting learning outcomes are more likely to last. Three indicators form the core of the SSIs:

Data-driven governance and decision-making (measured via effectiveness of block-level meetings)
Behaviour and capacity shift of middle managers from administrators to educators (measured via quality of mentoring support)
Improved FLN practices in classrooms (measured via adoption of high-impact practices by teachers)

SSIs are verified by an independent third party using two methods: in-person classroom observations of government officials against selected parameters, and document reviews where observations are not feasible. The methodology was co-created with implementers and funders, and refined through multi-stakeholder consultations to balance rigour with operational realities, including the challenges of external verification in schools.

The second component is growth in Student Learning Outcomes (SLOs), which is the starting point for most education interventions. Measured via a cohort-based difference-in-differences design over three years, the SLO evaluation allows time for systems-level changes to trickle down and plausibly show up in students’ learning data.
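To make the difference-in-differences logic concrete, here is a minimal Python sketch. It assumes a programme group and a comparison group, each assessed at baseline and endline. The function name and all scores are hypothetical and for illustration only; they are not LiftEd data, and the actual cohort design and estimation used in the evaluation are not described in this blog.

```python
from statistics import mean


def did_estimate(prog_pre, prog_post, comp_pre, comp_post):
    """Difference-in-differences on group means.

    Returns the change in the programme group minus the change in the
    comparison group, i.e. the growth beyond what would plausibly have
    happened anyway.
    """
    programme_change = mean(prog_post) - mean(prog_pre)
    comparison_change = mean(comp_post) - mean(comp_pre)
    return programme_change - comparison_change


# Hypothetical assessment scores (illustrative only, not LiftEd data).
programme_baseline = [20, 24, 22]
programme_endline = [35, 37, 36]
comparison_baseline = [21, 23, 22]
comparison_endline = [29, 31, 30]

effect = did_estimate(
    programme_baseline, programme_endline,
    comparison_baseline, comparison_endline,
)
print(f"Estimated additional growth: {effect} points")
```

In this made-up example the programme group improves by 14 points and the comparison group by 8, so the estimate of additional growth is 6 points. A real evaluation would also need to account for sampling, clustering and uncertainty, which this sketch leaves out.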

In terms of the payment structure, 60 percent of outcome payments are tied to SSIs across all programme years, while 40 percent are tied to SLOs in the final two years. This approach balanced commercial viability in the early years with incentives for both upstream systemic shifts and downstream learning outcomes.
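As a rough illustration of the split described above, the short Python sketch below divides a total outcome-payment pool into its SSI and SLO shares. Only the 60 and 40 percent shares come from the text. The pool size, the function name and the dictionary keys are hypothetical, and the sketch does not model how each share is spread across programme years, since that is not stated here.

```python
# Shares stated in the text: 60% of outcome payments for SSIs, 40% for SLOs.
SSI_SHARE_PCT = 60
SLO_SHARE_PCT = 40


def split_outcome_pool(total_outcome_pool):
    """Split a total outcome-payment pool between SSIs and SLOs."""
    ssi_pool = total_outcome_pool * SSI_SHARE_PCT / 100
    slo_pool = total_outcome_pool * SLO_SHARE_PCT / 100
    return {"ssi": ssi_pool, "slo": slo_pool}


# Hypothetical pool of 1,000,000 currency units (not the real LiftEd figure).
pools = split_outcome_pool(1_000_000)
print(pools)
```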

Emerging results are promising. Although the DIB is only at its halfway point, the SSIs are already leading to improved SLOs in a cost-effective manner at an unprecedented scale. Midline results from the evaluation point to 8.7 months of additional growth in oral reading fluency, and over two months of additional learning in both literacy and numeracy.

Learnings from the LiftEd DIB’s evaluation journey

Three years into implementation, a number of lessons have emerged which can be applied by anyone looking to measure systems change.

Stress test before you scale: There is a tendency in programme design to treat the period before implementation as incubatory, something to get through before beginning the ‘real’ work. However, the DIB’s learning year disproved this assumption. It helped identify benchmarks to fine-tune what could actually be measured, and made it possible to negotiate and co-create the evaluation framework with all stakeholders involved.
Balance rigour with operational feasibility: Methodological rigour and operational feasibility are not opposing considerations but interacting forces that must be actively balanced in evaluation design. Chasing gold-standard approaches like randomised controlled trials or fully counterfactual designs is often neither appropriate nor necessary. What matters is adopting a fit-for-purpose evaluation framework: one that is credible enough to anchor financial decisions and outcome verification, while remaining compatible with programmatic implementation realities.
Co-create and learn by doing: Cultivating a ‘do-and-learn’ mindset means taking flexibility and the need to pivot in one’s stride and treating them as a normal and necessary part of the process. Furthermore, co-creation in such programmes matters more than one would expect. The SSIs, rubrics and verification approaches were co-created with partners to ensure strong alignment with their theories of change (ToCs) and on-the-ground realities.
Build trust-based governance structures: A complex, first-of-its-kind, multi-stakeholder initiative such as LiftEd, with 26+ partners, first and foremost required trust, and a strong governance system has been critical to building it. Clearly defined roles, structured working groups, and systematic, documented meetings have provided room for open and transparent communication among diverse stakeholders.

Evaluating systems change may seem challenging, yet the last three years of LiftEd demonstrate that it is possible with the right approach. It requires close, honest collaboration with all stakeholders and a willingness to work within the realities of complex systems rather than impose external designs. What distinguishes meaningful measurement from theoretical frameworks is this grounding in practice. Measuring systems change may at times prove difficult, but that is no reason to avoid trying and learning in a constant loop.

Views expressed in outputs hosted on the UKFIET website are those of the contributors. They do not necessarily represent the views of UKFIET as an organisation, the UKFIET Trustees, Executive Committee or the wider UKFIET membership.