
Why Randomize? A Primer on Experimental Evaluations

June 28, 2021

Although Abt engages in a wide variety of research and evaluation activities, evaluations that use an experimental design are central to my work in particular. 

This blog post explains what experiments are and why we do them. For people unfamiliar with policy experiments, the medical field offers a useful introduction. Indeed, the COVID-19 pandemic has made the concept familiar to the public as we’ve watched vaccines being developed and tested. To test whether those vaccines are effective, for example, researchers use “randomized controlled trials” (RCTs). These trials randomly assign some people to receive the vaccine and others to receive a placebo, an inert dose. Because the two groups are formed at random, they are, on average, essentially the same in both observable characteristics (e.g., age, gender, race and ethnicity) and unobservable ones (e.g., motivation). The only systematic difference is that one group got the vaccine. By following subjects’ subsequent outcomes, researchers can determine whether the vaccine made a difference.

Substitute “public policy,” “social program,” or some other “intervention” for “vaccine,” and the same approach can test the effectiveness of efforts to ameliorate all sorts of social and economic ills. Here’s how it works:

First, researchers use a lottery-like process to create two groups: a “treatment group” assigned to receive the program or policy that defines the intervention and a “control group” excluded from the program or policy for research purposes.
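The lottery step can be sketched in a few lines of Python. This is a minimal illustration, not Abt’s actual procedure: the function name `lottery_assign`, the seeded generator and the even 50/50 split are assumptions. The “motivation” scores are simulated stand-ins for an unobservable trait, included to show that a fair lottery leaves the two groups with nearly the same average on it.

```python
import random
from statistics import fmean


def lottery_assign(participant_ids, seed=None):
    """Randomly split participants into a treatment group and a control group.

    Hypothetical helper. With an odd number of participants, the control
    group receives the extra person.
    """
    rng = random.Random(seed)          # seeded so the draw can be reproduced and audited
    shuffled = list(participant_ids)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)              # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical applicant pool with a simulated "motivation" score that a
# real evaluator would never observe.
trait_rng = random.Random(42)
applicants = [f"A{i:04d}" for i in range(2000)]
motivation = {a: trait_rng.gauss(50, 10) for a in applicants}

groups = lottery_assign(applicants, seed=7)
gap = (fmean(motivation[a] for a in groups["treatment"])
       - fmean(motivation[a] for a in groups["control"]))
print(f"treatment: {len(groups['treatment'])}, control: {len(groups['control'])}")
print(f"gap in average motivation: {gap:.2f}")
```

The first line prints 1,000 people in each group. The gap in average motivation should be small relative to the spread of the scores, because nothing about the trait influenced the shuffle. Fixing the seed also lets an evaluator document and reproduce exactly who was assigned where.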

Researchers then compare the outcomes of the two groups. The “control group” provides a powerful comparison group, or counterfactual, that tells us what would have happened in the absence of treatment. Experimental evaluations allow us to rule out rival explanations for an intervention’s apparent effects, eliminating maturation and historical, political, economic or social trends as reasons for a program’s observed impact. These rival explanations are called “threats to internal validity” and typically include the following:

  • Historical forces (political, social and economic): people find jobs more quickly in a strong economy than in a weak one; people feel more patriotic around election day;
  • Selection bias: the people who enroll in a program tend to be those most likely to succeed in and benefit from it;
  • Maturation: people learn and grow over time, in ways that affect their outcomes;
  • Regression artifacts: people often enroll in programs because they are at a low point in their lives, so many will improve (“regress” to the mean) even without a program’s help; and
  • Interactions among the above.

Consider a tutoring program for second-grade children who are reading at a first-grade level. Parents can enroll their children in tutoring to bring their reading skills up to grade level. A pre-post evaluation showed that the program increased the share of children reading at grade level by nearly 5,000 percent: at the beginning of the school year, one percent of the children in the program were reading at grade level, and by the end of the year, 50 percent were. Three main alternative explanations could account for this improvement:

  • Selection bias: the parents who chose to enroll their children in tutoring were probably more invested in helping their children outside of the program as well, contributing to their children’s improvement in reading.
  • Maturation: young children are like sponges. They learn how to read, in part, through their own maturation processes, aided by other stimuli in their environments.
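  • Regression artifacts: children reading below grade level have just one way to go: up. Children selected because of unusually low scores also tend to score closer to the average when tested again, and this movement toward the mean is another reason their reading scores might improve.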
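The headline figure in this example is simple arithmetic on two shares. The sketch below (the function name `pre_post_change` is purely illustrative) expresses the move from 1 percent to 50 percent three ways: as a percentage-point change, as a relative change and as a ratio.

```python
def pre_post_change(before_pct: float, after_pct: float) -> dict:
    """Summarize a pre-post change in a share; both inputs are percentages."""
    return {
        # Absolute change in the share, in percentage points.
        "percentage_points": after_pct - before_pct,
        # Change relative to the starting share, as a percent.
        "relative_change_pct": (after_pct - before_pct) / before_pct * 100,
        # How many times larger the ending share is than the starting share.
        "ratio": after_pct / before_pct,
    }


print(pre_post_change(1.0, 50.0))
# → {'percentage_points': 49.0, 'relative_change_pct': 4900.0, 'ratio': 50.0}
```

Strictly speaking, the relative change is 4,900 percent, which is the source of the roughly 5,000 percent headline. However it is expressed, the number describes how much the children’s reading changed, not how much of that change the tutoring caused. Selection, maturation and regression to the mean are all folded into it.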

If, instead of asking parents to enroll children who are far behind into an intervention, an evaluation had randomly assigned below-grade-level students to a tutoring program, then those with access to tutoring would, on average, have the same characteristics as those in the control group. Comparing the two groups’ later outcomes would then net out the influences of selection bias, maturation and regression to the mean. In turn, the evaluation would be able to tell a causal story about the effects of tutoring rather than leaving us unsure what portion of that nearly 5,000 percent gain was attributable to the program.
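A small simulation makes the contrast concrete. Every number in it is a hypothetical assumption chosen for illustration: reading scores centered on 100, an 8-point maturation gain for every child, a true tutoring effect of 3 points and a fall-score cutoff of 85 for “below grade level.” It models maturation and regression to the mean only, not the parental selection described above. Because eligibility depends on a noisy fall test, some children qualify partly because they had a bad test day, so their spring scores drift upward on their own.

```python
import random
from statistics import fmean

# Hypothetical parameters, chosen for illustration only.
TRUE_EFFECT = 3.0   # points of reading score that tutoring actually adds
MATURATION = 8.0    # points every child gains over the year without tutoring
CUTOFF = 85.0       # fall score below which a child counts as below grade level

rng = random.Random(2021)

# Build the eligible pool: only children with low fall scores qualify.
children = []
for _ in range(20_000):
    skill = rng.gauss(100, 15)        # stable underlying reading ability
    fall = skill + rng.gauss(0, 10)   # fall test = ability + test-day noise
    if fall < CUTOFF:
        children.append((skill, fall))

# Lottery: shuffle the eligible children, then split them in half.
rng.shuffle(children)
half = len(children) // 2
treatment, control = children[:half], children[half:]


def spring_score(skill, tutored):
    """Spring test = ability + maturation + tutoring effect (if any) + fresh noise."""
    return skill + MATURATION + (TRUE_EFFECT if tutored else 0.0) + rng.gauss(0, 10)


treat_fall = [f for _, f in treatment]
treat_spring = [spring_score(s, True) for s, _ in treatment]
ctrl_spring = [spring_score(s, False) for s, _ in control]

pre_post = fmean(treat_spring) - fmean(treat_fall)         # before-vs-after, tutored only
experimental = fmean(treat_spring) - fmean(ctrl_spring)    # treatment vs. control
print(f"pre-post gain among tutored children: {pre_post:.1f}")
print(f"treatment-control difference:         {experimental:.1f}")
```

Under these assumptions, the pre-post gain mixes maturation, regression to the mean and the tutoring effect, so it comes out several times larger than the 3 points built into the model. The treatment-control difference lands close to 3 because the control group experiences the same maturation and the same regression to the mean.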

The random assignment process permits researchers to estimate unbiased impacts of programs and policies and so to draw causal conclusions about their effects. Other kinds of research and evaluation are better suited to answering other kinds of questions. But when the question is whether a program caused its observed impact, a randomized experiment should be the evaluation design of choice.
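“Unbiased” has a specific meaning here. Averaged over all the lottery draws that could have happened, the treatment-control difference equals the program’s true average effect, even though any single experiment can miss by chance. The sketch below checks this by rerunning a lottery 5,000 times on one hypothetical population of 200 people. The potential outcomes `y0` (without the program) and `y1` (with it) and the average effect of about +2 are invented for illustration. In a real study, only one of the two outcomes is ever observed for each person.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(11)
N = 200  # hypothetical study population

# Hypothetical potential outcomes for each person: y0 without the program,
# y1 with it. Individual effects vary around +2.
y0 = [rng.gauss(50, 10) for _ in range(N)]
y1 = [y + rng.gauss(2, 3) for y in y0]
true_ate = fmean(b - a for a, b in zip(y0, y1))  # true average treatment effect


def one_experiment(lottery_rng):
    """Run one lottery and return the treatment-control difference in means."""
    people = list(range(N))
    lottery_rng.shuffle(people)
    treated, control = people[: N // 2], people[N // 2:]
    # Each person reveals only the outcome for the group they were assigned to.
    return fmean(y1[i] for i in treated) - fmean(y0[i] for i in control)


estimates = [one_experiment(rng) for _ in range(5_000)]
print(f"true average effect:      {true_ate:.2f}")
print(f"mean of 5,000 estimates:  {fmean(estimates):.2f}")
print(f"SD of a single estimate:  {pstdev(estimates):.2f}")
```

The mean of the estimates should sit very close to the true average effect. The standard deviation shows how far a single experiment of this size can stray from it by chance alone.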
