Why Randomize? A Primer on Experimental Evaluations

Although Abt engages in a wide variety of research and evaluation activities, my own work centers on evaluations that use an experimental design.  This blog post explains what experiments are and why we do them.

To begin, people unfamiliar with the concept of policy experiments often find the medical field a useful introduction.  To test whether a new drug works as claimed, pharmaceutical companies run “randomized controlled trials” (RCTs).  These trials randomly assign some people to take a new drug, for example, while others receive a placebo, an inert dose.  Because assignment is random, the two groups are, on average, essentially the same across all observable (e.g., age, gender, race and ethnicity) and unobservable (e.g., motivation) characteristics, except that one group is taking the new drug.  By following subjects’ subsequent outcomes, researchers can determine whether the drug made a difference (in reducing headaches or ulcers or cancer and so on).
Substitute “public policy,” “social program,” or “intervention” of some sort for “drug” and the same approach applies to testing the effectiveness of efforts to ameliorate all sorts of social and economic ills.  Here’s how it works:
First, researchers use a lottery-like process to create two groups: a “treatment group” assigned to receive the program or policy that defines the intervention and a “control group” excluded from the program or policy for research purposes. 
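To make the lottery concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than drawn from any actual evaluation: the function name assign_groups, the made-up “motivation” trait, and the sample size are all assumptions. It shuffles a list of hypothetical participants, splits it in half, and then checks that a trait the researcher never observes ends up nearly balanced across the two groups.

```python
import random
from statistics import mean

def assign_groups(participants, seed=None):
    """Shuffle participants and split them into two halves, like a lottery."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participants, each with a trait the researcher never observes.
rng = random.Random(0)
participants = [{"id": i, "motivation": rng.gauss(0, 1)} for i in range(10_000)]

treatment, control = assign_groups(participants, seed=42)

# With large groups, chance alone makes the group averages nearly identical.
gap = mean(p["motivation"] for p in treatment) - mean(p["motivation"] for p in control)
print(f"Difference in average motivation: {gap:.3f}")
```

The balance holds only on average and improves with sample size; with a handful of participants, the two groups can still differ noticeably by chance.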
The “control group” provides a powerful comparison group, or counterfactual, that tells us what would have happened in the absence of treatment.  Experimental evaluations therefore allow us to rule out rival explanations for the outcomes we observe, eliminating maturation and historical, political, economic or social trends as reasons for a program’s apparent impact.  These rival explanations are called “threats to internal validity” and typically include the following:

  • Historical forces (political, social and economic):  people find jobs more quickly in a strong economy than in a weak one; people feel more patriotic around election day; 
  • Selection bias:  the people who enroll in a program tend to be those most likely to succeed in and benefit from it;
  • Maturation:  people learn and grow over time, in ways that affect their outcomes;
  • Regression artifacts:  people often enroll in programs when they are at a low point in their lives, and many would get better (“regress” to the mean) even without a program’s help; and
  • Interactions among the above.
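Regression to the mean is perhaps the least intuitive item on this list, so a small simulation may help. The sketch below rests on an assumed model that is not from any real data set: each test score is a stable ability plus some day-to-day luck, and the numbers (a mean of 100, a cutoff of 85) are arbitrary. No program is delivered at any point, yet the people selected for their low first score do better on the second test.

```python
import random
from statistics import mean

rng = random.Random(7)

# Hypothetical model: each test score = stable ability + day-to-day luck.
abilities = [rng.gauss(100, 10) for _ in range(10_000)]
first_scores = [a + rng.gauss(0, 10) for a in abilities]
second_scores = [a + rng.gauss(0, 10) for a in abilities]  # no program in between

# "Enroll" only the people who scored very low the first time.
low_scorers = [i for i, s in enumerate(first_scores) if s < 85]

before = mean(first_scores[i] for i in low_scorers)
after = mean(second_scores[i] for i in low_scorers)
print(f"Low scorers, first test: {before:.1f}; second test: {after:.1f}")
```

The improvement appears because a very low first score is partly bad luck, and bad luck does not repeat on schedule. A pre-post comparison would credit that rebound to whatever program the low scorers happened to enter.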

Consider a tutoring program for second grade children who are reading at the first grade level.  Parents can enroll their children in tutoring to bring their reading skills up to grade level.  A pre-post evaluation appeared to show that the program increased the share of children reading at grade level by nearly 5,000 percent:  at the beginning of the school year one percent of the children in the program were reading at grade level, and by the end of the year 50 percent were, a fifty-fold increase.  Three main alternative explanations exist:

  • Selection bias:  the parents who chose to enroll their children in tutoring were probably more invested in helping their children outside of the program as well, contributing to their children’s improvement in reading.
  • Maturation:  young children are like sponges.  They learn how to read, in part, through their own maturation processes, aided by other stimuli in their environments.
  • Regression artifacts:  children reading below grade level have just one way to go:  up.  Moving toward the mean is another reason their reading scores might improve.

If, instead of asking parents to enroll extremely poorly performing children in an intervention, an evaluation had randomly assigned below-grade-level students to a tutoring program, then those with access to the tutoring would have had, on average, the same characteristics as those in the control group.  Comparing the two groups’ later outcomes would then net out the influences of selection bias, maturation and regression to the mean.  In turn, the evaluation would be able to tell a causal story about the effects of tutoring rather than leaving us unsure what portion of that nearly 5,000 percent improvement is attributable to the program.
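The contrast between the two designs can be sketched with a toy simulation. All of the numbers here are invented for illustration: a true tutoring effect of 3 points, a 10-point gain every child makes over the year anyway, and a “parental involvement” trait that both raises gains and makes enrollment more likely. The sketch models only selection and maturation; it leaves regression to the mean aside.

```python
import random
from statistics import mean

TRUE_EFFECT = 3.0   # hypothetical tutoring effect, in reading-score points
MATURATION = 10.0   # hypothetical gain every child makes over the year anyway

def yearly_gain(involvement, tutored, rng):
    """One child's reading gain: maturation + home support + tutoring + noise."""
    effect = TRUE_EFFECT if tutored else 0.0
    return MATURATION + 5.0 * involvement + effect + rng.gauss(0, 2)

rng = random.Random(1)
involvement = [rng.random() for _ in range(20_000)]  # parental involvement, 0 to 1

# Design A: parents choose; more involved parents are more likely to enroll.
enrolled = [inv for inv in involvement if rng.random() < inv]
pre_post_estimate = mean(yearly_gain(inv, True, rng) for inv in enrolled)

# Design B: a coin flip decides who gets tutoring.
treated_gains, control_gains = [], []
for inv in involvement:
    if rng.random() < 0.5:
        treated_gains.append(yearly_gain(inv, True, rng))
    else:
        control_gains.append(yearly_gain(inv, False, rng))
experimental_estimate = mean(treated_gains) - mean(control_gains)

print(f"Pre-post estimate:     {pre_post_estimate:.2f}")
print(f"Experimental estimate: {experimental_estimate:.2f} (true effect: {TRUE_EFFECT})")
```

Under these assumptions the pre-post figure bundles maturation and home support together with the tutoring effect, while the difference between the randomized groups lands close to the true effect, because maturation and involvement are on average the same in both groups and cancel out.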
The random assignment process permits researchers to estimate unbiased impacts of programs and policies, supporting causal conclusions about their effects.  While other kinds of research and evaluation are relevant to answering other kinds of questions, when the question is whether a program caused an observed change, a randomized experiment should be the evaluation design of choice.

Posts and comments are solely the opinion of the author and not that of Abt Associates.
