
Internal versus External Validity in Rigorous Policy Impact Evaluations: Do We Have to Choose?

December 22, 2016
Governments like to know if policies designed to help less-advantaged citizens, such as low-skilled workers or families in blighted neighborhoods, are working. So do taxpayers. Much, but not all, of America’s social “safety net” assistance is guided and funded by federal agencies such as the U.S. Department of Health and Human Services, the U.S. Department of Housing and Urban Development, and the U.S. Department of Labor. Can researchers give policymakers the right information about what is working (and conversely, what is not effective) for the nation as a whole, especially when research is limited to select pockets of the country?

This is not as impossible as it sounds. If picked at random, states and counties that constitute five to ten percent of the country can “tell the story” reliably for the nation as a whole (with little risk of sampling error), just as polling selected precincts can accurately anticipate national election results.
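
To make the “tell the story” claim concrete, here is a minimal Python sketch (using only the standard library and invented county outcomes, not real data) of why a randomly drawn five percent of counties tracks the national average closely:

```python
import random
import statistics

random.seed(1)

# Hypothetical example: one outcome value per U.S. county (roughly 3,000 counties).
# The numbers are invented purely for illustration.
counties = [random.gauss(mu=50, sigma=10) for _ in range(3000)]
national_mean = statistics.mean(counties)

# Draw a random 5 percent of counties, as the paragraph above describes.
sample = random.sample(counties, k=len(counties) // 20)
sample_mean = statistics.mean(sample)

print(f"National average:         {national_mean:.2f}")
print(f"5% random-sample average: {sample_mean:.2f}")
# With random selection, the two averages land close together -- the
# "little risk of sampling error" referred to above.
```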

But there’s a catch. Doing research right in any given geographic area requires creating a study “control group” that does not receive the program being evaluated. The one agreed-upon method for reliably measuring a program’s impact in, say, Dayton, Ohio, or Tucson, Arizona, is to use a lottery to randomly divide eligible applicants for assistance in those communities into two groups: a “treatment group” that gets assistance and a control group that does not. In such a design, just as in medical efficacy trials, if the two groups have different results it is virtually certain that the measured difference in outcomes is a consequence of the intervention.
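
As a rough illustration of why this works, the following Python sketch (with made-up applicant outcomes and a stipulated “true” program effect, both assumptions for the example) runs the lottery and compares average outcomes between the two groups; the difference recovers the stipulated effect apart from chance variation:

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 5.0      # stipulated program impact, invented for this example
N_APPLICANTS = 2000

# Each applicant's outcome if they receive no assistance (invented numbers).
baseline = [random.gauss(40, 12) for _ in range(N_APPLICANTS)]

# The lottery: randomly split eligible applicants into treatment and control.
indices = list(range(N_APPLICANTS))
random.shuffle(indices)
treated = set(indices[: N_APPLICANTS // 2])

treated_outcomes = [baseline[i] + TRUE_EFFECT for i in range(N_APPLICANTS) if i in treated]
control_outcomes = [baseline[i] for i in range(N_APPLICANTS) if i not in treated]

estimate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"Stipulated effect: {TRUE_EFFECT:.2f}")
print(f"Estimated effect:  {estimate:.2f}")
# Because assignment was random, the two groups are alike on average at the
# start, so the difference in outcomes is attributable to the intervention.
```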

Local Resistance to Rigorous Research 

This research approach often deters local government agencies from participating in program evaluations. In general, local program agencies don’t want to use a lottery to decide whom they serve and whom they do not. Program managers believe their services can benefit everyone, and don’t think research is needed to establish this point. Moreover, they—and local political leaders—have concerns about whether directing people’s lives for the sake of research is ethical, despite the prevalence of this practice in clinical trials to improve health care. When resources permit serving all eligible applicants, authorities resist exclusions. In contrast, where limited budgets prevent serving all eligible applicants, program managers often believe that equity (serving the most needy), efficient targeting (serving those who will benefit most), or working from a waiting list (a first-come, first-served approach) should decide who is helped—not a random selection process.

With some local agencies opting out of randomized studies, the agencies that do participate likely do not provide a balanced representation of the nation. Evidence about program effectiveness from the included sites may—and often will—mislead policymakers. In scientific terms, this skewing produces policy evaluations suffering from “external validity bias,” the misalignment of findings in the studied sites with the average effect of the program for the nation.
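
A small simulation can show the kind of skew at stake. In this sketch (again with invented numbers, and the assumption that sites where the program works best are the ones most willing to be studied), the average effect measured in participating sites overstates the true national average:

```python
import random
import statistics

random.seed(3)

# Hypothetical: 500 local sites, each with its own true program effect.
site_effects = [random.gauss(2.0, 3.0) for _ in range(500)]
national_avg = statistics.mean(site_effects)

# Assume participation is voluntary and tilts toward sites where the program
# works well -- e.g., confident, well-resourced agencies opt in.
studied = [e for e in site_effects if e > 1.0 and random.random() < 0.7]
studied_avg = statistics.mean(studied)

print(f"True average effect, all sites:  {national_avg:.2f}")
print(f"Average effect in studied sites: {studied_avg:.2f}")
# The gap between these two numbers is the external validity bias described above.
```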

Some large-scale studies have managed to avoid this shortcoming—such as the Head Start Impact Study, which persuaded all 84 local service agencies that were randomly selected to represent the nation to participate in a randomized study. But the vast majority of studies have not. Indeed, some evaluation experts believe that the way to gain balanced representation of the nation is to back away from random assignment designs and the “internally valid” evidence they produce at the local level. Put differently, some evaluation experts are willing to forgo rigorous research methods at the local level in order to produce nationally representative guidance for public policy decisions.

Can We Require Random Assignment?

Is a tradeoff between external validity for the nation and internal validity at the local level inevitable? How can we undertake social policy research that rigorously identifies successful government assistance approaches and lets society discard the rest? 

There may be a win-win option here. But it will not be easy. Federal agencies possess the authority to require participation in random assignment studies as a condition of funding social services nationwide. All local social service agencies that receive federal funding (or a rotating random subset of them) could be required to participate in randomized trials. Researchers could randomly assign a small percentage of applicants to a research control group in each site—a group for which program entry is deferred but not permanently denied. Easier said than done, right?

Most challenging may be finding the political will to undertake such high-quality research. But if we require some potential social assistance recipients in a representative set of sites to wait a bit longer for services (whose benefits to participants have never been proven!), the nation gains the one reliable way to learn what actually helps Americans on the margins. We don’t do this well with our current research methods. Let’s get it right now, and from now on.