
When Is Randomization Right for Evaluation?

August 13, 2021

Public policy researchers grapple with choosing research designs that address both the underlying policy questions and numerous practical realities.

I advocate using randomized experiments whenever possible because they provide a high level of confidence in the results. But randomized experiments are not always possible and sometimes are inappropriate. This raises a provocative question: what criteria should researchers use to decide when to use an experimental design and when to look for alternatives?

When to Experiment?

A randomized experiment is appropriate under the following conditions:

1.  When getting an unbiased, causal estimate of the policy/program impact matters. An experiment produces an unbiased, causal estimate of program impacts, so when the research or policy question requires such information, an experimental evaluation design is the preferred one.

2.  When randomization is feasible, legal, and ethical. The process of randomizing cases to treatment and control groups must be practically feasible. Creating a control group that is embargoed from program services for some period of time must also be both legal and ethical. For example, researchers cannot exclude eligible applicants to an entitlement program from receiving that entitlement, but they could use an experiment to assess whether issuing a higher level of entitlement payment is more effective in achieving certain goals.

3.  When the evaluation is prospective. Usually, an experiment can only be implemented with advance planning, integrating randomization into the program intake process, and allowing treatment and control group cases to move forward under their experimental conditions.

4.  When the resulting information will be timely enough to make a difference. Because experiments are challenging to implement successfully, the value of their results for policy and program decisions, in terms of both content and timing, must warrant the effort.

5.  When the study population is a reasonable proxy for the population of interest. A common critique of experiments is that their findings are limited to the population under study. If a broad policy decision will be made based on the evaluation’s results, then the study population must not be so idiosyncratic as to prevent generalization and should be configured to support generalization.

6.  When the cost of the evaluation is commensurate with the value of the information produced. To the extent it can be gauged, the value of an evaluation’s results must also exceed the cost of carrying out the research, a criterion that exists regardless of the selected evaluation design.
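To make the first and third conditions concrete, here is a minimal sketch of what integrating randomization into intake and estimating an impact might look like. It is an illustration under stated assumptions, not a description of any particular study: the function names (`assign_at_intake`, `difference_in_means`), the 50/50 assignment share, and the fixed seed are all hypothetical choices, and a real evaluation would also need standard errors, attrition checks, and a pre-specified analysis plan.

```python
import random
import statistics


def assign_at_intake(applicant_ids, treat_share=0.5, seed=2021):
    """Randomly assign applicants to 'treatment' or 'control'.

    Hypothetical helper: shuffles the applicant list with a seeded
    generator and sends the first `treat_share` fraction to treatment.
    """
    original = list(applicant_ids)
    shuffled = original[:]
    random.Random(seed).shuffle(shuffled)
    n_treat = round(len(shuffled) * treat_share)
    treated = set(shuffled[:n_treat])
    return {i: ("treatment" if i in treated else "control") for i in original}


def difference_in_means(outcomes, assignment):
    """Estimate the impact as mean(treatment) - mean(control).

    Because assignment is random, the two groups are alike on average
    before the program, so this difference is an unbiased impact estimate.
    """
    treat = [y for i, y in outcomes.items() if assignment[i] == "treatment"]
    control = [y for i, y in outcomes.items() if assignment[i] == "control"]
    return statistics.mean(treat) - statistics.mean(control)


def simulate_outcomes(assignment, true_effect=2.0, seed=7):
    """Simulated data only: noisy baseline outcome plus an assumed effect."""
    rng = random.Random(seed)
    return {
        i: rng.gauss(10.0, 1.0) + (true_effect if arm == "treatment" else 0.0)
        for i, arm in assignment.items()
    }


if __name__ == "__main__":
    assignment = assign_at_intake(range(10_000))
    outcomes = simulate_outcomes(assignment)
    print(f"Estimated impact: {difference_in_means(outcomes, assignment):.2f}")
```

The assignment happens before any outcomes exist, which is why this design requires a prospective evaluation. The simulated outcomes stand in for data that a real study would collect after the treatment and control groups have moved forward under their experimental conditions.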

When Not to Experiment?

Even when all the criteria above are met, an experimental evaluation design may still be inappropriate. An evaluator should not use a randomized experiment in the following circumstances:

1.  When effects are so large that causality is obvious. A central tenet of social science research is that correlation is not causation. That said, some correlations are so large that they can imply causality, such as the relationship between tobacco smoking and cancer. Randomizing who smokes (in addition to being impractical and unethical) is not necessary to establish a causal claim.

2.  When an alternative evaluation design could provide a highly reliable answer at a lower cost. Sometimes, conditions are such that a non-experimental design is the right choice for an impact evaluation. In those cases, if such an evaluation costs less than an experimental evaluation, then it would be appropriate to use it.

3.  When a program is not yet ready for an impact evaluation. Measuring the impacts of newly crafted, not-yet-fully-developed programs is premature. Impact evaluations, including experiments, are best used once programs reach a steady state of implementation.

4.  When evaluation questions are not about impact. Perhaps most obviously, an experimental evaluation cannot answer non-impact questions. Other evaluation designs and methods are more fitting for questions regarding program processes and operations or program outputs or outcomes.

These are a few guideposts for researchers to consider when deciding whether an experimental design is the right fit. Recent advances in the use of experimental evaluation designs for program improvement indicate their flexibility and amenability to various circumstances for policy learning. Beyond this blog post, the American Evaluation Association’s Evaluation Policy Task Force is another useful resource on this topic.
