February 19, 2016
Senior Associate, Education
Laura R. Peck, PhD, Principal Scientist

Program evaluators are often asked to determine whether a given program or policy had its intended effects. Getting an answer to the question “did it work?” remains the primary goal of many evaluations. But increasingly, policy makers and program leaders also want to know “for whom and under what circumstances did the program work?”
Sometimes, the story is in the subgroups.
For example, women might benefit from a particular program more than men, or program impacts might be more favorable for those participants who had certain experiences before enrolling. Understanding how program impacts vary across subgroups is critical to helping policy makers and program administrators improve and target their services more carefully to those most likely to reap maximum benefits.
Within the context of an experimental design, if the subgroups of interest are defined by characteristics that are observed before study entry (such as prior work experience) or do not vary with time (such as sex), these types of subgroup analyses are generally straightforward. But often we want to know about subgroups that are defined by behaviors occurring after random assignment, referred to as “endogenous” subgroups. Here is where things can get tricky. For example, a school superintendent interested in implementing an after-school tutoring program might want to know about the effectiveness of the program for improving test scores among those students who, had they not enrolled in the program, would not have found some other source of tutoring. The evaluator can observe whether a control-assigned student found some alternative form of tutoring but cannot observe whether a treatment-assigned student would have done so had he or she been assigned to the control group.
Insights regarding how program impacts differ for these endogenous subgroups have long been locked inside a seemingly impenetrable black box, frustratingly invisible to researchers and policy makers. A growing body of methodological research is attempting to address this long-standing challenge. Although there are, as yet, no easy answers, a forum in a recent issue (December 2015) of the American Journal of Evaluation provides some useful tools evaluators might use to approach these types of questions. Growing out of a 2014 workshop sponsored by the U.S. Department of Health and Human Services’ Administration for Children and Families, Office of Planning, Research and Evaluation, the AJE issue brings together the perspectives of many of the leading thinkers on these issues. The two authors of this blog post are honored to be among the contributors to this important volume, and here we highlight two specific approaches for opening up the black box to explore the relative effects of endogenous subgroups.
One approach for analyzing program effects on endogenous subgroups is referred to as the Analysis of Symmetrically-Predicted Endogenous Subgroups (ASPES). ASPES uses baseline data—which are exogenous to (that is, unrelated to) treatment assignment—to identify the endogenously-defined subgroups, keeping the analysis of some post-randomization choice, event, or milestone grounded in the strength of the experimental design. The AJE special section article by Peck uses the example of the Moving to Opportunity (MTO) study (which provided housing assistance to help people move into lower-poverty neighborhoods) to elaborate how ASPES identifies neighborhood quality as a mediator of MTO study participants’ later health outcomes. Slicing the sample by neighborhood residence offers a new perspective on how some of the “story is in the subgroups,” in this case showing that being in a better neighborhood for a longer period of time is associated with some better health outcomes. The simple experimental comparison obscures this information, while the new analysis contributes value to the neighborhood effects literature as well as to policy decisions regarding the structure of housing assistance.
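To make the logic concrete, here is a deliberately simplified sketch of the ASPES idea on synthetic data. All variable names and the one-covariate prediction rule are our own illustrative inventions, not the actual MTO analysis: a prediction rule for the endogenous behavior is fit where the behavior is observed, applied symmetrically to both arms, and experimental impacts are then compared within predicted subgroups.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic trial: one baseline covariate, random assignment, outcome.
x = rng.normal(size=n)                 # baseline (exogenous) covariate
t = rng.integers(0, 2, size=n)         # 1 = treatment, 0 = control
# Endogenous behavior (e.g., "would find alternative tutoring"),
# partly driven by the baseline covariate.
would_engage = (x + rng.normal(scale=0.5, size=n)) > 0
# Outcome: the treatment helps mainly those who would NOT engage otherwise.
y = x + t * np.where(would_engage, 0.2, 1.0) + rng.normal(size=n)

# Step 1: build a prediction rule from baseline data in the control group,
# where the endogenous behavior is observed. (A stand-in for a richer
# prediction model on many baseline covariates.)
ctrl = t == 0
cutoff = (x[ctrl & would_engage].mean() + x[ctrl & ~would_engage].mean()) / 2

# Step 2: apply the SAME rule symmetrically to both experimental arms.
pred_engage = x > cutoff

# Step 3: compare experimental impacts within predicted subgroups.
for label, mask in [("predicted engagers", pred_engage),
                    ("predicted non-engagers", ~pred_engage)]:
    impact = y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()
    print(f"{label}: impact estimate = {impact:.2f}")
```

Because predicted subgroups are imperfect proxies for the true ones, the two impact estimates are attenuated toward each other; the full ASPES method includes a correction for this misclassification, which this sketch omits.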
Principal stratification provides another tool to examine effects on subgroups that take shape after random assignment. As the AJE special section article by Page, Feller, Grindal, Miratrix and Summers elaborates, this approach is based on the idea that many of these seemingly endogenous subgroups defined by behavior after randomization are the result of pre-assignment membership in a subgroup whose composition cannot be observed until after the experiment has begun. By examining post-randomization behavior along with a rich set of participant characteristics, it is possible to estimate how treatment-assigned participants would have responded had they been assigned to the control group, and how control-assigned participants would have responded had they been assigned to the treatment group.
For example, the AJE article examines whether the impacts of enrolling in Head Start (a comprehensive preschool program for low-income families in the U.S.) differed based on the type of child care setting that children would have experienced absent the experimental offer to enroll in Head Start. Using principal stratification, the authors estimated which control-assigned children would have enrolled in Head Start if assigned to the treatment group and, among treatment-assigned children, who would have enrolled in either a non-Head Start center-based child care program or some form of home-based care if assigned to the control group. This made it possible to compare treatment effects across these types of families, showing that overall Head Start impacts were driven by impacts for those children who would otherwise have been in home-based care. These results have substantial implications for key policy questions regarding whether Head Start services are beneficial and to what types of families similar services might be offered in the future.
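The Head Start analysis involves three strata and richer modeling, but the core accounting behind principal stratification can be seen in its simplest case: two latent strata (“compliers,” who enroll only if offered, and “never-takers”) under one-sided noncompliance. The sketch below, on synthetic data with invented variable names, shows how randomization identifies the stratum shares and the complier-specific effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Latent principal strata, fixed before random assignment:
# "compliers" enroll only if offered; "never-takers" never enroll.
complier = rng.random(n) < 0.6
t = rng.integers(0, 2, size=n)          # random offer (1 = offered)
enrolled = t.astype(bool) & complier    # observed take-up (one-sided)
# Outcome: enrollment raises the outcome by 1.0 for compliers.
y = rng.normal(size=n) + enrolled * 1.0

# Randomization identifies the complier share from the offered arm:
pi_c = enrolled[t == 1].mean()

# Intent-to-treat effect, then the complier average causal effect:
itt = y[t == 1].mean() - y[t == 0].mean()
cace = itt / pi_c
print(f"complier share = {pi_c:.2f}, ITT = {itt:.2f}, CACE = {cace:.2f}")
```

With more than two strata, as in the Head Start case, the stratum-specific means in each arm become mixtures, and baseline covariates are needed to separate them; that machinery is beyond this sketch.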
These two articles are part of a larger effort to advance evaluation methodology and improve the tools that evaluators use to assess the effectiveness of programs. Informed program and policy decisions are driven both by evidence of the average treatment effect and by evidence of how impacts vary across different groups. An overall negative or null finding may mask large favorable impacts for some subgroups and correspondingly large unfavorable impacts for others. These types of subgroup analyses can thus provide critical information to support program improvement and the refinement of policy. Strengthening the tools with which we might illuminate the black box of endogenous subgroups helps evaluators get beyond “did it work?” to the important questions of “for whom?” and “under what circumstances?” Answers to these questions help support smarter policy and build more effective programs.