
From Middle to Golden Age: What the Future of Evaluation Holds

November 23, 2015

I recently hit middle age:  with that birthday, and given my projections, I have as many years behind me as I have ahead.  In my professional life (I’ve spent the past 20 years as an evaluator of social welfare policies and programs), this landmark event has had me considering the future of our field.  At the annual fall conferences of the American Evaluation Association (AEA; in Chicago, November 11-14) and of the Association for Public Policy Analysis and Management (APPAM; in Miami, November 12-14), I recently had the opportunity to ask several scholars this question:  what does the future hold for the field of program evaluation?
First, Judy Gueron encourages us to celebrate how far we’ve come before looking forward.  Indeed, APPAM President and this year’s conference chair Ron Haskins dubbed the conference theme the “golden age of evidence-based policy.”  Today more than ever, practitioners and scholars choose to use evidence to support their decisions, but there is certainly more to do.
I heard our field’s luminaries flag the following as areas for future consideration:
Impact variation.  While we have figured out how to estimate the average treatment effect well, there is substantial interest in how impacts vary across kinds of people and contexts.
Big data.  The proliferation of “big data,” and with it the possibility of learning about a near-infinite number of correlations in a wide variety of areas of social and commercial life, is not a substitute for well-designed evaluations that permit causal conclusions about program and policy effects.  Relatedly, we know very little about the long-term effects of interventions, and the increased availability of data from administrative systems will permit future examination of those effects at relatively low cost.  Additionally, data availability implies that we might be able to develop better short-term proxies for longer-term outcomes, which would serve evaluation research well.
Quick & cheap versus longer-term.  One of the more prominent tensions in the field right now involves the push for low-cost, quick turn-around experiments.  The general consensus among the experts is that this type of evaluation should not come at the expense of longer-term and larger-scale evaluation efforts.  Indeed, both kinds of evaluation are important, but they support different types of learning:  low-cost, quick turn-around experiments (often opportunistic in nature) tend to focus on administrative systems or behavioral responses to program/policy processes, whereas larger-scale evaluations tend to focus on whole programs, often with longer-term follow-up and sometimes concerning outcomes that are more distinctive than can be measured in standard administrative data.  Both types of evaluation can be informative and should not be thought of as mutually exclusive.
Embedded learning.  Related to the prior point, technological advances are permitting organizations to embed evaluation into their ongoing processes more than ever before, with special opportunities to evaluate small/administrative changes as standard practice.  Possibilities for creating an “evaluation culture” exist, and it is some experts’ hope that this could permit program staff to feel “safe” to engage in learning about their programs and processes.  Ideally, managerial use of evaluation can lead to larger program evaluation in ways that make program staff both comfortable and eager to learn.
Knowledge accumulation.  One study in one place cannot provide proof.  The client-driven nature of evaluation has made it episodic and therefore not always theory-based.  Similarly, A/B testing may result in putting improvements into practice immediately but without contributing to theory or adding to the knowledge base.  A goal for the future is to ensure that the substantial work we do, on innovations both small and large, contributes to the scholarly discourse as well as informs practice.
External validity.  The field is giving new attention and priority to how we can generalize one experiment’s findings to other populations and settings.  Future work should further consider the issue of temporal generalization, including how policy impacts interact with the business cycle.
Alignment of design and question.  The field of evaluation is large, multi-disciplinary and diverse in its practices and goals.  Because of this, evaluators’ large toolkit does not and should not include only hammers.  Looking forward, the field must achieve alignment of appropriate evaluation designs and methodologies with the questions being posed.
Professional development.  The quality of graduate training and professional development for evaluators is better than ever.  In the future, we should ensure that people are well trained in the variety of methods that the field demands for executing high-quality research, including when and how to apply them.  That is, policies and programs are not only nails (requiring hammers) but come in many other forms of hardware, requiring diverse tools for appropriate use, as the prior point also emphasizes.
Other less evident future directions
Some scholars observed holes in current practice that they hope will be filled in the future.  General equilibrium effects that stem from experimental evaluations have not been a topic of widespread concern, but future research would be welcome.  In addition, financial concepts of risk and uncertainty could be better integrated into evaluation, including using existing ideas for how to use evaluation investments to improve society.
In times of declining resources, evaluation is even more important.  With slim program budgets, evaluation can ensure that we are spending scarce resources wisely, on programs that have been demonstrated to be effective, rather than throwing money away on ineffective ones.  Instead of cutting evaluation first, we should be in the practice of using evaluation to inform how to allocate resources rationally, based on evidence of effectiveness.  With as many as 90% of experimental evaluations showing that their programs/innovations do not work, Larry Orr concluded that we are not now in the “golden age” but will be when we have institutionalized rigorous evaluation so that we can get to retooling ineffective interventions sooner.  Government’s requiring evaluation of its investments is a start in that direction.  I am thrilled that I have half my life yet to help shape this future, and I thank my predecessors and mentors for pointing the way.
The fine print:  contributors to these summary conclusions come from three main conference sources:

  1. AEA Think Tank on “Where and How Do Experiments Fit Within the Field of Evaluation?” with Christina Christie (UCLA), M.H. Clark (University of Central Florida), Melvin Mark (Penn State University), Kathryn Newcomer (The George Washington University), moderated by Laura Peck (Abt Associates), November 11, 2015.
  2. AEA Topical Interest Group on the Design & Analysis of Experiments’ inaugural Business Meeting, with Will Shadish (UC-Merced), interviewed by Laura Peck (Abt Associates), November 11, 2015.
  3. APPAM Caucus “Rossi Awardees Discuss:  What’s Next for the Field of Evaluation?” with Howard Bloom (MDRC), Judith Gueron (MDRC), Rebecca Maynard (University of Pennsylvania), Larry Orr (Johns Hopkins University), Grover Whitehurst (Brookings), as well as Anna Jefferson and Michelle Wood (Abt Associates), Robert Lerman (Urban Institute), and Jon Spader (Harvard), organized by Laura Peck (Abt Associates) and facilitated by Kathleen Flanagan (Abt Associates), November 14, 2015.