, Jacob Klerman and I argued that having administrative data available to answer a question about the impact of a program or intervention is not enough: the data must be paired with a good research design. Here is an all-too-typical example of why relying on administrative data, even when it includes the primary outcome of interest, is insufficient when participants’ entry into a program cannot be explained. Since my purpose is general and not about the particular study, I’ve anonymized its description.
Imagine an evaluation of a program that helps low-income individuals complete college and move into well-paying jobs with benefits. The hallmark features of the program are: substantial financial assistance; intensive, mandatory case management; and other support services. A crucial program requirement is that participants must commit to attending college full-time to obtain training in occupations that have high labor demand in the local economy. The program carefully screens applicants for their ability to make and stick to this commitment. For individuals who cannot immediately pass college entrance exams, the initial step is a nearly full-time program to improve their basic skills—300 hours of classroom developmental education over 12 weeks. So everyone in the program will have made the commitment to attend college full-time, or will be able to do so after passing a 12-week remedial class.
The goal of the evaluation is to estimate the program’s impact on earnings, i.e., earnings with the program relative to earnings without the program, holding all else equal. The database on which the researchers rely contains the Unemployment Insurance quarterly earnings records from the large state in which the program is located. It also contains identifying information on the individuals who participated in the program as well as on all individuals who registered for services under the state’s Workforce Investment Act (WIA). From the administrative data, the evaluators observe earnings with the program, i.e., for people who made the commitment to attend college full-time, or who would be able to do so after passing a 12-week remedial class. The evaluation challenge is to approximate what earnings would have been without the program, for those same people. To address that challenge, the evaluators use propensity score matching (PSM): from the same administrative data, they attempt to select a “control group” from among individuals who registered under the state’s WIA agency and received low-intensity services, identifying observed characteristics that predict program entry for those who did enter. Were they successful?
There is strong reason to believe they were not. In their analysis, PSM selects WIA registrants who are similar to program participants in that they line up on approximately 20 characteristics, including age, race, ethnicity, prior earnings, and engagement in various workforce activities. However, it seems unlikely that all of these people would have made the commitment to attend college full-time, or been able to do so after passing a 12-week remedial class. Someone’s capacity and willingness to commit to full-time college or an intensive 12-week remedial class is not measured in the administrative database, and yet people’s educational and motivational levels are critical characteristics that surely influence outcomes. The inadequacy of this comparison group is made vivid by considering what would have happened in a randomized controlled trial of the program: individuals would first be determined to be eligible, and then randomly assigned either to treatment or control, ensuring that, just like the treatment group, the control group would be committed to and capable of attending college (or the remedial class) full-time.
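The mechanics of this failure can be illustrated with a stylized simulation. The sketch below is hypothetical and not the actual study’s data or model: it invents one observed covariate (say, prior earnings), one unobserved covariate (motivation), and arbitrary coefficients, and sets the program’s true effect to zero. Matching on the observed covariate alone—which, with a single covariate, is equivalent to matching on the estimated propensity score—still yields a large “impact,” because program entrants have systematically higher unobserved motivation than their matched controls.

```python
import bisect
import math
import random

random.seed(42)

N = 20000
TRUE_EFFECT = 0.0  # the program has NO real effect in this simulation

people = []
for _ in range(N):
    x = random.gauss(0, 1)  # observed covariate (e.g., standardized prior earnings)
    m = random.gauss(0, 1)  # UNOBSERVED motivation/commitment to full-time college
    # program entry depends on both the observed and the unobserved trait
    # (coefficients are illustrative assumptions, not estimates from any study)
    p_enter = 1 / (1 + math.exp(-(0.5 * x + 1.5 * m - 1.0)))
    entered = random.random() < p_enter
    # later earnings depend on both traits, but not on the program itself
    y = x + m + TRUE_EFFECT * entered + random.gauss(0, 0.5)
    people.append((x, m, entered, y))

treated = [p for p in people if p[2]]
controls = sorted((p for p in people if not p[2]), key=lambda p: p[0])
xs = [c[0] for c in controls]

# nearest-neighbor matching (with replacement) on the observed covariate only
diffs = []
for x, m, _, y in treated:
    i = bisect.bisect_left(xs, x)
    cands = [controls[j] for j in (i - 1, i) if 0 <= j < len(controls)]
    match = min(cands, key=lambda c: abs(c[0] - x))
    diffs.append(y - match[3])

est = sum(diffs) / len(diffs)
print(f"true effect: {TRUE_EFFECT:.2f}  matched 'impact' estimate: {est:.2f}")
```

Under these assumptions the matched estimate comes out well above zero even though the true effect is exactly zero—the bias is the gap in unobserved motivation between entrants and their observationally identical matches.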
The Temptations of Administrative Data: What to Avoid
Donald Rubin, the co-inventor of PSM, describes how to use the method properly.
He refers to the variables that were available to the decision makers who chose program entrants as “key covariates,” and asserts, “If the key covariates are very poorly measured, or not even available in the dataset being examined, it is typically a wise choice to look elsewhere for data to use to study the causal question at hand.” That is, when the data lack variables that can predict who would have been admitted to the program, the study will not produce reliable estimates of effects, no matter how good the data are on outcomes.
The researchers find that, after some time, the program had large “impacts” on participants’ earnings. This is not surprising, given that the evaluation design compares individuals who are uniformly committed and able to attend college full-time with individuals who almost surely are not. So the study’s attribution of large economic gains to the program is very likely to be spurious, derived from comparing a motivated and educated group of individuals to one that is less so. This is not to say that the program didn’t produce large gains—only that the evaluation does not produce credible evidence for them.
As this example shows, a major problem with using existing administrative data is that most often Rubin’s key covariates are missing. But because the data are available, and sometimes a lot of time and effort have gone into organizing them for research purposes, there is a strong tendency to ignore the question, “Can we accurately predict the likelihood of program entry among those who didn’t enter?” As Rubin puts it, “Often the dataset being used is so obviously deficient with respect to key covariates that it seems as if the researcher was committed to using that dataset no matter how deficient.” This is the dangerous temptation of access to administrative data: it can lure researchers, policymakers and program administrators into attempting to answer causal questions with insufficient data.
Rubin, D.B. (2008). For Objective Causal Inference, Design Trumps Analysis. The Annals of Applied Statistics, Vol. 2, No. 3, 808–840.