In ways unthinkable just a few years ago, machine learning combines massive computing power with often very large amounts of data to train and deploy algorithms that can predict individual-level outcomes and analyze large volumes of text, such as public comments or journal articles. A form of artificial intelligence, machine learning can help identify malignant moles as accurately as a veteran dermatologist and can play video games better than humans. While health, e-commerce and defense have been the largest users of machine learning, Abt Associates is exploring its use in the social policy arena.
Abt uses machine learning to do things that until recently would have been impossible, such as categorizing a vast amount of unstructured text into main themes or counting mosquito eggs from a smartphone picture using image recognition. Decision-makers can analyze the large amounts of data that result from a program evaluation in a variety of ways. For example, machine learning algorithms can dig through vast numbers of open-ended survey responses to help delineate the most important ideas or analyze spatial data to predict the onset of droughts. Such uses can cut the cost of program evaluations by analyzing data faster and more comprehensively than would be possible through purely human efforts. Machine learning also has the potential to change qualitative research by leaving time-consuming rote or repetitive tasks—such as reviewing and cataloging documents—to algorithms. That way researchers can spend more time on substantive thinking.
Abt uses large datasets, powerful computing and efficient programming to explore machine learning applications in the public policy domain and to transform social policymaking. We apply two approaches: “supervised learning” to predict outcomes and “unsupervised learning” to draw insights from a document collection of any size. In supervised learning, the target outcome is known and clearly defined, such as graduating on time. In these cases, we can use machine learning to predict outcomes with high accuracy, and we build recommender systems that suggest the best steps toward achieving a desired outcome. In unsupervised learning, the data have no predefined outcome, so the algorithm looks for patterns on its own. Unsupervised learning can be particularly helpful in identifying main themes or opinions within a body of text and can examine anything from journal articles to social media posts.
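The difference between the two approaches can be shown with a toy example. The sketch below is illustrative only and is not Abt's implementation: the student records, the two features (first-year credits and GPA) and the choice of nearest-neighbor voting and k-means clustering are all assumptions made for the example. The supervised function learns from records that carry a known outcome; the unsupervised function sees the same records without outcomes and groups them by similarity.

```python
import math
from collections import Counter

# Hypothetical records: ((first_year_credits, gpa), graduated_on_time)
labeled = [
    ((30, 3.5), 1), ((28, 3.1), 1), ((12, 2.0), 0),
    ((15, 2.4), 0), ((27, 2.9), 1), ((10, 1.8), 0),
]

def predict_knn(train, x, k=3):
    """Supervised: majority vote among the k labeled records nearest to x."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def kmeans(points, k=2, iters=10):
    """Unsupervised: group unlabeled points into k clusters (k-means)."""
    centers = [points[i] for i in range(k)]  # deterministic start: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Supervised: the outcome is known for past students, so we can predict it for a new one.
print(predict_knn(labeled, (26, 3.0)))

# Unsupervised: drop the outcomes and let the algorithm find the groups itself.
unlabeled = [features for features, _ in labeled]
centers, clusters = kmeans(unlabeled)
print(clusters)
```

On this toy data the two clusters happen to line up with the two outcomes, but nothing guarantees that in general: clustering finds structure in the data, not a particular outcome.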
Zika Africa Indoor Residual Spraying Project (ZAP)
Client: United States Agency for International Development (USAID)
Under the ZAP initiative funded by USAID, Abt is working to prevent Zika outbreaks by implementing, monitoring and evaluating mosquito-control activities in Latin America and the Caribbean. An essential aspect of the project is entomological monitoring: measuring the current mosquito population to help project what the future population will look like. This covers every stage of the mosquito life cycle, from egg through larva and pupa to adult. Monitoring is time-consuming and resource-intensive. Teams of technicians need to count the mosquito eggs on a piece of paper, commonly referred to as ovitrap paper. Counting becomes increasingly difficult when the team deploys hundreds of ovitrap papers across different locations and each paper can hold hundreds of eggs. Seeing the need for a more efficient system, data scientists and software engineers at Abt built a mobile application that counts the eggs on these ovitrap papers. The application relies on open source technology and will be left behind after the project so that local governments can customize it.
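The text does not describe how the application counts eggs beyond its use of image recognition. As a rough illustration of the counting problem only, the sketch below assumes the photo has already been reduced to a binary grid (1 for a dark egg pixel, 0 for paper) and counts connected groups of dark pixels. The grid and the function name count_blobs are hypothetical, and this simple technique would merge eggs that touch, which a real system would need to handle.

```python
from collections import deque

def count_blobs(grid):
    """Count 4-connected groups of 1s in a binary grid (one group = one candidate egg)."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Found an unvisited dark pixel: it starts a new blob.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over the blob
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count

# Hypothetical thresholded patch of an ovitrap photo.
patch = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
print(count_blobs(patch))  # prints 3
```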
Predicting Transnational Organized Crime
Client: US Sentencing Commission
While transnational organized crime (TOC) affects millions of victims annually and has high societal costs, little research has been done on the individuals who facilitate TOC by providing infrastructure, logistical support or other aid. Using data from the US Sentencing Commission, Abt developed a predictive model for identifying facilitators of TOC. This involved linking individual offenders within cases using the case number, then sampling cases through a multi-stage process. We started with a stratified random sample and then sampled additional cases that were likely to include TOC (adaptive sampling). For each individual sampled, we reviewed the Presentence Investigation Report to determine whether (1) the case actually involved TOC and (2) the individual was a facilitator of TOC. We tested both traditional econometric models (OLS and logistic regression) and machine learning models (e.g., lasso, tree, random forest and generalized boosted logistic regression), using validation methods, to model and predict TOC and TOC facilitation.
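The two sampling stages described above can be sketched in a few lines. This is a simplified illustration, not the study's procedure: the stratification variable (offense_type), the likelihood score (here simply the number of defendants) and the field names are all hypothetical, and the second stage is reduced to taking the highest-scoring unsampled cases, whereas adaptive designs are often probabilistic.

```python
import random
from collections import defaultdict

def stratified_sample(cases, stratum_of, per_stratum, seed=0):
    """Stage 1: draw the same number of cases at random from every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for case in cases:
        by_stratum[stratum_of(case)].append(case)
    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

def adaptive_sample(cases, already_sampled, toc_score, n):
    """Stage 2: from the cases not yet sampled, take the n that look most like TOC."""
    sampled_ids = {case["case_id"] for case in already_sampled}
    remaining = [case for case in cases if case["case_id"] not in sampled_ids]
    return sorted(remaining, key=toc_score, reverse=True)[:n]

# Hypothetical case records.
offense_types = ["drug", "fraud", "firearms"]
cases = [
    {"case_id": i, "offense_type": offense_types[i % 3], "n_defendants": 1 + i % 5}
    for i in range(30)
]

stage1 = stratified_sample(cases, lambda c: c["offense_type"], per_stratum=4)
stage2 = adaptive_sample(cases, stage1, lambda c: c["n_defendants"], n=5)
print(len(stage1), len(stage2))  # prints 12 5
```

Fixing the random seed makes the stage-one draw reproducible, which matters when sampled cases are later reviewed by hand.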
Validating Federal Community Supervision Risk Tools
Client: Probation and Pretrial Services Office (PPSO)
The late 2000s and early 2010s saw renewed interest among pretrial and post-conviction community supervision agencies in improving tools that assess an individual’s risk of re-arrest while on supervision in the community. To do so, a number of agencies created customized risk scoring tools that related criminogenic factors (such as criminal history, static and dynamic personal characteristics, and attitudes) to re-arrest. For the federal Probation and Pretrial Services Office (PPSO), Abt used data from PPSO’s case management system and machine learning models, including bagging and boosted regression, to test the predictive power of the elements of PPSO’s Post-Conviction Risk Assessment tool.
Predicting Associate Degree Completion with High Accuracy
Client: Miami Dade College
Abt worked with Miami Dade College, one of the largest postsecondary institutions in the US, to use machine learning to predict students’ associate degree completion within four years of their first enrollment. Using only the administrative data that the college regularly collects, Abt developed and implemented a type of machine learning algorithm called gradient boosting to predict the degree completion outcomes of 300,000 students over several cohorts. The predictions were up to 93 percent accurate. The analysis also found that certain academic course performance indicators and baseline assessment characteristics are better predictors of completion than general demographic characteristics, such as race or gender.
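Gradient boosting builds a strong predictor by adding many weak ones in sequence, each fitted to the errors the model has made so far. The sketch below is a minimal illustration of that idea and not the model built for the college: it assumes a single hypothetical feature (GPA), squared-error loss and one-split trees ("stumps"), and the data are invented. A real application would use many features and a maintained library.

```python
def fit_stump(xs, residuals):
    """Find the one-threshold split of xs that best fits the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosted_stumps(xs, ys, rounds=20, learning_rate=0.5):
    """Gradient boosting with squared-error loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate

def predict_score(model, x):
    """Sum the base value and every stump's scaled contribution for input x."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )

# Hypothetical data: GPA and whether the student completed the degree (1) or not (0).
gpas = [1.8, 2.0, 2.4, 2.9, 3.1, 3.5]
completed = [0, 0, 0, 1, 1, 1]

model = fit_boosted_stumps(gpas, completed)
for gpa in (1.9, 3.3):
    print(gpa, "completes" if predict_score(model, gpa) > 0.5 else "does not complete")
```

The learning rate shrinks each stump's contribution, so the model approaches the outcome gradually over many rounds instead of fitting it in one step.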
Career Pathways Descriptive and Analytical Study
Client: U.S. Department of Labor (DOL)
Abt is exploring possible applications of machine learning to advance knowledge about career pathways. As part of a contract with DOL, Abt will use machine learning to examine a large set of resumes to gather information on how individuals move through their careers. Abt will also analyze qualitative data from research reports to identify characteristics of career pathways programs. Additional possible applications include the analysis of research materials (e.g., interviews and notes) from Abt projects on career pathways to determine the main themes in the current body of research. Ideas for future use cases include web-based tools, such as an interactive chat bot built into a job center website, that address fundamental questions about career pathways training for job-seeking clients.
Using Machine Learning for Targeted Nonresponse Follow-up in Large Community Surveys
Client: Los Angeles Public Health Department and Texas Department of Transportation
Abt uses machine learning to target important subgroups for nonresponse follow-up in mixed-mode surveys. In situations where initial recruitment has not met expectations for certain subgroups, Abt (like many other survey vendors) uses external data to assist with targeted nonresponse follow-up. The external data often are incomplete and sporadic. The Abt difference is in the use of regional data (e.g., Census information) and machine learning to improve targeting. Abt relies on a combination of random forest models and simple neural networks to make such predictions, both of which can work well with incomplete data. Our clients in California and Texas are benefiting from better insights into where we can target follow-up to increase responses from under-represented or high-priority groups. Recently, we were more than 80 percent effective at identifying a specific subgroup of interest to one client for targeted follow-up. These algorithms need direction from subject matter experts to be most effective, so we combine machine learning with our extensive knowledge of survey design and implementation to reduce the need for nonresponse follow-up, which is critical to achieving client goals.
President’s Malaria Initiative (PMI) VectorLink
Client: United States Agency for International Development (USAID)
Abt is working with USAID to minimize the threat of malaria by using a variety of prevention and treatment techniques. To gain deeper insight into spray operations and obtain more real-time information, Abt developed a variety of applications to combat the disease. Recognizing that stakeholders at various levels want to know where spray operations have occurred, Abt built a chat bot to answer questions about where, when and how many homes were sprayed with insecticide. The chat bot works both in settings with an internet connection and in unconnected locations. To get ahead of malaria, Abt aggregated disparate data sources (entomological, topographic, atmospheric and demographic) to build machine learning models that estimate insecticide resistance at different sentinel sites. With this knowledge, teams can mobilize resources more quickly, deploy personnel more efficiently and study the impacts of their spray operations.