Ten years ago there wasn’t a pressing need for quantitative research methods to specialize in “big data.” Regression analyses didn’t need millions of observations, and there was no reliable way to predict an outcome from large amounts of historical data. So most policy researchers didn’t think much about the kinds of things machine learning can do: use vast computing power to analyze enormous amounts of structured (think spreadsheets) and unstructured (think free text) data.
Today the tide is slowly turning. Federal agencies are now interested in machine learning applications, although not necessarily to answer causal questions. Present-day applications tend to fall into two categories:
- Predicting outcomes (usually from structured data, most often readily collected administrative records).
- Making better sense of text-based data in a wide variety of formats.
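As a concrete illustration of the first category, here is a minimal prediction sketch in Python using the open-source scikit-learn library. The features and outcome below are randomly generated stand-ins for administrative data, not drawn from any real dataset:

```python
# A minimal, illustrative sketch: predicting a binary outcome
# (e.g., whether a participant completes a program) from
# structured, tabular features. The data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000

# Two synthetic "administrative" features and an outcome loosely tied to them.
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Hold out a test set so we measure predictive accuracy on unseen records.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Real applications would, of course, involve many more features, careful validation, and attention to fairness and privacy, but the basic workflow looks much like this.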
The increased interest coincides with several key developments. First, policy researchers more frequently use machine learning-friendly languages such as Python and its myriad libraries. Researchers now routinely build on the great work others have done through open-source machine learning libraries that are freely available to the public. Second, we are better equipped to use computers to process large datasets. Some public-use administrative datasets run to millions of person-level observations. With the aid of cloud-based computing, researchers can use their laptops to do work that would have required dedicated server infrastructure a decade ago.
In addition to administrative data, we are beginning to see qualitative information as an ideal data source for machine learning. Research papers, open-ended survey responses and even web-scraped social media posts are all useful sources in policy research. A vast amount of this information still sits in hard-copy form, available for quick and cheap digitization and analysis.
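To make the idea of analyzing qualitative data more concrete, here is a toy sketch of the first step in most text-analysis pipelines: turning free-text responses into a countable term matrix. It uses only the Python standard library, and the responses are invented examples, not real survey data:

```python
# Toy sketch: converting open-ended text responses into a
# term-count matrix, the usual first step before any machine
# learning on text. The responses below are invented examples.
from collections import Counter

responses = [
    "The program helped me find stable housing",
    "Housing assistance was slow but the staff helped",
    "I found the enrollment process confusing",
]

# Tokenize each response (lowercase, split on whitespace) and count words.
counts = [Counter(r.lower().split()) for r in responses]

# A shared vocabulary lets us compare responses numerically.
vocab = sorted(set(word for c in counts for word in c))
matrix = [[c[word] for word in vocab] for c in counts]

print(vocab)
print(matrix)
```

Production natural language processing work uses far richer representations, but each of them starts from this same move of mapping words to numbers.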
In the past year, I have often found myself thinking that the advancements in data science have the potential to transform the policy research world in ways I never would have expected as a new graduate student in this industry 10 years ago. It is astounding that Abt has developed algorithms to predict college graduation outcomes, analyze thousands of public comments on mining activities and identify and count mosquito eggs using visual data. None of these applications would have been possible in our policy research world back in 2008.
In addition to our hands-on data science work, we’re further developing our capabilities from within through the new Abt Data Science Fellowship. Beginning in January 2019, the Fellowship will train a select cohort of staff from across the company in Python for machine learning purposes, with a focus on using natural language processing to analyze qualitative data. This evolution in how we think about and analyze data will strengthen our evidence-based approach across all of our lines of work and help us serve our clients better.
Read more in this blog series:
A Brief History of Machine Learning from a Policy Researcher’s Perspective