Can We Predict the Future? The Promise of Predictive Analytics and Recommender Systems
Predicting future outcomes is the promise of using predictive analytics and recommender systems, powered by machine learning, to help answer policy research questions.
What is Predictive Analytics?
Predictive analytics is a broad term for any type of analysis that uses existing data to make predictions about future outcomes. In its simplest form, predictive analytics uses basic statistical techniques to turn existing information into “yes/no” predictions about a person’s future outcome, such as enrolling in college or taking a particular job. Predictive analytics becomes more powerful still when sophisticated computer algorithms are applied to large amounts of data to predict a whole host of future events. Insurance companies, for example, increasingly apply predictive analytics to large populations to assess risk and to detect and prevent fraud.
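To make the simplest case concrete, here is a minimal sketch of a yes/no prediction using logistic regression, one common basic statistical technique (our choice for illustration; the text does not name a method). The GPA values and enrollment outcomes are made-up, hypothetical records, and the function names fit_logistic and predict_yes are illustrative, not from any library. The model is fit with plain gradient descent so the example needs only Python's standard library.

```python
import math
from statistics import mean, pstdev

# Hypothetical, made-up records for illustration only:
# (high-school GPA, enrolled in college: 1 = yes, 0 = no)
HISTORY = [
    (2.0, 0), (2.3, 0), (2.5, 0), (2.8, 0), (2.9, 0),
    (3.0, 1), (3.2, 1), (3.5, 1), (3.7, 1), (3.9, 1),
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(data, lr=0.5, epochs=2000):
    """Fit P(yes | x) = sigmoid(w * z + b) by batch gradient descent on log loss,
    where z is x standardized to mean 0 and standard deviation 1."""
    xs = [x for x, _ in data]
    mu, sd = mean(xs), pstdev(xs)
    w = b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            z = (x - mu) / sd
            err = sigmoid(w * z + b) - y  # gradient of log loss w.r.t. the linear score
            grad_w += err * z
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mu, sd


def predict_yes(model, x: float, threshold: float = 0.5) -> bool:
    """Convert the predicted probability of 'yes' into a yes/no prediction."""
    w, b, mu, sd = model
    return sigmoid(w * (x - mu) / sd + b) >= threshold


model = fit_logistic(HISTORY)
print(predict_yes(model, 3.8), predict_yes(model, 2.2))  # prints: True False
```

In practice, researchers would more likely use an established library (for example, scikit-learn's LogisticRegression) and many more variables than a single GPA value.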
What are Recommender Systems?
Recommender systems are a type of predictive analytics that offer recommendations about what might interest a person, what services might be most helpful, or what steps to take next. In our daily lives, we encounter recommender systems when our entertainment provider recommends a particular show, or when an e-commerce vendor recommends something for us to buy. (How did they know that we would need nail clippers for our dog?!) These recommendations are based on data about our habits, backgrounds, and characteristics, as well as those of other, similar individuals. Indeed, abundant data feed recommender systems that aim to improve our lives (and the financial well-being of the companies selling to us).
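The idea that recommendations draw on data from “other, similar individuals” can be sketched as user-based collaborative filtering, one common approach (an assumption here; real vendors' systems are proprietary and far more elaborate). In this minimal sketch, the shoppers and purchase histories are hypothetical, cosine similarity measures how alike two shoppers are, and items bought by similar shoppers are scored by that similarity.

```python
import math

# Hypothetical purchase histories (1 = bought); made-up data for illustration.
PURCHASES = {
    "alice": {"dog food": 1, "leash": 1, "dog shampoo": 1, "nail clippers": 1},
    "bob": {"dog food": 1, "leash": 1, "nail clippers": 1},
    "carol": {"cat litter": 1, "cat food": 1, "scratching post": 1},
    "you": {"dog food": 1, "leash": 1},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse item vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target: str, data: dict, top_n: int = 2) -> list:
    """Score items the target has not bought by similarity-weighted votes of other users."""
    mine = data[target]
    scores = {}
    for other, items in data.items():
        if other == target:
            continue
        sim = cosine(mine, items)
        if sim <= 0:
            continue  # users with nothing in common cast no votes
        for item, value in items.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * value
    # Highest score first; ties broken alphabetically so the output is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("you", PURCHASES))  # prints: ['nail clippers', 'dog shampoo']
```

Both dog owners bought nail clippers, so clippers rank first; the cat owner shares no purchases with “you” and contributes nothing.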
What is Machine Learning?
Machine learning is one way to make these predictions and recommendations happen. The term is often used as a catch-all for computer algorithms that learn from data, meaning that the accuracy of their predictions and the appropriateness of their recommendations generally improve as more data become available to the algorithms. Machine learning ranges from simple statistical models to advanced neural networks, loosely inspired by the connections between neurons in the brain, that can process vast amounts of data and make decisions.
Predictive analytics and machine learning can converge, and the intersection can be revolutionary. For example, a large tech company is building vast datasets from high-resolution pictures of moles (often taken with smartphones) and processing them with machine learning to predict the onset of skin cancer. These algorithms become far more accurate by drawing on subtle aspects of a mole’s shape and color to determine which moles are cancerous. Remarkably, some of these algorithms make better predictions than many highly trained, experienced dermatologists.
What Can Predictive Analytics Mean for Policy Researchers?
The medical example above clearly shows how predictive analytics can aid the work of experienced doctors. Similarly, the potential for predictive analytics and recommender systems in public policy is enormous. Advances in predictive algorithms and computing power can allow federal, state, and local agencies to use predictions or recommendations to tailor the services they provide, including targeting services to specific populations. What if we could use pictures of bruising to predict major child abuse injuries and identify the services that would help the affected children? What if we could recommend specific training and public benefit programs for an individual transitioning out of homelessness and back into the workforce?
Predictive analytics can give policy researchers the tools to answer these types of questions, either by using administrative data that have already been collected or by breaking a data-rich medium down into manageable pieces (e.g., breaking pictures down into usable data). In this type of research, the focus is squarely on the individual: can we predict future abuse for a specific child? Instead of generalizing about a population from a large set of data points, predictive analytics can provide a platform through which researchers help agencies serve individual clients directly, by generating predicted outcomes and recommendations for each individual based on the information provided.
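Breaking a picture down into usable data usually means turning pixels into numbers a model can work with. The sketch below assumes a tiny hypothetical grayscale image and three hand-picked features (average brightness, share of dark pixels, and left-right asymmetry); these are illustrative choices, not the features any particular system uses, and modern image models typically learn their own features from raw pixels.

```python
# A hypothetical 4x4 grayscale "image" (0 = black, 255 = white);
# real photos contain millions of pixels.
IMAGE = [
    [250, 240, 235, 250],
    [240, 60, 80, 245],
    [230, 50, 70, 240],
    [250, 235, 240, 250],
]


def extract_features(image, dark_threshold: int = 128) -> dict:
    """Reduce a pixel grid to a few numeric features a model could use."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean_brightness = sum(pixels) / n
    # Share of pixels darker than the (hypothetical) threshold
    dark_fraction = sum(p < dark_threshold for p in pixels) / n
    # Left-right asymmetry: mean absolute difference between mirrored pixels
    width = len(image[0])
    diffs = [abs(row[c] - row[width - 1 - c]) for row in image for c in range(width // 2)]
    asymmetry = sum(diffs) / len(diffs)
    return {
        "mean_brightness": mean_brightness,
        "dark_fraction": dark_fraction,
        "asymmetry": asymmetry,
    }


print(extract_features(IMAGE))
# prints: {'mean_brightness': 197.8125, 'dark_fraction': 0.25, 'asymmetry': 8.125}
```

Each image becomes one row of numbers, so thousands of pictures become an ordinary table that a predictive model can learn from.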
In a similar vein, predictive analytics can give program administrators and staff tools to help clients identify which options are most likely to maximize a given outcome. For example, we can envision a recommender system that suggests to a job-seeking client the optimal set of training and services to potentially maximize their wages over the next five years. Given enough data on the general job-seeking population, predictive algorithms can help an individual client map out an “optimal path” toward employment and wage accumulation. These recommendations could then be delivered through an easy-to-access portal, such as a smartphone app.
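One way to sketch the “optimal path” idea is to score every candidate bundle of services with a predictive model and keep the bundle with the highest predicted wages. Everything here is hypothetical: the service names, the predicted gains, the interaction bonus, the predicted_wages stand-in for a trained model, and the two-service limit. Because these are predictions rather than causal estimates, the output is a recommendation to discuss with the client, not a guaranteed effect.

```python
from itertools import combinations

# Hypothetical services and the made-up predicted five-year wage gain a trained
# model might associate with each for one client. Illustrative numbers only.
PREDICTED_GAIN = {
    "job search assistance": 4000,
    "welding certificate": 9000,
    "GED preparation": 6000,
    "transportation subsidy": 2500,
}

# Hypothetical interaction: the certificate is predicted to pay off more alongside the GED.
INTERACTION = {frozenset({"welding certificate", "GED preparation"}): 3000}


def predicted_wages(baseline: float, bundle) -> float:
    """Toy stand-in for a trained model: baseline plus additive gains plus interactions."""
    total = baseline + sum(PREDICTED_GAIN[s] for s in bundle)
    for pair, bonus in INTERACTION.items():
        if pair <= set(bundle):
            total += bonus
    return total


def best_bundle(baseline: float, max_services: int = 2):
    """Exhaustively score every bundle of up to max_services and keep the best one."""
    best = ((), predicted_wages(baseline, ()))
    for k in range(1, max_services + 1):
        for bundle in combinations(PREDICTED_GAIN, k):
            score = predicted_wages(baseline, bundle)
            if score > best[1]:
                best = (bundle, score)
    return best


print(best_bundle(baseline=100_000))
# prints: (('welding certificate', 'GED preparation'), 118000)
```

Exhaustive search works for a handful of services; with dozens of options the number of bundles grows quickly, and a real system would need a smarter search.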
In this sense, predictive analytics can help policy researchers go beyond research on “what works” and provide a blend of research and technical assistance. Evaluation research often paints policy recommendations in broad brushstrokes, because its findings are generalized from information gathered on groups of individuals.
Predictive analytics can turn this on its head by inherently focusing on the individual. Instead of providing findings for a group of individuals, predictive analytics can provide predictions and recommendations based on a single individual’s characteristics. These individual predictions and recommendations can also be aggregated to provide predictions and recommendations for specific populations of interest.
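Aggregating individual predictions can be as simple as averaging them within each subgroup. This minimal sketch assumes hypothetical per-person probabilities of employment and a made-up county label; the aggregate function name is illustrative.

```python
from collections import defaultdict

# Hypothetical individual-level predictions (probability of being employed within a year)
# from some upstream model; IDs, counties, and probabilities are made up.
PREDICTIONS = [
    {"id": 1, "county": "North", "p_employed": 0.80},
    {"id": 2, "county": "North", "p_employed": 0.60},
    {"id": 3, "county": "South", "p_employed": 0.30},
    {"id": 4, "county": "South", "p_employed": 0.50},
    {"id": 5, "county": "South", "p_employed": 0.40},
]


def aggregate(records, group_key: str, value_key: str) -> dict:
    """Average individual predictions within each subgroup."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record[group_key]] += record[value_key]
        counts[record[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


by_county = aggregate(PREDICTIONS, "county", "p_employed")
print({county: round(avg, 2) for county, avg in by_county.items()})
# prints: {'North': 0.7, 'South': 0.4}
```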
Predictive analytics can be viewed as research, in the sense that we use data to make better observations and draw better conclusions about individuals and subgroups and, in turn, to inform policies. However, it can also be viewed as a direct service to individual clients, in the form of predicted outcomes or recommendations of the best options given the available data.
Predictive analytics requires a large amount of data in order to extract as much information as possible from the variation in individuals’ habits and characteristics. A rough rule of thumb: at least one million individuals, and more than 50 variables describing them. With few exceptions (e.g., Social Security or tax data), the data used in public policy research do not meet these criteria for “big data.”
However, we should anticipate that this will change in the near future, as the importance of data-based decisions grows; data collection becomes cheaper due to more streamlined processes and greater computing power; and access to new forms of data increases due to public and research interest (e.g., our relatively recent access to individuals’ wage data).
It is important that these tools not be viewed as evaluation techniques. To be clear, predictive analytics is not causal: it does not identify whether something causes outcomes to change. We are predicting outcomes or providing recommendations based on large amounts of data, not attempting to isolate a causal connection. Predictive analytics should be viewed as a complement to experimental evaluations, such as randomized controlled trials, not a substitute for them.
In spite of these limitations, there are insights that we can draw specifically from predictive analytics. A recommender system, for example, can reveal which types of services would be recommended to maximize a specific outcome. Researchers can provide these recommendations to clients at the individual level, adding specificity to the findings of impact evaluations.
Can computer code predict the long-term earnings of individuals? Can similar code help identify kids who are likely to drop out of school and need additional services? Or predict recidivism among recovering addicts, allowing programs to target services more effectively? Or perhaps match people looking for work with a set of employment services that shows strong indicators of success?