Tackling Bias in Machine Learning and Child Welfare Data
January 9, 2021
Machine learning opens doors to exciting new research. But if not wielded properly, it risks reinforcing bias in data or introducing new bias.
In a recent quick-turnaround data science sprint, Abt researchers Nathan Greenstein, Ayesha Enver, Meaghan Hunt, Emily Roessel, and Sung-Woo Cho applied predictive machine learning algorithms to U.S. Administration for Children and Families (ACF) data. During this sprint, we focused on two goals: accurately predicting whether a child in foster care would receive a permanent placement, such as reunification with family, adoption, or legal guardianship; and thoroughly understanding and mitigating any bias that might exist in our data and model. This case study provided valuable insight into best practices for policy researchers tackling the issue of bias in machine learning.
The first priority was to explore a promising application of machine learning: using federal data to predict child welfare outcomes, specifically, whether children exiting foster care achieved high-permanency placements.
To develop our model, we used data from the Adoption and Foster Care Analysis and Reporting System (AFCARS) from the ACF’s Children’s Bureau. AFCARS is a federally mandated system with case-level data on all children in foster care and those who have been adopted. Using AFCARS foster care files for years 2015 through 2018, we determined whether children discharged from foster care ended up in high- or low-permanency settings.
We then developed and tested a random forest model, which generates decision trees that collectively produce an ensemble prediction for high-permanency placements. Our model achieved consistently high performance: across key measures used commonly in machine learning – accuracy, precision, recall, and area under the curve – our model scored above 90%, indicating high-quality predictions.
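As an illustration of the evaluation measures named above (not the team's actual pipeline or data), accuracy, precision, and recall can be computed from binary labels like so; the labels here are toy values, where 1 stands for a high-permanency placement:

```python
# Illustrative sketch only: the actual AFCARS model, features,
# and data are not reproduced here.

def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels (1 = high permanency)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical labels for ten children exiting foster care
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
acc, prec, rec = classification_metrics(y_true, y_pred)  # 0.8, 5/6, 5/6
```

In practice these metrics come directly from a library such as scikit-learn; the point is simply what each measure captures, since a model can score well on one while failing on another.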
Confident that our model performed well when predicting child permanency outcomes, we turned our attention to bias. In considering this, we asked two questions: In our data, are factors such as child race or disability status associated with disparate permanency outcomes? And has our model introduced any new bias with its predictions? The issue of bias in machine learning can be intimidating, but our team found it helpful to break the process into four steps:
Understand your data’s bias
Choose a theoretical framework
Apply bias reduction algorithms
Evaluate your results
In Step 1, the team investigated whether our data demonstrated bias along several lines: child race, caretaker race, child sex, and child disability status. We found low bias overall but chose to focus on disability status because, while the disparity fell within the expected range, it was the largest we observed. We measured it using the disparate impact score: the ratio of the probability of a positive outcome for unprivileged vs. privileged individuals (in our case, children with disabilities vs. children without). A score of 1.00 is perfect parity. We found a score of 0.91, meaning that children with disabilities were only slightly less likely to achieve a high-permanency outcome. Note that this does not represent an exhaustive review of bias in child welfare data; rather, it is a single finding related to a particular application of machine learning. Further study is needed to arrive at a holistic understanding of bias in this or any other field.
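The disparate impact calculation itself is simple. The sketch below uses hypothetical counts (the real AFCARS figures are not reproduced here), chosen so the ratio matches the 0.91 score reported above:

```python
# Disparate impact = P(positive outcome | unprivileged group)
#                  / P(positive outcome | privileged group)
# A value of 1.00 means perfect parity; values below 1.00 mean the
# unprivileged group reaches the positive outcome less often.

def disparate_impact(pos_unpriv, n_unpriv, pos_priv, n_priv):
    """Ratio of positive-outcome rates: unprivileged vs. privileged group."""
    return (pos_unpriv / n_unpriv) / (pos_priv / n_priv)

# Hypothetical counts: 728 of 1,000 children with disabilities and
# 800 of 1,000 children without disabilities reach high permanency.
score = disparate_impact(728, 1000, 800, 1000)  # 0.728 / 0.800 = 0.91
```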
Having identified a disparity in the data that was relevant to our model, we moved on to Step 2. At this point, researchers must choose among numerous ways to conceptualize, define, and operationalize the concept of fairness. The particulars of this step could fill a blog post all their own, but, in essence, we turned to the literature on bias and considered the approaches that best suited our application. These decisions are critical, and they require a detailed understanding of the real-world use of the machine learning model in question.
For example, after finding a disparity in their original data, researchers could use a model to understand the disparity’s sources or to monitor the success of an effort to reduce it. In this case, researchers would not adjust their model to remove the particular bias since they want it to be reflected in their work.
On the other hand, imagine a foster care program in which children with predicted high-permanency placements qualify for desirable resources. Practitioners might want to avoid systematic predictions of lower-permanency placements for children with disabilities so as not to deny them those resources. In such a scenario, the goal of the model might change subtly. Accordingly, researchers could adjust their model to mitigate this bias – even though it existed in the original data – so that children with disabilities would not be excluded unfairly. This measure is not relevant to all machine learning applications, but our team chose to pursue it in the interest of conducting a comprehensive case study.
After choosing our theoretical framework, the team moved on to Step 3.
We researched and tested several freely available bias reduction algorithms and selected two that worked well with our data. After applying these algorithms to our modeling pipeline, we moved on to Step 4: evaluating the results. We found that the algorithms improved our disparate impact score, which rose from 0.91 to 0.97. This means that, while our model was initially reinforcing a small bias inherent in our data, we succeeded in re-tuning the model to shrink the disparity. We also evaluated whether our model was more accurate when making predictions about privileged individuals, which is one way that models can introduce new biases of their own. We found that this was not the case: we calculated our model's accuracy ratio at 0.99, meaning that it was not meaningfully less accurate for children with disabilities.
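The accuracy-ratio check described above can be sketched as follows. All labels and group assignments here are toy values, not the team's data; a ratio near 1.00 indicates the model is roughly as accurate for the unprivileged group as for the privileged one (the study reported 0.99):

```python
# Hypothetical per-group accuracy comparison. group == 1 marks the
# unprivileged group (e.g., children with disabilities).

def group_accuracy_ratio(y_true, y_pred, group):
    """Accuracy for the unprivileged group (group == 1)
    divided by accuracy for the privileged group (group == 0)."""
    def acc(sel):
        idx = [i for i, g in enumerate(group) if g == sel]
        return sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
    return acc(1) / acc(0)

# Toy data: one error for the unprivileged group, none for the privileged
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
group  = [1, 1, 1, 1, 0, 0, 0, 0]
ratio = group_accuracy_ratio(y_true, y_pred, group)  # 0.75 / 1.0 = 0.75
```

Checking accuracy separately by group matters because an aggregate accuracy figure can hide a model that performs well overall while systematically misclassifying one group.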
This case study furthered our understanding of bias in machine learning and empowered us to address it in future work. Our efforts in the context of child welfare are applicable to many other fields, and they helped us develop procedures for detecting and mitigating bias in the context of machine learning. In a world that is rapidly adopting machine learning, the skills and confidence to meet the issue of bias head-on are essential to Abt’s mission.