Challenges in areas ranging from education to the environment, gender to governance, health to housing don’t exist in a vacuum. Each month, Abt experts from two disciplines explore ideas for tackling these challenges in our monthly podcast, The Intersect.
How can we rely on data to help address racial equality when it’s inherently biased by centuries of inequity? In this episode of The Intersect, Abt’s Laura Peck and Jason Brinkley look under the hood at AI, machine learning, and more to better understand how data analysis can address these conflicts rather than exacerbate them.
For more on this topic, listen to:
- Episode 17: Turning the Tide—Systemic Racial Inequities and the Social Determinants of Health
- Episode 15: Racial Bias, Data Science, and the Justice System
Read the Transcript
Eric Tischler: Hi, and welcome to The Intersect, I'm Eric Tischler. Abt Associates tackles complex challenges around the world, ranging from improving health and education to assessing the impact of environmental changes. For any given problem we bring multiple perspectives to the table. We thought it would be enlightening and maybe even fun to pair colleagues from different disciplines so they can share their ideas and perhaps spark new thinking about how we solve these challenges. Today I'm joined by two of those colleagues, Laura Peck and, making his second appearance on The Intersect, Jason Brinkley.
Laura is a nationally renowned policy evaluation methodologist who applies her expertise to program areas that include welfare, housing, education, income security, and employment. She currently serves as secretary of the Association for Public Policy Analysis and Management, and is editor of the American Journal of Evaluation's experimental methodology section.
Jason is a biostatistician, data scientist, and health researcher who leads the Research Design and Analytics team in Abt's Data Science, Surveys, and Enabling Technologies division. He specializes in machine learning, customized data visualizations, maps, and dashboarding.
Jason Brinkley: It's great to be here.
Laura Peck: Thanks for having me.
Eric: Abt is a data-driven company that advises governments around the globe on policies that are intended to promote the wellbeing of all people. Our mission tacitly acknowledges that there's bias throughout the world. So what happens when the data we rely on to craft solutions inadvertently reflect the systemic bias we all live with, and then technology such as AI and machine learning amplifies those biases? Laura, can you talk about these issues and the ramifications of having biases built into systems and evaluations?
Laura: Sure. Thank you for that question. We know that bias is inherent in the kind of data that we use in public policy research as you mentioned, Eric. The data that we collect to understand the world and the people in it reflects underpinning discrimination in society and longstanding inequalities. So, why would we think that artificial intelligence and related machine learning tools could rid the data of that bias? I think that if we want to eliminate bias from surfacing in the results of any data analysis, we need to be deliberate in that being a goal of the analysis.
Eric: Thanks. Can you give us some examples of how bias affects policies and programs that people rely on?
Laura: Sure. So let me give you an example. Let's say that we wanted to design a new variant of education and counseling for first-time home buyers, in an attempt to help them improve their home ownership outcomes such as getting the right kind of mortgage or preventing foreclosure. So we want to take a bunch of data to understand what factors contribute to people's success in the home purchasing process, so that we can craft an intervention that will target their strengths and weaknesses. For starters, having enough savings for down payment and having a good enough credit score to pass a lender's underwriting criteria, those are a couple of things that are important contributors to the home purchase process. And those two things alone, savings and credit score, they happen to be correlated with race in the US in a major way.
For example, average credit scores for whites are 734, for blacks, 677. So any kind of race-blind analysis that we might try to do is useless because that underlying data itself embeds elements of our deeply racist society. Then, if a data analysis recognizes race as a factor, it still might not be able to rid the analysis of racial bias because of the insidiousness of unmeasured factors that correlate with what we have measured in our data. And I think that AI algorithms have the potential to only reinforce those kinds of biases.
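As a rough illustration of the point about race-blind analysis, the sketch below applies a decision rule that never looks at group membership, yet still produces very different approval rates because the input it does use (credit score) is correlated with group. The applicants, the group labels, and the cutoff of 700 are all made up for illustration; they are assumptions, not data from the conversation or from any lender.

```python
# Synthetic, made-up applicants for illustration only (not real data).
applicants = [
    {"group": "A", "credit_score": 760},
    {"group": "A", "credit_score": 740},
    {"group": "A", "credit_score": 720},
    {"group": "A", "credit_score": 700},
    {"group": "A", "credit_score": 690},
    {"group": "B", "credit_score": 720},
    {"group": "B", "credit_score": 690},
    {"group": "B", "credit_score": 670},
    {"group": "B", "credit_score": 660},
    {"group": "B", "credit_score": 640},
]

CUTOFF = 700  # hypothetical underwriting threshold


def approve(applicant):
    """'Blind' rule: uses only the credit score, never the group label."""
    return applicant["credit_score"] >= CUTOFF


def approval_rates(people):
    """Share of applicants approved within each group."""
    totals, approved = {}, {}
    for person in people:
        group = person["group"]
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if approve(person) else 0)
    return {group: approved[group] / totals[group] for group in totals}


rates = approval_rates(applicants)
print(rates)
```

Leaving the group label out of the rule does not remove the disparity here; it only hides the variable needed to measure it.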
Eric: Those are some painfully potent examples. Jason, how do we fix it?
Jason: So I think Laura's right all the way down. This is something that we're running into. People aren't giving enough credit to the fact that some of these algorithms are still in a lot of ways in their infancy. We have given AI and machine learning certain specific tasks. We've said, do these things, try to mimic human behavior, but we only really ask it to be fast and accurate. And when you lean in to just speed and accuracy as the only real criteria for performance, we ignore a whole space of other things. And I think that's what we're finding on the implementation side of this.
We develop these algorithms to help do our evaluation work, to help speed things up, to help take in complex data and complex associations and see what it can sort of predict out of all of this, but we really have put it in a tight fitting box in terms of what we're asking it to do. And we say, well, we want it to be fast. We want it to be accurate. And then we come back later on and say, well, sorry, we messed up. We didn't want it to be gender biased or racially biased, or have any of those other systemic issues that come out of it. And the math, I think it's an important thing to note, the math that we're talking through on some of this, the computer science developed in the algorithms weren’t working in a vacuum, they weren't designed to be racist to begin with. It was only when we introduced bias from the actual data, from the things that are actually happening out in the real world, that these algorithms picked up on that. They pick up on those biases and in some cases they lean into it in unexpected ways.
So what we find is we didn't really ask the algorithms to do this in a way that we should have, and now we're sort of suffering the consequences of it and having to kind of go back to the drawing board and think through different ways to do our evaluation to use computer algorithms, machine learning, to implement some of those things in consideration of some of these other features. And that's not just in housing, like what Laura was talking about. We have a lot of different examples in health when you're sort of assessing health risk scores, things of that sort. And you're thinking about people that are more likely to die if they don't get certain interventions. We certainly think of those to sort of say well, we definitely want to identify those folk. We think about this on the climate front, when we're thinking about the impact of environmental regulations and things like that.
And right now the world that we've kind of built for machine learning and AI to operate in is to reuse algorithms across all of these different things and all of these different ways and it just be fast and accurate. And now we have to really take a step back and say, well, we do want you to continue to be fast, we would like for you to continue to be accurate, but we also want you to be fair. We also want you to be considerate of the human aspects of some of the things that we're evaluating. And that's a much bigger hurdle for AI to try to cover.
Eric: So let me ask about those human aspects. Laura, getting back to you, what can we be doing on the front-end to help feed data that addresses these concerns?
Laura: Right. So I am a public policy analyst by training and an evaluator by practice. And that implies that, from my perspective, I elevate the role of design as a means to ensure that the results of any research are high quality. And in this context, we can think about what that implies for how we can rid analyses of bias, or at least not reinforce them. And generally, some designs are just better than others at uncovering causal program impacts, for example. And so it's my proposition that we think about using similar tools to evaluate the performance of AI algorithms. In the same way that we evaluate the effectiveness of public programs and policies, I think we want to set up to evaluate the effectiveness of these algorithms and not just on whether they have the greatest predictive success, but whether they also have a concurrent reduction in bias, for example.
So let's say that there's some algorithm that is 90 percent accurate in its prediction process. Well, if it reinforces or enhances biases, I don't care if it's 90 percent good. I'd rather have some algorithm be only 80 percent accurate if going along with it, it has a major reduction in infusion of bias, for example. So part of this process, I think what we need to do is figure out what is it that we are assessing in the end? Is it only predictive success or is it also the presence of bias and going along with that, we need to figure out how to measure and capture those outcomes so that we can evaluate the effectiveness of these algorithms on the outcomes that are our priority. And these are things, like Jason noted, the concept of fairness. It's not something that is inherent in a computer's brain, that's something that is inherently human, the whole concept of fairness and equity.
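One way to picture scoring an algorithm on more than predictive success is sketched below. It scores two hypothetical models on accuracy and on one simple bias measure, the gap in positive-prediction rates between groups (often called a demographic parity gap), and then picks the most accurate model whose gap stays under a chosen limit. The toy labels, the model names `X` and `Y`, the choice of this particular bias measure, and the 0.25 limit are all assumptions made for illustration; the speakers do not name a specific metric or threshold.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    accuracy: float    # share of predictions that match the true labels
    parity_gap: float  # largest difference in positive-prediction rate between groups


def evaluate(y_true, y_pred, groups):
    """Score predictions on both accuracy and a simple group-disparity measure."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    rates = {}
    for group in set(groups):
        preds = [p for p, g in zip(y_pred, groups) if g == group]
        rates[group] = sum(preds) / len(preds)
    return Evaluation(accuracy, max(rates.values()) - min(rates.values()))


def pick_model(evaluations, max_gap):
    """Most accurate model among those whose parity gap is within max_gap."""
    eligible = {name: e for name, e in evaluations.items() if e.parity_gap <= max_gap}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name].accuracy)


# Toy, made-up data: five people in group A, five in group B.
groups = ["A"] * 5 + ["B"] * 5
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
pred_x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical model X
pred_y = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]  # hypothetical model Y

evaluations = {
    "X": evaluate(y_true, pred_x, groups),
    "Y": evaluate(y_true, pred_y, groups),
}
chosen = pick_model(evaluations, max_gap=0.25)  # 0.25 is an arbitrary limit
print(evaluations, chosen)
```

In this toy setup, the more accurate model is ruled out by its larger gap, which mirrors the 90 percent versus 80 percent trade-off described above. A real evaluation would need to decide which bias measure and which limit suit the program being assessed.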
Eric: Right. And I was going to ask, do we have a sense of how we can capture bias on the front-end? Is that something that we are capable of doing early on in training people or what's the approach?
Jason: I think this is the direction the field is going. So I think that is both a positive thing and a negative thing. So let me sort of give both sides of that. The first is, I think Laura's totally right. As much of this on the front-end that we can capture as possible. And there's a lot of great work going on in the realms of something like fair AI, interpretable AI, humble AI. Those are all big buzzwords for alternative strategies to help on the design front to create algorithms that have less bias or at least capture some of it at the beginning or are cognizant of it at the outset.
The design work is great, but we also have to be thinking about what to do on this backend. We currently don't have a great set of tools for helping to evaluate on the backend whether the algorithm is racist or gender biased until we get ready to sort of deploy it out into the public, and that's really the worst place to sort of figure out that what you developed has all of these biases. You need to know before you make large scale decisions on some of these things. And I would sort of push back to sort of say, look, we have a lot of scenarios right now. We have early adopters who are taking some of these algorithms, they've run with them, they've implemented them. They've seen some great financial success, and then they've gotten penalized by the community for not being fair on the backend. But fair wasn't what we asked for. And so now they have to sort of turn around and evaluate this on the backend. And there are great tools in our evaluation tool kit for things like causality, like Laura had mentioned, in terms of going backwards in time to sort of say, okay, what can we attribute to some of these different causal indicators?
I think there's a lot of great work looking at the intersection of machine learning and causal inference, but a lot of it is on the front-end and where we really have a gap is the assessment on the backend. I don't think that we've done a really good job answering the question, “What do you do once you find out that your algorithm is racially biased?” I think a lot of companies are saying, let's just scrap it and start over, but a lot of companies are looking at it and saying, okay, well, how can we measure that? How can we go in and actually look under the hood and fix it and do some of those things? And you got to have those metrics, those tools to go in and do some of that work. I mean, Laura, on the evaluation side, if you evaluate a program and you find that it has those biases, you have tools to sort of say, okay, these are some things that we might be able to do to address it, right?
Laura: I actually think that that's something that is a similar challenge in the public policy space. When we recognize that a program has uneven and inequitable impact, it's not always so clear what to do about it, but we recognize that we must do something about it and recognizing it is the first step. And at that point in public policy and program evaluation, we go back to the program's logic model. I think it's the same thing that you're thinking about here. Well, we want to go back to the algorithm. What is it about the logic model or the algorithm and its implementation and practice that results in these uneven outcomes and how can we tweak the program or overhaul the program to ensure that it generates results that are desirable and not inequitable in their application and result?
Jason: And that's the gap, right? The point is that if you're evaluating a program and you find that it has these biases, that it's gender biased, that it's biased racially, disabilities, services, there's a whole slew of avenues where these things can be biased. At least on the program level the instinct is what can we change about the program? What can we do with the program to make it better? And where we're at on the AI side right this second is, the field has not developed enough sophistication to be able to respond to some of that. And so right now a lot of the concern is, well, let's just scrap the algorithm and start over with a design-based process, but not everybody can do that. So I think the challenge for the community is what to do in those instances where you've got something that's working that works well for some things that has some of these biases that we want to try to come in and try to correct.
And we should also recognize, like Laura said, that those corrections are probably going to impact performance, and that shouldn't necessarily be a bad thing, but that's been the key metric for AI up to this point is how accurate it is. So if you accept that you're going to have a reduction in accuracy, but you're going to have something that's more fair, we have to have metrics that really balance those things in a way, because we put machine learning algorithms under the hood a different way than we would put a program in like a program that we're trying to evaluate that has... Because it doesn't really have those human components to begin with.
In a program, if you find that it has these biases, we would sort of upend it and say, let's end those parts that are detrimental to humans and not throw the baby out with the bath water. For AI, the solutions that I've seen so far have been, gut the whole thing, start everything all over. And there's a lot of groups that are reluctant to do that. And we just haven't figured out a good way to not throw out the baby with the bath water.
Laura: Right. And since this is a podcast, you can't see that I'm nodding, but I am. I'm agreeing with Jason here.
Jason: These challenges in sort of taking math and applying that sort of humanistic component and those other pieces to where it's more than the numbers. This is not new. I mean, it's new for AI, but when you sort of think about other areas like cost benefit analysis, this is a great place where when we do, like, a cost benefit analysis, it's very easy to put yourself in sort of a less humanistic role. You don't value human capital the same way. If you look at it from a business perspective, pollution might be good for business because it affects the bottom line, but it's not good for the overall system. It's not good from an environmental regulator's perspective, it's not good for the health of society. It has all of these other places where it's bad.
And I think most people can look at that and see that we need regulation, environmental regulation, and to monitor pollution, because if you were to just lean into cost benefit, you would say, well, we're going to do whatever manufacturing practice is better, more cost-effective for a company. They might have unsafe labor practices. They might have unsafe environmental practices, but they're very good for the bottom fiscal line.
Machine learning is going through that same growing pain where we said, what was good for prediction, what was good for accuracy, was good for one measure, just like it might be good on a financial measure for cost benefit, but in all of these other realms, it would be terrible. It would be terrible to withhold a very expensive treatment to people that are dying or people that have cancer just because it costs too much. We would never make those decisions. We typically don't make those decisions. However, where machine learning is right this second is that in some cases it has stumbled. It has created some of these biases. And we're not sort of taking that step back to say, well, we needed to put this thing on a larger spectrum, the same way we might do with something like cost benefit.
Eric: We talked about throwing the baby out with the bath water. We've talked about amending algorithms. How do programs know which approach they might want to take and how do we figure out what the next steps are? We're at a time right now where we have an opportunity to address biases that have, as Laura was saying, we're all saying, have been really baked into the system. How can we start turning the tide quickly and effectively?
Jason: We have to stop seeking out simple solutions for complex problems. So that's the first thing. We apply the math, and then in a lot of these instances, what we've been doing is letting the numbers drive all of the decision-making on this. And when we do that, we can only really do that in a scenario where we can be confident that the data-driven decision-making is from good data. So we have to get better data for one.
And then two, we have to create better stress tests for evaluation designs and for machine learning algorithms. We have to put all of these things under the hood and not just in a one-off sort of thing. This has to be a continuous cycle of improvement in the same way we think about it in industry and in other places. We can always make some of these better. So it's not a one-off thing, you have to bake it into the overall evaluation. You have to be willing to get under the hood and you have to be willing to sort of set multiple outcomes for success. It's not just cost, it's not just benefit, it's the impact on human capital. It's the impact on environment. It's the impact on how those businesses are doing their business.
Laura: Well, I think that, for all of this, recognizing that a problem exists is the essential first step. We know that this is a problem. We know that our data are biased and we know that algorithms have the potential to only reinforce those biases. And so now that we are paying attention to that, we can think about how best to use design thinking and to articulate and operationalize outcomes and consider the trade-offs, where we recognize it's a holistic set of conditions that result in a preferred algorithm. It's not just one single one, but a combination of metrics where we include issues of fairness, equity, as well as performance.
Jason: And that likely has to be done differently for different industries. Right now we wouldn't apply the same evaluation model for housing that we might do for health or for environment. So we also have to recognize that the way we end up doing this sort of moving forward might be different for different scenarios.
Eric: Right. Great. Well, it's exciting that you guys are thinking about this, because it sounds like there's a mandate with the Biden administration to actually tackle these issues so I'm glad we're here discussing them now. So thank you both for joining me.
Jason: Thank you for having us.
Laura: Thank you for the opportunity to share these ideas together with you. And we do look forward to hearing people's reaction to this because this is partly just Jason and me complaining and musing, and it'd be really nice to hear other people's thoughts on what they are doing and their recommendations for how we can get this right.
Eric: If you'd like to leave comments, be sure to click through to the SoundCloud site and submit them there. We'll pass them on to Laura and Jason and hopefully we can get a broader conversation going. Thank you both again and thank you for joining us at The Intersect.