Challenges in areas ranging from education to the environment, gender to governance, health to housing don’t exist in a vacuum. Each month, Abt experts from two disciplines explore ideas for tackling these challenges in our podcast, The Intersect.
Be it outbreak, epidemic, or pandemic—human or zoonotic—the goal is to get reliable information on infectious diseases, quickly. Abt experts Laura Edwards and Sung-Woo Cho discuss strategies for harnessing machine learning and data visualization to provide actionable information—at scale—that can help stem a health crisis.
For more on this topic, check out:
- Episode 2: How Can We Eradicate Infectious Diseases Using Machine Learning?
- Episode 6: Defeating TB—Private Sector to Predictive Analytics
Read the Transcript
Eric Tischler: Hi and welcome to the Intersect. Today I'm joined by Laura Edwards and, making a second appearance on the Intersect, Sung-Woo Cho. Laura is an expert in international epidemiology, clinical research, and data architecture and management. Her research experience includes the flu, HIV/AIDS, Zika, and malaria. Sung-Woo oversees machine learning capacity building at Abt as well as our Data Science Fellowship, a company-wide initiative that trains staff members in machine learning programming. Thank you both for joining me.
Sung-Woo Cho: Thanks for having us.
Laura Edwards: Thanks for having us. Glad to be here.
Eric: When I first started discussing ideas for this episode, I was thinking we'd look at Abt's extensive work on the flu and, as we began planning, the flu season got pretty ugly. And then the coronavirus hit and there's talk of it being a pandemic. So, with all this in mind, Laura, I thought I'd lead off with this: what kind of data do we use to keep tabs on epidemics that could become pandemics?
Laura: Sure. So, maybe a good place to start would be to define pandemics, since maybe not all of our listeners would be familiar with that term. A pandemic occurs when a brand new virus, one that has never been seen before, emerges without warning and it spreads from person to person in an efficient and sustained way. So a pandemic happens when there is an ongoing epidemic on two or more continents. And pandemics often have global consequences, so, what's really important when considering respiratory virus pandemics—whether this is from influenza or another respiratory virus disease such as this novel coronavirus—it's really, really important to have surveillance systems in place that can detect the emerging virus as quickly as possible. Time is really, really of the essence when it comes to detecting pandemics.
Eric: When you say surveillance, you mean …?
Laura: So, when I say surveillance, I mean systems that generate public health data that help public health officials understand both existing and emerging infections. Our clients at CDC and the Department of Health and Human Services, and others, have vast global networks of surveillance systems where they have dozens of in-country partners around the world, with ministries of health or other local health institutions that can help generate data about emerging infections with pandemic potential.
So, one of the reasons that pandemic flu is particularly tricky is that a person could become sick with the flu and contagious to others before they're even symptomatic. That's very different from most other infectious diseases, where a person is not contagious until they have the symptoms themselves. And this is one of the things about flu that makes it spread so easily. And this is also what we believe to be true about this novel 2019 coronavirus. Like I said earlier, time is of the essence when it comes to detecting and responding to pandemics. And if there's a way that we could start to detect pandemics before individuals who are symptomatic have contact with formal public health and surveillance systems, that could really change the trajectory of potential pandemic outbreaks.
Eric: Right. I know something that we've talked about people doing is media listening, looking for text that might give us the jump on that information, so that maybe we can spot trends before people are already showing up at doctors' offices. That made me think of things like natural language processing, which I know Sung-Woo knows a lot about. Sung-Woo, what do you think?
Sung-Woo: Yeah, thanks. So, I think natural language processing, or NLP for short, holds a lot of promise whenever you have a lot of text, like words and letters, paragraphs, because it's really tough for a bunch of humans with their eyes and brains to look through hundreds of thousands of articles in a few minutes, for example. That's not really possible. So what NLP does is it extracts trends and themes and different meanings within large amounts of text, especially. And so you're training an algorithm, a machine learning algorithm, to try to get at what the main themes are in a large amount of text.
So, for something like this, if, say, newspapers or other media outlets are reporting on particular outbreaks in a given area, and you were to cull all that information across these different areas, you could presumably use something like NLP to try to determine themes or try to predict where certain outbreaks could occur.
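As a rough illustration of the theme extraction described above, here is a minimal Python sketch. It is not Abt's method: it simply ranks terms by how many articles mention them, a much cruder stand-in for the trained NLP models Sung-Woo refers to. The stop-word list, the `top_terms` function, and the sample headlines are all hypothetical and made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real pipelines use larger ones.
STOP_WORDS = {"the", "a", "an", "in", "of", "and", "to", "is", "are",
              "at", "on", "for", "with", "after", "among"}

def top_terms(documents, n=3):
    """Return the n terms that appear in the most documents.

    Ties are broken alphabetically so the result is deterministic.
    """
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document (document frequency).
        words = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        doc_freq.update(words)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]

# Made-up headlines, not real news data.
articles = [
    "Flu cases rise in the northern province",
    "Hospitals report flu cases among poultry workers",
    "Poultry market closed after flu cases",
]
themes = top_terms(articles, n=2)
print(themes)
```

On thousands of scraped pages the same idea would surface the words that recur across many sources, which an analyst could then review; a production system would more likely use topic models or trained classifiers.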
Laura: I know CDC monitors the news for potential pandemics. Sung-Woo, what kind of scale could we bring to this effort?
Sung-Woo: Yeah. I think that if we were able to use web scraping methods to mine for text related to potential flu outbreaks, we could draw on several different newspapers, presumably in different languages, that we can then translate. So, as a safe estimate, on a regular basis, say even weekly, it could be thousands of pages of text that an algorithm could be trained on to detect themes or predict outcomes. I think that's in the realm of possibility. So, thousands of pages instead of the dozens or hundreds of pages that human eyes could be expected to look through.
Eric: And use that in multiple languages, too.
Sung-Woo: Yes. We generally use Python as our language of choice when it comes to machine learning, and there are programs out there that we can use to translate languages and then use that translated text to run NLP algorithms on. And that's something that we've done already.
Laura: Could you define data scraping and data mining?
Sung-Woo: Yeah. So web scraping is, I think, the better term. It's a way in which we can get text information directly off of websites or things like tweets, anything that's floating around on the internet. There are web scraping techniques, these algorithms, that we can use to cull all that text off of a website or a series of websites or a series of texts. So essentially what you're doing is amassing all this text that's out there on the internet, text that's for public use.
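To make the idea of pulling text off a web page concrete, here is a minimal sketch using only Python's standard library. It is an illustration under stated assumptions, not the tooling used at Abt: the `TextExtractor` class, the `extract_text` helper, and the sample page are hypothetical, and the HTML is hard-coded so the example runs without a network connection. In practice the page source would first be downloaded, for example with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

# Made-up page standing in for a downloaded news article.
SAMPLE_PAGE = """
<html><head><title>Local Health News</title><style>p {color: red;}</style></head>
<body><h1>Clinic reports fever cluster</h1>
<p>Officials are investigating.</p><script>var x = 1;</script></body></html>
"""
page_text = extract_text(SAMPLE_PAGE)
print(page_text)
```

Running the same extraction over a list of pages yields the mass of plain text that the NLP step would then analyze.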
Laura: So, many novel viruses, including this new 2019 coronavirus, are zoonotic, meaning that they originated in animals and then they spilled over to people. And so they evolve from a virus that circulates primarily among animals to one that can then be transmitted in a human-to-human fashion. So because of that, is it possible to monitor agricultural news and web sources to try to catch any kind of spillover on the earlier end?
Sung-Woo: That's a really interesting idea, to home in first on agricultural, I don't know, trade magazines or newspapers that are all publishing digitally. If we can cull that information, get that text, and then review it on a regular basis, whether it's every few days or a week, we can use NLP to determine what the general themes are of all that text and then apply predictive analytics as well onto that text analysis to try to figure out where potential outbreaks might be occurring. That combination of NLP and predicting some sort of numerical outcome, like “Yes or no, an outbreak is likely to occur,” that's also something that we're starting to do here and that's relatively cutting edge in our policy research industry, trying to combine text analytics with predictive analytics and trying to predict one-zero binary outcomes using a mass of text.
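One simple way to predict a one-zero outcome from text is a naive Bayes classifier. The sketch below is a minimal pure-Python version, offered only as an illustration of the general technique; the transcript does not say which models Abt uses. The function names, the training snippets, and their labels are all hypothetical and made up for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_texts):
    """Fit a multinomial naive Bayes model on (text, label) pairs, label in {0, 1}."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in labeled_texts:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return 1 (outbreak likely) or 0, whichever class has the higher log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

# Made-up training snippets: 1 = outbreak-related, 0 = unrelated farm news.
training_data = [
    ("many poultry deaths reported at farm", 1),
    ("workers sick with fever after poultry deaths", 1),
    ("grain prices rise at market", 0),
    ("new tractor sales rise", 0),
]
model = train(training_data)
print(predict(model, "fever among farm workers"))
print(predict(model, "tractor prices rise"))
```

A real system would need far more labeled text and careful validation before its yes-or-no predictions could inform decisions.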
Eric: Very cool. I know that agencies are monolithic entities and they can sometimes have a hard time communicating amongst themselves, and I also know that we have a lot of experience building dashboards where we're sharing data among a lot of different entities. We do it for HUD, we do it for EPA. I'm wondering, if we're talking about collecting all this data for both human and zoonotic outbreaks, whether creating a dashboard that we could share with other agencies might help bridge that gap. I was wondering if you think that would be useful, Laura, and if that would be feasible, Sung-Woo.
Laura: I definitely think it would be useful. Having data in real-time or close to real-time in a way that is visually easy to understand and to digest is really important when it comes to making decisions during the critical moments leading up to a potential pandemic.
Sung-Woo: Yeah, and to follow that point, it would be really useful for any agency to have dashboards that are either real-time or close to real-time, with interactive features that people can use to see how one variable changes when you change another, and do all that visually. I think that would be something that is of great use to a lot of agencies. It's something that we've already been doing, mostly through either Python or Tableau or D3. Those are some of the more useful data visualization platforms that are out there, but we're definitely starting to do that.
Eric: So maybe we could apply that at the intersection of data and infectious diseases.
Eric: And on that note, I'm going to say that's a podcast. [Laughs] Thank you both for joining me.
Sung-Woo: Great. Thanks.
Laura: Thanks for having me.
Eric: Sung-Woo and I were recorded live at Abt Studio 1 in Rockville, Maryland. Laura called in from our offices in Atlanta. For more on this topic, check out episodes 2 and 6 of the Intersect.