Public-Health Prognosticators: Forecasting Public Health Events

Biostatistician Thomas McAndrew combines computational models and models of human judgment to build more accurate forecasts.

Story by

Kelly Hochbein

Illustration by Hvass&Hannibal

Before joining Lehigh’s College of Health faculty in 2020, biostatistician Thomas McAndrew was a postdoctoral fellow at the University of Massachusetts Amherst, developing novel multi-model ensemble algorithms to forecast seasonal influenza for the Centers for Disease Control and Prevention (CDC), both nationally and for Health and Human Services (HHS) regions across the United States. The algorithms he developed combined tens of forecasts into a single forecast of the percentage of influenza-like illness, giving public health officials a picture of what the future might hold and enabling them to make informed decisions.

“The idea behind probabilistic forecasting of disease transmission is to build a link between present data available about the disease and the probability of what will happen in the future,” McAndrew explains.
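
For readers who want a concrete picture of what such a multi-model ensemble looks like, here is a minimal sketch in Python. The models, weights and probabilities are invented for illustration; this is not McAndrew’s actual algorithm, which combined tens of real forecasts.

```python
import numpy as np

# Toy setup: three models each assign probabilities to the same bins for
# next week's influenza-like-illness (ILI) percentage. All values invented.
bins = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # ILI % bin centers
model_forecasts = np.array([
    [0.10, 0.40, 0.30, 0.15, 0.05],          # model A
    [0.05, 0.25, 0.45, 0.20, 0.05],          # model B
    [0.20, 0.35, 0.25, 0.15, 0.05],          # model C
])

# Hypothetical weights, e.g., proportional to each model's past accuracy.
weights = np.array([0.5, 0.3, 0.2])

# Weighted linear opinion pool: the ensemble forecast is a weighted
# average of the individual probability distributions.
ensemble = weights @ model_forecasts
ensemble /= ensemble.sum()                   # guard against rounding drift

print("Ensemble forecast:", ensemble.round(3))
print("Expected ILI %:", round(float((ensemble * bins).sum()), 2))
```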

But different types of models have different advantages and disadvantages, he says. Computational models work well when data is plentiful, but they are restricted to structured data. People, on the other hand, aren’t limited in that regard. They can obtain information from structured data like a spreadsheet, but they can also collect information from one another, from news and social media, and from personal intuition and experience.

And so McAndrew also gathers information from humans—public-health prognosticators who participate in online communities dedicated to accurately predicting future events. His main goal: combining computational models with models of human judgment to determine how people can contribute to building more accurate forecasts of the future, which can help experts determine the best ways to reduce the negative impact of a particular public health event.

“Every model has access to different amounts of information, and the more information you have, the higher potential to generate a more accurate forecast,” he says.

The Human Advantage

At different points throughout his career, McAndrew’s work has touched on genetics, cardiology, social media and infectious disease. In addition to his seasonal flu and COVID-19 work, he has been a research assistant in oncology and pancreatic cancer trials at the Lombardi Comprehensive Cancer Center in Washington, D.C.; a biostatistician focused on human papillomavirus (HPV) genetics and the relationship to cervical cancer at the Albert Einstein College of Medicine in New York; and an associate director of biostatistics at the Cardiovascular Research Foundation, also in New York, designing and analyzing novel cardiovascular clinical trials of stents and heart valves.

McAndrew describes his diverse pursuits as “taking all the balls and, instead of juggling them, just throwing them all up in the air.” However, as disparate as his research focus areas may appear, McAndrew says they are connected.

“They all have a common theme, and that’s building,” he explains. “I'm a builder. I build things. [At home] I have an outside workshop where I build bookcases, tables and shelves. And I have a computer inside that I sit in front of and use to build models, build data structures, and collect information. … All of my research is about building.”

McAndrew was initially drawn to the idea of asking humans to predict infectious disease transmission in fall 2019. The challenge at that time, however, was that this approach works best when data is scarce and data-hungry computational models are at a disadvantage, so it wasn’t a good fit for seasonal flu, for which a large amount of data already exists.

Enter COVID-19, along with a remarkable scarcity of data at the beginning of the outbreak.

“COVID was not my specialty because it was no one’s specialty,” says McAndrew. In early 2020, he and his faculty advisor at UMass, Nicholas G. Reich, used a consensus of expert opinions to predict the early trajectory of the COVID-19 pandemic, including positive cases, hospitalizations and deaths.

As time went on, the experts’ forecasts proved more and more accurate, even with limited data.

“We've found that people who are subject-matter experts, experts in the modeling of infectious disease, were able to make accurate forecasts of the cumulative number of deaths in 2020—38 weeks ahead of time, in March 2020,” he says. “... [We saw that] people can make long-term forecasts really well, probably because they're relying on their intuition.”

Now at Lehigh, McAndrew’s Computational Uncertainty Lab conducts research that he describes as “a blend of stats, data science and humanity, which is really fun.”

The team poses several public health-focused questions to groups of people who participate in online forecasting platforms, such as Metaculus and Good Judgment Open (GJO). These platforms offer opportunities for individuals to predict the probabilities of a wide range of real-world events, such as the behavior of the stock market, the identity of a country’s next political leader and the emergence of new technologies. Although anyone can make a prediction, McAndrew and his team do distinguish subject-matter experts—individuals with experience in public health, epidemiology, infectious disease modeling and the like—from more casual participants.

“[These predictions are] based on structured, objective datasets, subjective information, information they found via some friends, maybe information they think they’ve picked up from the media. They build a probability distribution, and they submit it to us,” McAndrew explains.
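
In rough terms, a submission like this is a set of probabilities over predefined outcome bins. Below is a minimal sketch, with hypothetical bins and weights rather than any platform’s actual format, of how a participant’s raw weights become a valid probability distribution:

```python
import numpy as np

# Hypothetical outcome bins for a question like "incident confirmed
# cases at the end of the month," in thousands of cases.
bin_edges = np.array([0, 50, 100, 200, 400, 800])

# A participant's raw, unnormalized weights over those five bins, blending
# spreadsheet data, news reports and gut feeling. Values are invented.
raw_weights = np.array([1.0, 3.0, 5.0, 2.0, 0.5])

# Normalize so the weights form a proper probability distribution.
submission = raw_weights / raw_weights.sum()

for lo, hi, p in zip(bin_edges[:-1], bin_edges[1:], submission):
    print(f"P({lo}k to {hi}k cases) = {p:.3f}")
```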

McAndrew and his team pose two sets of questions to the crowd. The first set of three questions asks about the same quantities that computational models forecast: the number of incident confirmed cases, incident deaths, and pediatric and adult hospitalizations two weeks into the future. This allows the team to compare predictions made by humans against those made by the computational models. The second set of questions is determined by the “human” element.
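
One plausible way to make that comparison, assumed here since the article doesn’t specify the team’s scoring method, is a proper scoring rule such as the log score, which rewards forecasts that placed high probability on what actually happened:

```python
import numpy as np

def log_score(forecast, outcome_bin):
    """Log of the probability the forecast assigned to the bin where
    the true outcome landed. Higher (less negative) is better."""
    return float(np.log(forecast[outcome_bin]))

# Toy forecasts over the same five outcome bins; values are invented.
human_forecast = np.array([0.10, 0.30, 0.40, 0.15, 0.05])
model_forecast = np.array([0.05, 0.20, 0.50, 0.20, 0.05])

observed_bin = 2  # suppose the observed count fell in the third bin

print("Human log score:", round(log_score(human_forecast, observed_bin), 3))
print("Model log score:", round(log_score(model_forecast, observed_bin), 3))
```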

“It's about really listening to the news, listening into what friends at the CDC are thinking and talking about,” McAndrew explains. “We look to make predictions about what might be important two to three weeks from now.”

One recent survey question—“How many incident confirmed positive cases of COVID-19 in the US will occur at the end of the month?”—invited each participant to provide a probability distribution over what they thought that number would be. McAndrew received 914 probability distributions. He combined that information with computational models to build a forecast of future cases, which he shared with the CDC and other public health organizations.
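
A hedged sketch of that combination step follows. It assumes, as one simple possibility rather than McAndrew’s published method, an equal-weight pool of the human distributions blended with a single computational model forecast:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 5

# Stand-ins for 914 human-submitted probability distributions over the
# same outcome bins; generated randomly here, purely for illustration.
human_submissions = rng.dirichlet(np.ones(n_bins), size=914)

# Equal-weight pool of the crowd: average the 914 distributions.
crowd_consensus = human_submissions.mean(axis=0)

# A toy computational-model forecast over the same bins.
model_forecast = np.array([0.05, 0.20, 0.40, 0.25, 0.10])

# Blend crowd and model; alpha is a hypothetical mixing weight.
alpha = 0.5
combined = alpha * crowd_consensus + (1 - alpha) * model_forecast

print("Crowd consensus:", crowd_consensus.round(3))
print("Combined forecast:", combined.round(3))
```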

“What we're seeing is forecasts from humans are at least as accurate right now as forecasts from the computational models,” McAndrew says. “That said, we don't have enough proof to say that this will stick statistically, but so far I'm pretty shocked. I expected them to be worse. It's hard to make a forecast about the future, but so far the crowd is doing as well as the computational models. They have shown an ability to beat the machines—it’s this very David and Goliath sort of situation.”

‘People Are Still Useful’

McAndrew notes that people aren’t always right, and forecasts sometimes miss the mark.

For example, the team decided to start asking questions about vaccinations in June 2020, just as Pfizer and Moderna were ramping up their vaccine production in the United States.

“We thought this might be important for public health officials to know, so we decided to ask questions about the first dose and fully vaccinated folks in the U.S.,” says McAndrew. “There was no objective process there—it just seemed important.”

The team released four surveys between June and September 2020, months before any vaccines would be authorized, asking participants about the number of vaccines that would be produced, when they would be approved, how long it would take to produce and distribute 100 million doses, and how safe and effective the vaccines would be.

“We had sort of middling results. They were able to get the timing right, but not the efficacy,” says McAndrew. “People forecasted the vaccines to be something like 50 or 70% effective [and the Pfizer and Moderna vaccines are 95% and 94.1% effective, respectively]. So it highlights that forecasting can be hard. They're not amazing all the time. But it still provided the public with information about what could happen in the future.”

This humanity-influenced information about the future helps public health officials make better and more informed decisions. However, it also serves another, perhaps less obvious, purpose, McAndrew says.

“I think that there's an overreliance right now on modeling, and how modeling should drive our entire lives. I think my work speaks to that aspect—that people are still useful. I think that it hints at this idea that people are as important, and we still matter. We still have something to offer that computational models don't have access to.”
