In partnership with the Harvard Global Health Institute, Google today released the COVID-19 Public Forecasts, a set of models that provide projections of COVID-19 cases, deaths, ICU utilization, ventilator availability, and other metrics over the next 14 days for U.S. counties and states. The models are trained on public data such as those from Johns Hopkins University, Descartes Labs, and the United States Census Bureau, and Google says they’ll continue to be updated with guidance from its collaborators at Harvard.
The COVID-19 Public Forecasts are intended to serve as a resource for first responders in health care, the public sector, and other affected organizations preparing for what lies ahead, Google says. They allow for targeted testing and public health interventions on a county-by-county basis, in theory enhancing the ability of those who use them to respond to the rapidly evolving COVID-19 pandemic. For example, health care providers could incorporate the forecasted number of cases as a datapoint in resource planning for PPE, staffing, and scheduling. Meanwhile, state and county health departments could use the forecast of infections to help inform testing strategies and identify areas at risk of outbreaks.
To create the COVID-19 Public Forecasts, Google says its researchers developed a novel time-series machine learning approach that combines AI with an epidemiological foundation. By design, the models are trained on public data and use an architecture that lets researchers examine the relationships the models have identified and interpret why they make certain forecasts. They’ve also been evaluated to ensure that predictions for people of color, who have been hardest hit by COVID-19 with disproportionately high rates of cases and deaths, aren’t wildly skewed or otherwise misleading.
“We observe that our models produce meaningfully lower absolute error and normalized (relative) error as compared to the comparison model across predominantly African American, Hispanic, and white counties,” Google researchers wrote in a fairness analysis of the COVID-19 prediction models. “Our models optimize for high accuracy across all U.S. counties to provide the best overall forecast for most communities.”
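To make the two metrics named in the fairness analysis concrete, here is a minimal sketch of how absolute error and normalized (relative) error might be computed for a set of county forecasts. The exact definitions Google used aren't given in this article, so the normalization below (total absolute error divided by total actual count) is an assumption, as are the function names and the sample figures, which are made up for illustration.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between forecast and observed counts."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def normalized_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Total absolute error divided by total observed count.

    Assumed definition: it makes errors comparable between counties
    with very different case counts.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("normalized error is undefined when all actuals are zero")
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_error / total_actual


# Hypothetical 14-day case forecasts vs. observed counts for three counties.
forecast = [110.0, 45.0, 200.0]
observed = [100.0, 50.0, 220.0]

mae = mean_absolute_error(forecast, observed)
rel = normalized_error(forecast, observed)
print(f"mean absolute error: {mae:.2f}")
print(f"normalized error: {rel:.4f}")
```

Reporting both figures matters when comparing groups of counties: a large county can show a high absolute error while its relative error stays small, and the reverse can hold for a small county.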
The COVID-19 Public Forecasts are free to query in BigQuery as part of the service’s 1TB-per-month free tier or to download as comma-separated value files (CSVs). Additionally, they’re available through Google’s Data Studio dashboard and its National Response Portal.
Google says it won’t charge for bytes processed in queries against the data set, but to prevent abuse, data joined with the data set will be billed at the normal rate. After September 15, queries over the forecast sets will revert to the normal Google Cloud billing rate.
The release of the COVID-19 Public Forecasts follows the launch of Google’s COVID-19 Public Datasets program, which hosts a repository of public data sets relating to the crisis and makes them easier to access and analyze. Corpora within the COVID-19 Public Datasets program include the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE) data set, Global Health Data from the World Bank, and OpenStreetMap data, all of which are stored at no cost on Google Cloud.