Prediction Error & Model Evaluation for Space-Time Downscaling: case studies in air pollution during wildfires

Ann Arbor MI 10-22-2019

ABSTRACT: Public health scientists use prediction models to downscale (i.e., interpolate) air pollution exposure where monitoring data are insufficient. The aim is to obtain estimates at fine resolution, so that exposure data may reliably be related to health outcomes. In this setting, substantial research effort has been dedicated to developing statistical models capable of integrating heterogeneous information to obtain accurate predictions: statistical downscaling models, land use regression, and machine learning strategies. However, when faced with the task of choosing between models, or averaging them, we find that our understanding of model performance in the absence of independent statistical replications remains insufficient. This lecture is motivated by several studies of air pollution (PM2.5 and ground-level ozone) during wildfires. We review the basis for cross-validation as a strategy for estimating the expected prediction error. Because these performance measures play a crucial role in model selection and averaging, we present a formal characterization of the estimands targeted by different data-subsetting strategies and explore their performance in engineered data settings. We conclude with an analysis of a 2008 wildfire event in Northern California and a warning about preference inversion.

BIO: Dr. Telesca is Associate Professor of Biostatistics at the University of California, Los Angeles. He received a Ph.D. in Statistics from the University of Washington and spent two years at the University of Texas M.D. Anderson Cancer Center as a postdoctoral fellow. His research interests include Bayesian methods in multivariate statistics, functional data analysis, and statistical methods in bio- and nano-informatics. Dr. Telesca is a member of the California NanoSystems Institute and the UCLA Jonsson Comprehensive Cancer Center, and principal data scientist at Lucid Circuit Inc.
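The abstract's central point, that cross-validation schemes which subset the data differently end up estimating different quantities, can be illustrated with a small Python sketch. This is not the method presented in the lecture. It assumes a hypothetical one-dimensional monitoring transect (a smooth spatial signal plus noise), a simple k-nearest-neighbour interpolator standing in for a downscaling model, and two hypothetical fold-assignment helpers: `random_folds` (sites assigned to folds at random) and `blocked_folds` (contiguous spatial blocks held out together). All data, names, and parameter values are illustrative assumptions.

```python
# Hypothetical illustration, not the method presented in the lecture:
# compare random K-fold CV with spatially blocked K-fold CV on synthetic data.
import math
import random


def make_transect(n=200, noise_sd=0.3, seed=0):
    """Synthetic 1-D 'monitoring transect': smooth spatial signal plus noise."""
    rng = random.Random(seed)
    sites = [i / n for i in range(n)]
    values = [math.sin(6 * math.pi * s) + rng.gauss(0.0, noise_sd) for s in sites]
    return sites, values


def knn_predict(train_sites, train_values, site, k=3):
    """Predict at `site` as the mean of the k nearest training observations."""
    order = sorted(range(len(train_sites)), key=lambda j: abs(train_sites[j] - site))
    nearest = order[:k]
    return sum(train_values[j] for j in nearest) / len(nearest)


def random_folds(n, n_folds, seed=1):
    """Assign each site to a fold at random, ignoring spatial structure."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[f::n_folds]) for f in range(n_folds)]


def blocked_folds(n, n_folds):
    """Assign contiguous blocks of neighbouring sites to the same fold."""
    size = math.ceil(n / n_folds)
    return [list(range(f * size, min((f + 1) * size, n))) for f in range(n_folds)]


def cv_mse(sites, values, folds, k=3):
    """Cross-validated mean squared prediction error over the given folds."""
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [j for j in range(len(sites)) if j not in held]
        train_sites = [sites[j] for j in train]
        train_values = [values[j] for j in train]
        for i in held_out:
            pred = knn_predict(train_sites, train_values, sites[i], k)
            sq_errors.append((values[i] - pred) ** 2)
    return sum(sq_errors) / len(sq_errors)


if __name__ == "__main__":
    sites, values = make_transect()
    n, n_folds = len(sites), 10
    print("random K-fold CV MSE :", round(cv_mse(sites, values, random_folds(n, n_folds)), 3))
    print("blocked K-fold CV MSE:", round(cv_mse(sites, values, blocked_folds(n, n_folds)), 3))
```

With random folds, most held-out sites keep close neighbours in the training set, so the estimate reflects interpolation between nearby monitors. With blocked folds, sites inside a held-out block lose their neighbours, so the estimate reflects prediction farther from any monitor and is typically larger on a smooth field like this one. Which estimand matters depends on how the downscaled surface will be used, and the two estimates need not rank competing models the same way.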

Environmental Statistics Day Lecture by Donatello Telesca (UCLA)

October 22, 2019
1:00 pm - 2:30 pm
1690 SPH I
1415 Washington Heights
Ann Arbor, MI 48109-2029

Sponsored by: Integrated Health Sciences Core of M-LEEaD (Michigan Center on Lifestage Environmental Exposures and Disease)
Contact Information: Meredith McGehee (mcgehee@umich.edu | 647-0819)
