Something old, something new: Classical and modern approaches to addressing missingness in real-world data
Online in Zoom
Data derived from electronic health records (EHR) represent an enormous research resource, with exposures and outcomes for a large and diverse population. However, EHR data have many limitations, including complex patterns of missing data induced by the irregularity of interactions between patients and the healthcare system. Novel approaches to handling missing data, including machine learning-based imputation methods, have been touted as a potential solution to this problem, but evaluation of their performance in the context of real-world comparative effectiveness research (CER) is lacking. In EHR-based CER, missingness can be handled in multiple ways. Multiple imputation (MI) can be used to impute variables with missingness prior to estimation of a propensity score. Alternatively, propensity score calibration (PSC) transforms this missing data problem into a measurement error problem. The PSC approach has the potential to alleviate the computational burden of MI in large EHR databases. I will present a comparative evaluation of standard and novel methods for addressing missing data in the context of a real-world study of the comparative effectiveness of immunotherapy and chemotherapy for treatment of advanced urothelial cancer. Using plasmode simulation grounded in this context, we compare the performance of traditional and machine learning-based imputation methods, as well as MI and PSC. We identify missing data settings in which modern approaches have promise and those in which the greater flexibility of these methods potentially results in overfitting and poor statistical performance. I will conclude with reflections on how we can embrace the new while keeping the old in order to most effectively advance the scientific evidence base with real-world data.

Biostatistics Seminar with Rebecca Hubbard, PhD, Professor of Biostatistics, University of Pennsylvania

January 27, 2022
3:30 pm - 4:30 pm
Contact Information: Mandi Larson, larsoma@umich.edu

