Subject-specific Functional Prediction from Electronic Health Records using an Enriched Dirichlet Process Mixture Model
University of Michigan School of Public Health
3755 SPH I, 1415 Washington Heights, Ann Arbor, MI 48109-2029
In many modern applications, there is interest in predicting subject-specific functions of a variable over time. For example, we might want to know patient-specific trends in a biomarker over time. Modeling is needed if the variable is measured with error or if the gaps between data collection times are too wide. We propose a novel semiparametric model for the joint distribution of a continuous longitudinal outcome and the baseline covariates using an enriched Dirichlet process (EDP) prior. This joint model decomposes into subject-specific linear mixed models for the outcome given the covariates and simple marginals for the covariates. The nonparametric EDP prior is placed on the regression and spline coefficients, the error variance, and the parameters governing the predictor space. We predict the outcome at unobserved time points both for subjects with data at other time points and for completely new subjects with covariates only. We find improved prediction over mixed models with Dirichlet process (DP) priors when the number of covariates is large. Our method is demonstrated with electronic health records of patients initiating second-generation antipsychotic medications, which are known to increase the risk of diabetes. We use our model to predict laboratory values indicative of diabetes for each individual and to assess the incidence of suspected diabetes from the predicted dataset. Our model also serves as a functional clustering algorithm in which subjects are grouped by similar longitudinal trajectories of the outcome.

Light refreshments for seminar guests will be served at 3:10 p.m. in 3755.
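The abstract describes the EDP prior only at a high level. As a rough illustration, and not the speaker's implementation, the following Python sketch draws a nested partition from the enriched Pólya urn (a nested Chinese-restaurant-process view of an EDP prior). Each subject joins an outer cluster that shares the parameters of the outcome given the covariates. It then joins an inner cluster, nested within that outer cluster, that shares the parameters of the covariate distribution. Everything beyond that structure is a hypothetical choice made only for illustration. This includes the function names `nested_crp` and `simulate`, the concentration parameters `alpha_y` and `alpha_x`, the Gaussian covariates, the linear time trend used in place of splines, the random intercept, and the normal and uniform base measures.

```python
import numpy as np


def nested_crp(n, alpha_y, alpha_x, rng):
    """Hypothetical sketch: sample a nested partition from an enriched Polya urn.

    Outer clusters share parameters of outcome | covariates; inner clusters,
    nested inside an outer cluster, share parameters of the covariates.
    """
    outer = np.empty(n, dtype=int)
    inner = np.empty(n, dtype=int)
    outer_sizes = []  # number of subjects in each outer cluster
    inner_sizes = []  # inner_sizes[j]: subjects per inner cluster within outer cluster j
    for i in range(n):
        # Outer urn: existing cluster w.p. proportional to its size, new one w.p. proportional to alpha_y
        w = np.array(outer_sizes + [alpha_y], dtype=float)
        j = int(rng.choice(len(w), p=w / w.sum()))
        if j == len(outer_sizes):
            outer_sizes.append(0)
            inner_sizes.append([])
        outer_sizes[j] += 1
        # Inner urn: same rule, restricted to subjects already in outer cluster j
        w = np.array(inner_sizes[j] + [alpha_x], dtype=float)
        k = int(rng.choice(len(w), p=w / w.sum()))
        if k == len(inner_sizes[j]):
            inner_sizes[j].append(0)
        inner_sizes[j][k] += 1
        outer[i], inner[i] = j, k
    return outer, inner


def simulate(n=100, n_times=6, p=3, alpha_y=1.0, alpha_x=1.0, seed=0):
    """Hypothetical sketch: simulate covariates and longitudinal outcomes.

    Assumed choices (not from the talk): Gaussian covariates, a linear time
    trend instead of splines, a subject random intercept, simple base measures.
    """
    rng = np.random.default_rng(seed)
    outer, inner = nested_crp(n, alpha_y, alpha_x, rng)
    n_outer = int(outer.max()) + 1
    # Outer-cluster parameters for outcome | covariates
    coef_time = rng.normal(size=(n_outer, 2))    # intercept and slope in time
    beta = rng.normal(size=(n_outer, p))         # covariate effects
    sigma = rng.uniform(0.2, 1.0, size=n_outer)  # residual standard deviation
    # Inner-cluster parameters for the covariates, keyed by (outer, inner) label
    pairs = sorted(set(zip(outer.tolist(), inner.tolist())))
    x_mean = {pair: rng.normal(scale=2.0, size=p) for pair in pairs}

    t = np.linspace(0.0, 1.0, n_times)
    x = np.empty((n, p))
    y = np.empty((n, n_times))
    for i in range(n):
        j, k = int(outer[i]), int(inner[i])
        x[i] = rng.normal(loc=x_mean[(j, k)], scale=1.0)
        u_i = rng.normal(scale=0.5)  # subject-level random intercept
        mean = coef_time[j, 0] + u_i + coef_time[j, 1] * t + x[i] @ beta[j]
        y[i] = mean + rng.normal(scale=sigma[j], size=n_times)
    return outer, inner, x, t, y


if __name__ == "__main__":
    outer, inner, x, t, y = simulate()
    print("outer clusters:", len(set(outer.tolist())))
    print("(outer, inner) clusters:", len(set(zip(outer.tolist(), inner.tolist()))))
```

Because covariate clusters are nested inside outcome clusters, the covariates can be split into many inner clusters without forcing the regression parameters to split as well. This is one common intuition for why an EDP can outperform a plain DP mixture when there are many covariates.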

Jason Roy, Ph.D., Chair of Biostatistics and Epidemiology and Professor of Biostatistics, Rutgers University

September 27, 2018
3:30 p.m. to 5:00 p.m.
3755 SPH I
1415 Washington Heights
Ann Arbor, MI 48109-2029
Sponsored by: Department of Biostatistics
Contact Information: Zhenke Wu (zhenkewu@umich.edu), Peisong Han (peisong@umich.edu)
