Bridging the gap between noisy healthcare data and knowledge: causality and portability
University of Michigan School of Public Health
3755 SPH I, 1415 Washington Heights Ann Arbor, MI 48109-2029

Routinely collected healthcare data present numerous opportunities for biomedical research but also come with unique challenges. For example, critical issues such as data quality, unmeasured and mismeasured confounding, high-dimensional covariates, and patient privacy concerns naturally arise. In this talk, I present tailored causal inference methods and an automated data quality control pipeline that aim to overcome these challenges and make the transition from data to knowledge. I detail the challenge of inconsistent “languages” used by different healthcare systems and coding systems. In particular, different healthcare providers may use alternative medical codes to record the same diagnosis or procedure, limiting the transportability of phenotyping algorithms and statistical models across healthcare systems. I formulate medical code translation as a statistical problem of inferring a mapping between two sets of multivariate, unit-length vectors, each learned from one of two healthcare systems. The statistical problem is particularly interesting because the training data are corrupted by a fraction of mismatched response-predictor pairs, whereas classical regression analysis tacitly assumes that the response and predictor are correctly linked. I propose a novel method for mapping recovery and establish theoretical guarantees for estimation and model selection consistency.

Light refreshments for seminar guests will be served at 3:10 p.m.

Department of Biostatistics


Xu Shi, Ph.D., Postdoctoral Fellow, Department of Biostatistics - Harvard T.H. Chan School of Public Health

January 10, 2019
3:30 pm - 5:00 pm
3755 SPH I
1415 Washington Heights
Ann Arbor, MI 48109-2029
Sponsored by: Department of Biostatistics
Contact Information: Zhenke Wu (zhenkewu@umich.edu) & Peisong Han

