Zhenke Wu, PhD
- Assistant Professor of Biostatistics
- Research Assistant Professor, Michigan Institute for Data Science (MIDAS)
1415 Washington Heights
Ann Arbor, MI 48109
Zhenke Wu’s research focuses on developing statistical methods that inform the health
decisions individuals make. He is particularly interested in scalable Bayesian
methods that integrate multiple sources of evidence, with a focus on hierarchical
latent variable modeling. He and his collaborators have applied these methods to estimate
the etiology of childhood pneumonia, to estimate autoantibody signatures that define
subsets of autoimmune disease patients, and to predict whether a user is engaged with
mobile applications.
Zhenke has developed original methods and software that are now used by investigators at research institutions such as the US CDC and Johns Hopkins University, as well as by site investigators in developing countries, including Kenya, South Africa, The Gambia, Mali, Zambia, Thailand, and Bangladesh.
Zhenke completed a BS in Mathematics at Fudan University in 2009 and a PhD in Biostatistics at Johns Hopkins University in 2014, and then stayed at Hopkins for postdoctoral training. Since 2016, Zhenke has been Assistant Professor of Biostatistics and Research Assistant Professor in the Michigan Institute for Data Science (MIDAS) at the University of Michigan, Ann Arbor.
- PhD, Biostatistics, The Johns Hopkins University, 2014
- B.Sc., Mathematics, Fudan University, 2009
Research Interests and Projects:
I am motivated by questions such as: “Given an individual’s current circumstances and preferences, what is her current and future health status, what options exist to prevent or treat her disease, and what are the benefits and risks of these options?” My research goals are to formulate and address such questions through formal statistical inference that integrates multiple sources of data, and to develop computational tools that scale to big data.
Bayesian Hierarchical Latent Variable Models. Biomedical datasets are increasingly complex. My approach is to find compact representations of complex data through a relatively small number of biomedically relevant latent variables. For example, a latent variable may indicate, for each individual, which mechanisms from a list of candidates caused their disease. The statistical goal is to discover simple latent structures that improve inference about both population parameters and individual latent states.
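As a minimal sketch of the latent-variable idea, assuming the simplest textbook latent class model (binary measurements that are conditionally independent given each individual's latent class), the Python code below uses Bayes' rule to compute one individual's posterior probability of belonging to each latent class. It is not the partially latent class models developed in this research; the function `posterior_class_probs`, the class prevalences `pi`, and the measurement probabilities `theta` are hypothetical, illustrative values.

```python
import math

def posterior_class_probs(y, pi, theta):
    """Posterior P(class k | y) under a basic latent class model (hypothetical helper).

    y     : list of 0/1 measurements for one individual
    pi    : prior class prevalences, summing to 1
    theta : theta[k][j] = P(y_j = 1 | class k); measurements are assumed
            conditionally independent given the latent class
    """
    log_post = []
    for k, prior in enumerate(pi):
        # log P(class k) + sum_j log P(y_j | class k)
        lp = math.log(prior)
        for yj, p in zip(y, theta[k]):
            lp += math.log(p) if yj == 1 else math.log1p(-p)
        log_post.append(lp)
    # Normalize on the log scale (log-sum-exp) for numerical stability
    m = max(log_post)
    w = [math.exp(lp - m) for lp in log_post]
    total = sum(w)
    return [wk / total for wk in w]

# Hypothetical example: 3 candidate causes, 4 binary tests
pi = [0.5, 0.3, 0.2]
theta = [
    [0.9, 0.1, 0.1, 0.2],
    [0.1, 0.8, 0.2, 0.2],
    [0.1, 0.1, 0.7, 0.6],
]
y = [1, 0, 0, 0]
post = posterior_class_probs(y, pi, theta)
print([round(p, 3) for p in post])
```

For these made-up inputs (first test positive, the rest negative), the first class receives about 98% of the posterior mass (0.2916 / 0.2976). Working on the log scale and subtracting the maximum before exponentiating avoids underflow when an individual has many measurements.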
Robust Inference for Experimental and Observational Data. Health care policymakers must evaluate whether a novel treatment is likely to improve population health. One useful measure is the average treatment effect (ATE), defined as the difference in average health outcome if all patients were assigned the new treatment versus if all patients were assigned the old one. Both experimental and observational studies are used to provide evidence about the ATE. My research develops novel methods that are robust to model misspecification in both settings.
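Formally, writing Y(1) and Y(0) for a patient's potential outcomes under the new and old treatments, ATE = E[Y(1)] - E[Y(0)]. In a randomized experiment, the simplest unbiased estimator is the difference in observed mean outcomes between the two arms. The sketch below illustrates this on simulated data; the function name `difference_in_means`, the data-generating values, and the constant true effect of 2.0 are assumptions for illustration, and this unadjusted estimator is not one of the robust methods described above.

```python
import random
import statistics

def difference_in_means(outcomes, treated):
    """Unadjusted ATE estimate: mean outcome in the treated arm minus the control arm."""
    y1 = [y for y, t in zip(outcomes, treated) if t]
    y0 = [y for y, t in zip(outcomes, treated) if not t]
    return statistics.fmean(y1) - statistics.fmean(y0)

# Hypothetical simulated trial with a true (constant) ATE of 2.0
rng = random.Random(0)
n = 20000
y0_pot = [rng.gauss(10.0, 3.0) for _ in range(n)]  # potential outcome under old treatment
y1_pot = [y + 2.0 for y in y0_pot]                  # potential outcome under new treatment
treated = [rng.random() < 0.5 for _ in range(n)]    # randomized 1:1 assignment
# Only one potential outcome is observed per patient
observed = [y1 if t else y0 for y1, y0, t in zip(y1_pot, y0_pot, treated)]

ate_hat = difference_in_means(observed, treated)
print(f"estimated ATE: {ate_hat:.2f} (true ATE: 2.00)")
```

Randomization is what makes the two arms comparable here; with observational data the same difference in means is generally biased, which is why adjustment methods that stay valid under model misspecification matter.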
Open-Source Software Platform for Learning Health Communities. The overarching goal of the software platform is to provide an R-based environment of software tools for generating, managing, analyzing, and visualizing complex health data to support health decisions. I am currently working with colleagues at the Johns Hopkins Individualized Health Initiative (Dr. Scott Zeger; http://hopkinsinhealth.jhu.edu), the International Vaccine Access Center (Dr. Katherine O'Brien, Johns Hopkins University), Brigham and Women’s Hospital, Harvard Medical School (Dr. Vincent Carey), and Group Health Research Institute (Dr. Yates Coley) on a Patient-Centered Outcomes Research Institute (PCORI) grant titled “Bayesian Hierarchical Models for Design and Analysis of Studies to Individualize Healthcare.”
For many health decisions, the intelligent acquisition and use of data can improve the chance of a successful outcome. One example is deciding whether and how often to screen for common cancers, where the potential harms must be weighed against the potential benefits. The relevant information is increasingly complex: in addition to traditional clinical indicators of phenotype, it includes digitized images, DNA sequences, novel biomarkers, and multivariate time series from wearable devices. Electronic health records (EHRs) have made it possible to acquire and manage health information more effectively. They also enable Boolean-style (“if, then, else”) analyses: for example, if a newly recorded lab value is above a particular level, an EHR can automatically schedule a follow-up visit.
But in today’s information-rich environment, there is a heightened need to define, measure, and track health states, to integrate traditional with more complex health measures, and to develop and use appropriate tools for analysis. EHR systems are an essential component of a health information system. However, for an EHR to benefit patients fully, it must be part of a system designed to generate and then use health data to improve individual and population health: to frame key questions, to generate and integrate the relevant evidence, and to build, test, and continuously refine mechanistic or empirical (statistical) models that evaluate the available data and communicate the results as evidence for health decisions.
- Bayesian Hierarchical Models for Design and Analysis of Studies to Individualize Healthcare, Patient-Centered Outcomes Research Institute (PCORI).
- Pneumonia Etiology Research for Child Health (PERCH), Gates Foundation.
(Please see http://zhenkewu.com for recent papers)
- Wu Z, Casciola-Rosen L, Shah AA, Rosen A, Zeger SL (2017). Estimating Autoantibody Signatures to Detect Autoimmune Disease Patient Subsets. Biostatistics. In press. doi:10.1093/biostatistics/kxx061.
- Xu G, Wu Z, Murphy SA (2018). Micro-Randomized Trial. In Wiley StatsRef: Statistics Reference Online (eds N. Balakrishnan, T. Colton, B. Everitt, W. Piegorsch, F. Ruggeri and J. L. Teugels). doi:10.1002/9781118445112.stat08050.
- Fritsche L, Gruber SB, Wu Z, Schmidt E, Zawistowski M, Moser SE, Blanc VM, Brummett CM, Kheterpal S, Abecasis GR, Mukherjee B (2018). Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from the Michigan Genomics Initiative. American Journal of Human Genetics. In press. doi:10.1016/j.ajhg.2018.04.001.
- Wu Z, Deloria-Knoll M, Zeger SL (2016). Nested Partially-Latent Class Models (npLCM) for Dependent Binary Data; Estimating Disease Etiology. Biostatistics. In press. doi:10.1093/biostatistics/kxw037.
- Wu Z, Deloria-Knoll M, Hammitt LL, Zeger SL, for the PERCH Core Team (2016). Partially Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65: 97-114. doi:10.1111/rssc.12101.
- Frangakis CE, Qian T, Wu Z, Diaz I (2015). Deductive Derivation and Turing-computerization of Semiparametric Efficient Estimation. Biometrics. doi:10.1111/biom.12362. Discussion paper.
- Frangakis CE, Qian T, Wu Z, Diaz I (2015). Rejoinder: Deductive Derivation and Turing-computerization of Semiparametric Efficient Estimation. Biometrics. doi:10.1111/biom.12365.
- Wu Z, Frangakis CE, Louis TA, Scharfstein DO (2014). Estimating Treatment Effects in Cluster Randomized Trials by Calibrating Covariate Imbalances between Clusters. Biometrics, 70: 1014-1022. doi:10.1111/biom.12214.