Look to Michigan Biostatistics Student Research Showcase

March 20, 2026 | 2:30 - 4:00 p.m.

School of Public Health Building I, Room 1680 (Cornely Community Room)

The Department of Biostatistics at the University of Michigan welcomes current community members, admitted students, and the general public to attend this showcase of outstanding student research. Students will present their work, via a poster session and/or short talks, on a wide range of topics including:

Spatial, imaging, and network modeling in cancer biology, neuroimaging, and precision medicine
Bayesian methods and adaptive clinical trial design, including patient-preference and dynamic treatment studies
Causal inference and real-world evidence using large healthcare databases
Machine learning and AI methods grounded in statistical rigor, with theoretical guarantees and scalable algorithms for high-dimensional biomedical data
Computationally efficient approaches for analyzing massive imaging, genomic, and clinical datasets

A full list of tentative presenters can be found below.

Light refreshments will be served to all attendees. To ensure accurate headcounts, please submit your RSVP at the link below.

Student Presenters

Jessica Aldous

Multi-Resolution Spatial Regression Analysis of Cellular Colocalizations in Cancer Imaging

Changes in immune-tumor cell colocalization across the epithelial to mesenchymal transition (EMT) in renal cell carcinoma can reveal differences in immunotherapy sensitivity. Hierarchical multiplex imaging data capture cellular protein expression within spatially organized fields of view (FOVs) sampled across many patients' tumor biopsies and can help identify global biomarker effects on disease progression. However, existing statistical methods cannot account for this hierarchical spatial data structure. To this end, we propose the multi-resolution spatial regression analysis of cell colocalizations in cancer (MoSAIC) model that accounts for the spatial relationships between FOVs and the high degree of both within- and between-patient variability. MoSAIC is a hierarchical Bayesian model that decomposes tumor gradient effects into global and patient-specific effects, and employs a Gaussian process to model the spatial relationships between FOVs. Complex global tumor gradients are estimated via penalized splines, and simultaneous band score probabilities are used to assess the significance of estimated curves. Simulations reveal MoSAIC has improved prediction and model fit compared to existing spatial and non-spatial model alternatives. MoSAIC identifies changes in colocalization of Cytotoxic T cells and tumor cells, as well as Macrophages and tumor cells with increased expression of EMT markers N-cadherin and Programmed Death-Ligand 1, respectively. Such differences in the tumor microenvironment (TME) across the EMT gradient may explain the increased aggressiveness and susceptibility to immunotherapy of mesenchymal renal cell carcinoma tissue.

Xingran Chen

A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation

Predictions from machine learning (ML) models are increasingly used to impute missing data, but their naive use risks biased inference. Existing methods that use ML imputations to enhance efficiency provide valid inference regardless of prediction quality, but they are limited to simple missing data structures under a missing-completely-at-random assumption. We develop a novel framework for valid statistical inference in Z-estimation problems using ML imputations under a missing-at-random assumption and for general missingness patterns. The method stratifies data by distinct missingness patterns and constructs an estimator by appropriately weighting and aggregating pattern-specific information. We establish the asymptotic theory and provide a theoretical guarantee on efficiency dominance over weighted complete-case analyses (WCCA). Practically, the method affords simple implementations by leveraging existing WCCA software. Extensive simulations are carried out to validate theoretical results. An analysis of All of Us data further shows the practical utility of the method. The paper concludes with a brief discussion on practical implications and potential future directions.

Liz Davis

Heterogeneity-Adaptive Meta-Analysis

Meta-analytic methods tend to take all-or-nothing approaches to study-level heterogeneity, assuming all studies are heterogeneous or homogeneous, leading to inefficiency and/or bias in estimation and inference. In this paper, we develop a heterogeneity-adaptive meta-analysis in linear models that adapts to the amount of information shared between datasets. The primary mechanism for the information-sharing is a shrinkage of dataset-specific distributions towards a new "centroid" distribution through a Kullback-Leibler divergence penalty. The Kullback-Leibler divergence is uniquely geometrically suited for measuring relative information between datasets, and leads to relatively simple closed form estimators with intuitive interpretations. We establish our estimator's desirable inferential properties without assuming homogeneity of dataset parameters. Among other results, we show that our estimator has a provably smaller mean squared error than the dataset-specific maximum likelihood estimators, and establish asymptotically valid inference procedures. A comprehensive set of simulations highlights our estimator's versatility, and an analysis of data from the eICU Collaborative Research Database illustrates its performance in a real-world setting.

Stefan Eng

Causal Network Discovery using Mendelian Randomization

Genome-wide association studies (GWAS) have transformed our understanding of human health by identifying genetic variants associated with traits and diseases. Thousands of GWAS have been conducted across many traits and populations, and the resulting summary statistics describing associations between genetic variants and traits are often shared publicly. This has enabled other researchers to perform new analyses without accessing the original genetic data.

One method that makes use of GWAS summary statistics is Mendelian Randomization (MR). MR uses genetic variants as natural experiments to estimate whether one trait has a causal effect on another. Network Mendelian Randomization extends these ideas to study causal relationships among multiple traits at once. We introduce Network Empirical Shrinkage Mendelian Randomization (NESMR), a method that estimates causal networks of traits using GWAS summary statistics. NESMR uses a computationally efficient approach to estimate the model, which allows us to explore many possible network structures. We develop a graph discovery algorithm to identify the network that best explains the data.
We apply NESMR to traits related to coronary artery disease (CAD). The resulting network agrees with findings from clinical studies and biological understanding of CAD risk. Importantly, the method also recovers known temporal patterns, placing childhood body mass index (BMI) early in the network and CAD as a downstream outcome.
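
For readers new to MR, the basic two-trait idea described above can be sketched with the standard inverse-variance-weighted (IVW) estimator, which combines per-variant GWAS summary statistics into a single causal effect estimate. This is a textbook illustration only, not the NESMR method; the function name and the summary statistics below are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect of an exposure on an outcome.

    Each list holds one entry per genetic variant: the variant-exposure
    association, the variant-outcome association, and the standard error
    of the variant-outcome association.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for three variants (illustration only).
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.05, 0.10, 0.075]
se_y = [0.01, 0.02, 0.015]

est, se = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate: {est:.3f} (SE {se:.3f})")
```

Network approaches such as the one in this abstract go further by estimating many such effects jointly across traits.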

Sarah Ferlito

Bayesian Dynamic Borrowing Approaches for Incorporating Patient Treatment Preferences in SMART Designs

Partially Randomized, Patient Preference, Sequential, Multiple Assignment, Randomized Trials (PRPP-SMARTs) are multi-stage clinical trial designs that allow participants to receive their preferred treatment or to be randomized to one of two treatment options across two critical decision points in the course of care. These designs provide valuable data on dynamic treatment regimens (DTRs) or adaptive treatment algorithms especially in settings where recruitment and retention may be difficult due to participant treatment preferences (e.g., treatments are of different modalities or prior treatment histories). Previous methods to analyze PRPP-SMART data pool across participants with treatment preference and those who are randomized using weighted and replicated regression models (WRRM). Here, we extend Bayesian dynamic borrowing (BDB) methods, which are more commonly used, for example, in basket trials to share information across subgroups or in rare disease trials to borrow information from external sources, to the PRPP-SMART setting. We apply three BDB methods to PRPP-SMART data to share information across participants with different treatment preferences when estimating DTRs: a standard Bayesian hierarchical model (BHM), an exchangeability non-exchangeability model (EXNEX), and an approach using Dirichlet process mixtures (DPM). We conduct a simulation study to compare the frequentist properties of WRRM with the BDB methods across preference rates and across small to large differences in average outcomes between participants based on treatment preferences, referred to as small to large effect size scenarios. We find that DTRs are estimated with negligible bias across approaches for PRPP-SMARTs in small effect size scenarios. WRRM is over 15% more efficient in estimating each DTR than the BDB methods across preference rate scenarios. BHM and DPM have similar efficiency and have significant bias reduction compared to WRRM in moderate to large effect size scenarios. 
We observe minor improvements in efficiency for EXNEX compared to no data borrowing. We recommend the BHM approach to share data across all participants when estimating DTRs in a PRPP-SMART due to the complexity of the EXNEX and DPM approaches, although these methods may be more appropriate in future pragmatic SMART designs.

Rachel Gonzalez

Impact of morning versus afternoon infusions on overall survival in metastatic non-small-cell lung cancer

Circadian biology suggests that synchronizing immune checkpoint inhibitor (ICI) dosing with morning peaks in immune activation could improve clinical outcomes, but well-powered studies using appropriate causal inference methodology are sparse. We emulated a pragmatic randomized controlled trial of AM vs. PM ICI infusions using Veterans Health Administration records from 2010 to 2024. In the target trial, stage IV non-small-cell lung cancer patients planned to undergo first- or second-line ICI would have been randomized to receive the first 3 infusions in the AM (<12:00PM) or PM (>=12:00PM). In our emulation, date of randomization was approximated as the date ICI therapy was ordered in the electronic health record. Protocol deviations were defined as receiving one or more of the first three infusions outside the assigned time window, or failure to receive at least one infusion within 30 days of the ICI order. The primary outcome was overall survival (OS). We collected comprehensive baseline data on a range of factors that could influence whether patients received morning versus afternoon ICI infusions. Patient-level variables, cancer-related variables and treatment-related factors were all captured. To account for changes in clinical status during the exposure period, we also included inpatient versus outpatient status as a time-varying longitudinal covariate, capturing hospitalizations that may influence infusion timing between enrollment and the third infusion. The per-protocol effect was estimated via weighted Cox proportional hazards regression following the clone-censor-weight procedure. This marginal structural modeling approach uses inverse probability of censoring weights to account for baseline and longitudinal confounding. Standard errors and 95% confidence intervals were computed using robust variance estimators. Sensitivity analyses were also conducted varying the time cutoff (11AM, 12PM, 1PM), number of infusions (1, 2, 3 or 4 infusions), and analysis method (landmark analysis). A historical chemotherapy cohort served as a negative control. 4,688 patients were eligible for the emulated trial; of these, 1,171 received their first three infusions in the AM and 794 in the PM. Median follow-up was 4.7 years. Median survival was 10.3 months (AM) vs. 8.1 months (PM). PM dosing was associated with worse OS (hazard ratio [HR] for PM vs. AM 1.15, 95% CI 1.04-1.26, p=0.004). At 12 months, the estimated absolute improvement in survival probability in the AM arm was 5.9% (95% CI 1.2-10.6%). In 7,951 chemotherapy controls (2,072 AM; 1,289 PM; median follow-up 8.9 years), no time-of-day effect was detected (HR for PM vs. AM 1.05, 95% CI 0.98-1.12, p=0.15). Results were robust in sensitivity analyses. Morning ICI infusions confer a modest but clinically meaningful survival benefit that is absent in chemotherapy controls, supporting a causal chronotherapeutic effect. Scheduling ICIs before noon represents a low-cost, immediately actionable strategy warranting prospective confirmation.

Ralph Jiang

Scalable Image-on-Scalar Regression for High-Dimensional Neuroimaging Data

Image-on-scalar regression is a statistical method for modeling the association between scalar covariates and an image outcome, which takes on different values at different spatial locations. Although there currently exist many methods to solve this class of problems, the time and memory required to apply these methods in practice scale poorly with the sample size and the number of voxels in the image, both of which are typically very large for questions of interest. We propose a new method, the Scalable Image-on-Scalar Regression Algorithm (SIRA), for solving image-on-scalar regression problems with a specific focus on maintaining computational scalability in both the sample size and the image size by enforcing homogeneity and sparsity in the spatially varying coefficient solution to this problem. We compare the performance of SIRA to other existing image-on-scalar regression methods on simulated data and demonstrate superiority of SIRA in terms of numerous metrics. We also provide an application to real-world Functional Magnetic Resonance Imaging data from UK Biobank, which is far beyond the scale that most other image-on-scalar regression methods can reasonably handle.

Neo Kok

Impact of Missing Data and Monitoring Duration on Downstream Analyses in Continuous Glucose Monitoring

Consensus guidelines recommend at least 14 consecutive days of CGM monitoring with 70% completeness to represent 90-day glycemic exposure. This study quantifies bias and uncertainty introduced into downstream analyses by using CGM metrics from incomplete or reduced monitoring, relative to a 90-day complete profile. Based on 1,010 complete 90-day CGM profiles from individuals with type 1 diabetes, we simulated incomplete profiles by varying monitoring duration (7-90 days) and data completeness (10%-100%). Consensus CGM metrics were computed on both incomplete and complete profiles to quantify measurement error. This error was propagated into two downstream regression models: (a) CGM metric is an outcome for a binary treatment (clinical trial setting); (b) CGM metric is an explanatory variable (covariate) for another continuous outcome. Bias was quantified using observed-to-true effect size ratios, and uncertainty by the sample size increase required to maintain precision. In the clinical trial setting, treatment effects remain unbiased but lose precision; for Time In Range (TIR), 14 days required at least 16% more participants versus 90 days. When the CGM metric is a covariate, associations with outcome are attenuated (biased towards zero, up to 14% for TIR) and less precise. Representing 90 days of glycemic exposure with 14 days can lead to bias and loss of precision in downstream analyses. We recommend study protocols require at least 30 days of CGM monitoring with 70% completeness. If 30 days is not feasible, studies should plan for increased sample sizes.
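
The attenuation described above for the covariate setting can be illustrated with a small simulation. The sketch below assumes classical additive measurement error, under which the observed-to-true slope ratio is approximately var(X) / (var(X) + var(error)); this is a generic illustration with made-up variances and names, not the simulation design used in the study.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
true_slope = 2.0
error_sd = 0.5  # assumed measurement-error SD (illustrative)

# "True" metric (e.g., from a complete profile) and the outcome it drives.
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * x + rng.gauss(0.0, 1.0) for x in x_true]

# Noisy version of the metric (e.g., from an incomplete profile).
x_obs = [x + rng.gauss(0.0, error_sd) for x in x_true]

ratio = ols_slope(x_obs, y) / ols_slope(x_true, y)
expected_ratio = 1.0 / (1.0 + error_sd ** 2)  # var(X) = 1 in this setup

print(f"observed-to-true slope ratio: {ratio:.3f}")
print(f"classical-error prediction:   {expected_ratio:.3f}")
```

A ratio below 1 corresponds to the bias towards zero reported in the abstract; noisier metrics push the ratio lower.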

Tom Liu

Total Variation Regularization for Spatially Coherent Cell-Type Deconvolution in Spatial Transcriptomics

Spatial transcriptomics experiments measure gene expression at thousands of spatial locations, but most platforms capture mixtures of cell types per spot. Existing deconvolution methods can yield noisy, spatially fragmented cell-type maps that conflict with tissue architecture. We develop spatialTV, a spatially coherent deconvolution framework that couples an RCTD-style likelihood with graph total-variation (TV) regularization over the tissue adjacency graph. To preserve sharp biological boundaries while suppressing speckle, we use a weighted soft-L1 TV penalty (Huber/Moreau-smoothed TV with parameter delta), where edge-specific weights w act as shrinkage hyperparameters controlling how strongly each boundary is penalized. Optimization is carried out via an alternating proximal gradient scheme: we update the mixture proportions X (simplex-constrained per spot) and cell-type signatures B using a projected FISTA solver with backtracking line search, gradient-based restart, and a corrected Lipschitz bound that incorporates ||A^T A||_2, max(w), and delta. We then update w through a stabilized, empirical-Bayes–motivated constrained reweighting step (solved as a linear program), enabling adaptive, edge-aware smoothing without changing the underlying graph. Under standard assumptions (Lipschitz differentiable data-fit term, proper lower-semicontinuous regularizer, and the KL property), the resulting monotone proximal alternating procedure admits a global convergence guarantee to a critical point, with finite-length iterates when sufficient decrease is enforced. The implementation includes cached spectral norms and OpenMP parallelism to scale to high-resolution tissues.
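
As a rough illustration of the weighted, smoothed TV penalty mentioned above, the sketch below evaluates a Huber-smoothed total-variation term over a small graph. It uses one common Huber form (the Moreau envelope of the absolute value with parameter delta) and scalar values per node; the authors' exact penalty, and its extension to per-spot proportion vectors, may differ. All names and numbers are hypothetical.

```python
def huber(t, delta):
    """Moreau envelope of |t| with parameter delta: quadratic near zero, linear in the tails."""
    a = abs(t)
    if a <= delta:
        return t * t / (2.0 * delta)
    return a - delta / 2.0


def smoothed_graph_tv(values, edges, weights, delta):
    """Weighted smoothed TV: sum over edges (i, j) of w_ij * huber(values[i] - values[j])."""
    return sum(
        w * huber(values[i] - values[j], delta)
        for (i, j), w in zip(edges, weights)
    )


# Toy chain graph with three spots: a small difference on the first edge
# (treated as noise) and a large jump on the second (a candidate boundary).
values = [0.0, 0.05, 1.0]
edges = [(0, 1), (1, 2)]
weights = [1.0, 0.5]  # a smaller weight penalizes the second edge less

penalty = smoothed_graph_tv(values, edges, weights, delta=0.1)
print(f"smoothed TV penalty: {penalty:.4f}")
```

Small differences incur a quadratic cost, which suppresses speckle, while large jumps incur only a linear cost, which is how TV-type penalties tend to preserve sharp boundaries.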

Ziyu Liu

Bayesian Image-on-Image Regression for Linking Resting-State Connectivity to Task fMRI Activation

Functional MRI (fMRI) data provide rich insights into the brain's functional organization. Understanding how resting-state functional connectivity is associated with task-evoked activity is an important problem in neuroimaging. To study this association, we propose an image-on-image regression (IIR) model in which predictor and outcome images are represented with fixed spatial bases and linked through low-dimensional basis-to-basis coefficient matrices that provide interpretable and spatially coherent mappings between modalities. For cortical surface outcome images, spherical harmonics capture smooth spatial variation. Posterior inference is performed using an efficient Markov chain Monte Carlo algorithm enabling scalable computation for high-resolution imaging data. Applied to the Adolescent Brain Cognitive Development (ABCD) Study, our method reveals coherent spatial associations between resting-state functional connectivity and the activation from a working memory task fMRI study with interpretable estimation and uncertainty quantification.

Longhao Pang

Hierarchical Bayesian Framework for Scalar-on-Network Regression with Random Effects

We propose a hierarchical Bayesian framework for scalar-on-network regression with random effects to understand brain structure based on P300 electroencephalogram (EEG) signals. Our framework accounts for subject-specific variability through random effects. The framework consists of an EEG signals network manifold learning model and a predictive model for clinical outcomes. The EEG signals network manifold learning model includes a decomposition model and a random effect model. The decomposition model extracts subject-specific network bases and subject-specific network features from the observed EEG signals. The random effect model links the subject-specific signal bases through a shared underlying network basis on the Stiefel manifold. For posterior computation, we apply a Gibbs sampler with stochastic annealing. Our model not only captures the shared underlying network basis accurately with flexibility through the random effects but also provides reliable predictions of clinical outcomes based on meaningful subject-specific network features. The effectiveness of our method is demonstrated through simulation studies and brain-computer interface (BCI) datasets from participants with amyotrophic lateral sclerosis (ALS).

Leyuan Qian

Smooth tensor decomposition for ambulatory blood pressure monitoring data

Edward Shao

Fast distance computation of multivariate distributions via nonparanormal transport

With the increasing availability of data objects in the form of probability distributions, there is a growing need for statistical methods tailored to distributional data. Distance measures, especially the pairwise distance matrix between data objects, provide the foundation for a wide range of modern data analysis methods, such as clustering, multidimensional scaling, and distance-based regression, among others. The Wasserstein distance is commonly used with distributional data due to its compelling optimal transport property. However, while the Wasserstein distance can be efficiently computed for univariate distributions, its application to multivariate distributions is limited due to high computational costs. To address these scalability issues, we introduce the Nonparanormal Transport (NPT) metric, a closed-form distance based on the flexible nonparanormal distribution family for modeling skewed and non-Gaussian multivariate data. Simulation studies demonstrate that NPT maintains a high level of agreement with the Wasserstein distance, while being at least 1000 times faster when computing a 100-distribution pairwise distance matrix in both 2 and 5 dimensions. We illustrate the utility of NPT through a multidimensional scaling analysis of bivariate oxygen desaturation distributions of 723 individuals with sleep apnea in the Sleep Heart Health Study.
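
To see why the univariate case mentioned above is cheap, note that for two empirical distributions with the same number of samples, the p-Wasserstein distance reduces to matching sorted values. The sketch below shows only this standard univariate computation; it is not the NPT metric, and the function name and data are illustrative.

```python
def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size empirical samples on the real line.

    In one dimension the optimal transport plan pairs the i-th smallest
    value of x with the i-th smallest value of y, so sorting is enough.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    mean_cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return mean_cost ** (1.0 / p)


sample_a = [0.0, 1.0]
sample_b = [2.0, 1.0]  # the same sample shifted by 1, in a different order

print(wasserstein_1d(sample_a, sample_b))  # a pure shift by 1 gives distance 1.0
```

No comparable sorting shortcut exists in higher dimensions, which is the scalability gap the abstract addresses.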

Hanna Venera

Zero inflated Outcomes in SMART Designs

A Sequential, Multiple Assignment, Randomized Trial (SMART) is a clinical trial design for developing and comparing dynamic treatment regimens. Standard SMART analyses often use weighted and replicated regression. In non-SMART studies with count outcomes, especially in settings of substance use, zero-inflation is a common issue.

One solution for zero-inflated longitudinal outcomes is the two-part hurdle model (HM). This model handles random zero cases where subjects have zero outcomes but remain at risk. HMs have not yet been applied to SMART data.

We propose a two-part HM for zero-inflated, longitudinal count outcomes in SMARTs. The model combines logistic regression for zero/nonzero outcomes with a truncated Poisson model for nonzero counts. Our approach is motivated by and applied to the SafERteens M-Coach SMART, which investigated the effects of brief interventions and text messaging on alcohol consumption outcomes among young adults.
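
The two-part structure described above can be sketched as a probability mass function: one part governs whether the count is zero, and a zero-truncated Poisson governs the size of nonzero counts. The sketch below uses fixed parameters for clarity; in a regression setting the nonzero probability and the Poisson rate would depend on covariates, and the longitudinal and SMART-specific weighting aspects are omitted. Function names and values are hypothetical.

```python
import math


def hurdle_poisson_pmf(k, p_nonzero, lam):
    """P(Y = k) under a Poisson hurdle model.

    p_nonzero: probability of a nonzero count (the logistic part).
    lam:       rate of the zero-truncated Poisson for nonzero counts.
    """
    if k == 0:
        return 1.0 - p_nonzero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    truncated = poisson_k / (1.0 - math.exp(-lam))  # condition on Y >= 1
    return p_nonzero * truncated


def hurdle_log_likelihood(counts, p_nonzero, lam):
    """Log-likelihood of independent counts under the hurdle model."""
    return sum(math.log(hurdle_poisson_pmf(k, p_nonzero, lam)) for k in counts)


# Illustrative parameters: 60% zeros, nonzero counts with truncated-Poisson rate 2.5.
p_nonzero, lam = 0.4, 2.5
total_mass = sum(hurdle_poisson_pmf(k, p_nonzero, lam) for k in range(60))
print(f"P(Y = 0) = {hurdle_poisson_pmf(0, p_nonzero, lam):.2f}")
print(f"total probability over k < 60: {total_mass:.6f}")
```

Because the zero and nonzero parts factor in the likelihood, the two components can be fit separately in the simplest settings.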

Yanruyu Zhu

Bayesian Data Augmentation for Increasing Efficiency of Clinical Trials by Incorporating Treatment Preferences

We propose a Preference-Randomized Controlled Trial (PRCT) framework that integrates outcomes from a randomized controlled trial (RCT) and a self-selected treatment cohort, enabling partial pooling of outcome data while preserving the internal validity of randomized comparisons. Our method introduces shrinkage weights based on empirical outcome similarity between cohorts, allowing data-adaptive borrowing without assuming conditional exchangeability or transportability. We evaluate the PRCT estimator through simulation studies under varying degrees of cross-cohort exchangeability, demonstrating improved efficiency over analyses of only the RCT cohort. Application to outcomes collected from an actual clinical trial illustrates the utility of our methods in real-world clinical settings where full randomization may not be feasible. Our approach provides a pragmatic design and analysis tool to accommodate patient preference without compromising statistical rigor.
