Many members of the Michigan Biostatistics community, including students, faculty, postdocs, and alumni, will be presenting their research throughout the Joint Statistical Meetings in Nashville, Tennessee, from August 3 to August 7. If you would like to learn more about what your fellow community members are working on, we've organized information about their presentations into one convenient table. We hope you will support one another and check out as many of the talks, panels, and poster sessions as you're able. All times listed are in Central Time.
MONDAY, AUGUST 4 | Omni Nashville Hotel | Cumberlands Ballroom 5 (Third Floor)
Join the University of Michigan Department of Biostatistics for a special community gathering during JSM 2025 in Nashville. Reconnect with alumni, faculty, students, and friends as we celebrate our field, spark new collaborations, and honor the inaugural recipient of the Michigan Biostatistics “Significant Alumni” award. Don’t miss this opportunity for meaningful conversation, celebration, and community — we hope to see you there!
| Speaker | Affiliation | Date | Session Start | Presentation Time | Location | Session | Title | Abstract | Keywords | Additional Authors | Link |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Nico Kaciroti | Joint Faculty | Sun, Aug 3 | 2:00 PM | 2:05 PM | CC-106B | Bayesian Analysis of Massive and Complex Data Sets | A Bayesian approach to correct for misclassification error in EHR data | Health equity in pediatric care is an important mission for medical institutions in the US. Increasing research is done to identify inequities in health care outcomes using Electronic Health Records (EHR). However, EHR on pediatric patients often have inaccurate records of patient race (doi:10.1001/jamanetworkopen.2024.31073). By ignoring the misattribution in racial designations in EHR, studies run the risk of biased inferences. Further, accuracy of racial designations is important to clinical care improvement efforts and health outcomes. We propose an empirical Bayesian model to correct for misclassification in racial designation or in EHR. The model uses a survey sample (n=1,594) to estimate the misclassification error between the recorded race in EHR and the self-identified race. The sample is used to derive an empirical prior distribution for the misclassification error, which can be used in future studies using EHR data to derive posterior distributions corrected for race misclassification error. The race-corrected posterior distribution is used to derive inferences. The proposed approach is applied to a pediatric study using EHR data from CS Mott Children's Hospital. | Measurement error;Missing data;Sensitivity analysis | | Link |
| Leyuan Qian | Student | Sun, Aug 3 | 2:00 PM | 3:15 PM | CC-104A | SPEED 1: Data Challenge and Prediction Modelling, Part 1 | Smooth Tensor Decomposition for Ambulatory Blood Pressure Monitoring Data | Ambulatory blood pressure monitoring (ABPM) is widely used to track blood pressure and heart rate over periods of 24 hours or more. Most existing studies rely on basic summary statistics of ABPM data, such as means or medians, which obscure temporal features like nocturnal dipping and individual chronotypes. To better characterize the temporal features of ABPM data, we propose a novel smooth tensor decomposition method. Built upon traditional low-rank tensor factorization techniques, our method incorporates a smoothing penalty to handle noise and employs an iterative algorithm to impute missing data. We also develop an automatic approach for the selection of optimal smoothing parameters and ranks. We apply our method to ABPM data from patients with concurrent obstructive sleep apnea and type II diabetes. Our method explains temporal components of data variation and outperforms the traditional approach of using summary statistics in capturing the associations between covariates and ABPM measurements. Notably, it distinguishes covariates that influence the overall levels of blood pressure and heart rate from those that affect the contrast between the two. | Low-rank tensor factorization;Smoothing penalty;Missing data imputation | Irina Gaynanova | Link |
| Jian Kang | Faculty | Sun, Aug 3 | 2:00 PM | 3:20 PM | CC-208B | Bayesian Methods for Structured, Heterogeneous and High-Dimensional Neuroimaging Data | Bayesian Scalar-on-Image Regression with the Spatially Varying Neural Network Prior | Deep neural networks (DNN) have been adopted in scalar-on-image regression, which predicts the outcome variable using image predictors. However, training DNN often requires a large sample size to achieve a good prediction accuracy and the model fitting results can be difficult to interpret. In this work, we propose a novel Bayesian non-linear scalar-on-image regression framework with a spatially varying neural network (SV-NN) prior. The SV-NN is constructed using a single hidden layer neural network with its weights generated by the soft-thresholded Gaussian process. Our framework is able to select interpretable image regions and to achieve high prediction accuracy with limited training samples. The SV-NN provides large prior support for the imaging effect function, enabling efficient posterior inference on image region selection and automatically determining the network structures. We establish the posterior consistency of model parameters and selection consistency of image regions when the number of voxels/pixels grows much faster than the sample size. We develop an efficient posterior computation algorithm based on stochastic gradient Langevin dynamics (SGLD). We compared our methods with state-of-the-art deep learning methods via analyses of multiple real data sets including the task fMRI data in the Adolescent Brain Cognitive Development (ABCD) study. | Scalar-on-Image Regression | | Link |
| Ying Ding | Alumni | Sun, Aug 3 | 2:00 PM | 3:25 PM | CC-207D | Innovative Statistical and Computational Approaches for Multi-modal Data in Ophthalmology Research | Interpretable Heterogeneous Treatment Effect Estimation and Causal Subgroup Discovery in Survival Outcomes, with Application to Age-related Macular Degeneration Studies | Estimating heterogeneous treatment effect (HTE) for survival outcomes has gained increasing attention, as it captures the variation in treatment efficacy across patients or subgroups in improving survival or delaying disease progression. However, most existing methods focus on post hoc subgroup identification rather than simultaneously estimating HTE and selecting causal subgroups. In this paper, we propose an interpretable HTE estimation framework that uses meta-learners with the conditional inference tree to estimate CATE for survival outcomes and identify predictive subgroups simultaneously. We evaluated the performance of our method through comprehensive simulation studies in various randomized clinical trial (RCT) settings. Furthermore, we demonstrate its application in a large RCT for age-related macular degeneration (AMD), a progressive polygenic eye disease, to estimate the HTE of an antioxidant and mineral supplement on time-to-AMD progression and to identify genetics-based subgroups with enhanced treatment effects. Our method offers a direct interpretation of the estimated HTE and provides evidence to guide precision medicine and healthcare. | interpretable heterogeneous treatment effect;precision medicine;randomized clinical trials;subgroup identification;age-related eye disease studies (AREDS) | | Link |
| Yi Li | Faculty | Sun, Aug 3 | 4:00 PM | 4:05 PM | CC-201A | Recent Advances in Statistical Methods for Integrative Analyses of Healthcare Data | Causal Meta-Analysis by Integrating Multiple Observational Studies | Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population. An R package has been developed for disseminating the methods. | causal inference;integrative analysis;observational studies;data integration | | Link |
| Yueying Hu | Student | Sun, Aug 3 | 4:00 PM | 4:20 PM | CC-105B | Quantifying Bias, Inequality, and Violence: Advances in Statistics and Justice | Accounting for Measurement Instability in Census-Tract Level Mortgage Discrimination via Joint Modeling | Racial disparities in cognitive health reflect entrenched structural inequalities. The Mortgage Density Index Ratio (MDIR) quantifies census-tract level housing and lending discrimination, but it may be unstable in hypersegregated areas. To address this, we developed a joint modeling approach that simultaneously estimates cognitive outcomes and latent mortgage rates for Black and White households. In simulations, joint modeling showed notably lower bias and greater robustness in small- to moderate-sized census tracts compared to traditional regression approaches. Applying joint modeling to six cognitive domains in Michigan Cognitive Aging Project (MCAP) data (N = 644), we identified a significant association between MDIR and processing speed only among Black participants, with a one-unit MDIR increase (i.e., greater racial parity in mortgage lending) corresponding to a 0.48 SD improvement in processing speed (95% CI: 0.05-0.93). Traditional regression failed to detect this effect. These findings underscore the importance of advanced statistical methods in quantifying structural racism and highlight the disproportionate effects of mortgage discrimination in Black adults. | Measurement error;Joint modeling;Hypersegregation;Health disparities | Michael Elliott | Link |
| Rebecca Andridge | Alumni | Sun, Aug 3 | 4:00 PM | 4:25 PM | CC-104E | Innovative Analytic Strategies for Navigating Nonprobability Samples | Sensitivity Analyses for Nonignorable Selection Bias When Estimating Subgroup Parameters in Nonprobability Samples | Selection bias in survey estimates is a major concern, for both non-probability samples and probability samples with low response rates. The proxy-pattern mixture model (PPMM) has been proposed as a method for conducting a sensitivity analysis that allows selection to depend on survey outcomes of interest, i.e., assuming a nonignorable selection mechanism. Indices based on the PPMM have been proposed and used to quantify the potential for non-ignorable nonresponse or selection bias, including the SMUB for means and the MUBP for proportions. These methods require information from a reference data source, such as a large probability-based survey, with summary-level auxiliary information for the target population of interest (means, variances, and covariances of the auxiliary variables). To this point, the SMUB/MUBP measures have exclusively been used to estimate bias in overall population-level estimates. Extension to domain-level estimates is straightforward if the reference data source contains the domain indicator so that population-level margins within the domain of interest can be calculated. However, interest may often lie in subgroups for which population-level summaries are not available. This will happen in cases where the domain indicator is observed on the survey only (not in the reference data source) and can also happen when the goal is estimation within intersectional subgroups for which stable/reliable population-level estimates of auxiliary variables may not be available. To combat this issue, we propose creating nonignorable selection weights based on the PPMM and using these weights for domain estimation and subsequent calculation of the SMUB/MUBP within subgroups. These PPMM selection weights rely on a single sensitivity parameter that ranges from 0 to 1 and captures a range of selection mechanisms, from ignorable to an "extreme" non-ignorable mechanism where selection depends only on the outcome of interest. The PPMM selection weights are based on the re-expression of the PPMM as a selection model, using the known equivalence between pattern-mixture models and selection models. In this talk, we briefly describe the re-expression of the PPMM as a selection model and illustrate the use of the novel non-ignorable selection weights to estimate various subgroup quantities using the Census Household Pulse Survey under a range of assumptions on the selection mechanism. | proxy pattern-mixture model;domain estimation | Brady West (Joint) [Discussant] | Link |
| Erin Craig | Faculty (Incoming) | Sun, Aug 3 | 4:00 PM | 4:45 PM | CC-103C | Harnessing the power of large-scale and heterogeneous data with integrative analysis | Pretraining and the Lasso | Pretraining is a popular and powerful paradigm in machine learning to pass information from one dataset to another. For example, suppose we have a modest-sized dataset of images of cats and dogs, and we plan to fit a neural network to classify them. With pretraining, we start with a neural network trained on a large corpus of images, consisting of not just cats and dogs but hundreds of other image types. We then fix all network weights except the top layer(s), which perform the final classification, and fine-tune those on our dataset. This often results in dramatically better performance than the network trained solely on our smaller dataset. In this talk, I will present a framework for pretraining the lasso, which allows us to enjoy the performance benefits of pretraining while retaining the interpretability and simplicity of sparse linear modeling. Suppose for example we wish to predict cancer survival time using a dataset that spans multiple cancer types. With lasso pretraining, we start by fitting a lasso model using the entire dataset, then we use this to guide the fitting of a specific model for each cancer type. Importantly, we have a hyperparameter which determines the influence of the overall model on the specific models. This process also reveals which features are predictive for most or all classes, and which are predictive for one or just a few. This latter set will often be of most interest to the scientist. Lasso pretraining is a general framework with a wide variety of applications, including stratified models, multi-response models and conditional average treatment estimation, and I will demonstrate its use with real-world biomedical examples. | | | Link |
| Tsung-Hung Yao | Alumni | Sun, Aug 3 | 4:00 PM | 5:35 PM | CC-205C | Latest Research in Genomics and Microbiome with a Hint of Bayesian | Robust Bayesian Graphical Regression Models for Assessing Tumor Heterogeneity in Proteomic Networks | Graphical models are powerful tools to investigate complex dependency structures in high-throughput datasets. However, most existing graphical models make one of two canonical assumptions: (1) a homogeneous graph with a common network for all subjects or (2) an assumption of normality especially in the context of Gaussian graphical models. Both assumptions are restrictive and can fail in certain applications such as proteomic networks. We propose an approach termed robust Bayesian graphical regression (rBGR) to estimate heterogeneous graphs for non-normally distributed data. rBGR is a flexible framework that accommodates non-normality by random marginal transformations and constructs covariate-dependent graphs to accommodate heterogeneity via graphical regressions. We formulate a new characterization of dependencies, conditional sign independence with covariates, with an efficient sampler. Simulation studies show that rBGR outperforms existing graphical models for data from various levels of non-normality in both edge and covariate selection. We use rBGR to assess proteomic networks and find protein-protein interactions that are differentially associated with immune cell abundance. | Bayesian graphical models;Cancer;Conditional sign independence;Covariate-dependent graphs;Protein-protein interactions | | Link |
| Irina Gaynanova | Faculty | Mon, Aug 4 | 8:30 AM | 8:30 AM | CC-104C | AI & Statistical Approaches to Data Fusion | Panel Discussion: AI & Statistical Approaches to Data Fusion | Information about complex systems of interest – such as weather or climate, personal health, autonomous vehicles, robots, health service systems, even the human body – can be obtained from different types of sensors and instruments, as well as measurement methods and models. There is a need to utilize information about the same object or phenomenon from myriad datasets and modeled information, but the increasing heterogeneity of data types that provide information about the same system – e.g., text, video, audio, images – adds to the challenge in integrating information for prediction and inference. In many cases the data also needs to be combined with scientific models that provide information about the structure of the system. "Multimodal data fusion", or the fusion of heterogeneous sources of data and modeled information for a unified and global view of the system from multiple modalities is thus important, but also challenging. While there is no single definition of "data fusion" ("data integration"), the underlying idea is to develop estimates and principled measures of uncertainty based on multiple sources of data and modeled information. Multimodal data are a collection of information from diverse sensors, surveys, and measuring systems that capture complementary views of entities and events under study. This panel will discuss some of these opportunities and challenges, with a focus on the role of statistics and statistical thinking in these contexts. The speakers represent a range of areas of expertise and application areas, including public health, environmental applications, and earth systems. The discussion will touch on existing methodological approaches that have been widely used for multimodal data (and model) fusion, including Hidden Markov Models, Bayesian Belief Networks, Similarity Network Fusion, heterogeneous ensembles, to name just a few. The increasing size and heterogeneity of data has sparked interest in leveraging artificial intelligence methods, including deep learning-based approaches such as Deep Belief Net-based or Stacked Autoencoder-based multimodal data fusion techniques, among others. The panel will explore the range of barriers across domains, as well as potential solutions to these problems, and highlight statistical questions that arise within a landscape that is continuing to evolve, including some attention to the pros and cons of different approaches to multimodal data (e.g., model-based as compared to deep-learning based). Panelists will probe the assumptions made under different methods and highlight contexts where these techniques might be appropriate. The panel will discuss state-of-the-art and best practices, drawing from their deep expertise. This conversation is particularly timely, given the heterogeneity and size of today's data, as well as the demand for combining information across fields ranging from climate to cancer to national defense. | | | Link |
| Fan Bu | Faculty | Mon, Aug 4 | 8:30 AM | 9:15 AM | CC-201A | Innovative Statistical Approaches for Public Health and Global Epidemiology | Inferring HIV Transmission Patterns from Viral Deep-Sequence Data via Latent Spatial Poisson Processes | Viral deep-sequencing technologies play a crucial role toward understanding disease transmission patterns, because the higher resolution of these data provides evidence on transmission direction. To better utilize these data and account for uncertainty in phylogenetic analysis, we propose a spatial Poisson process model to uncover HIV transmission flow patterns at the population level. We represent pairings of two individuals with viral sequence data as typed points, with coordinates representing covariates such as sex and age, and the point type representing the unobserved transmission statuses (linkage and direction). Points are associated with deep-sequence phylogenetic analysis summary scores that reflect the strength of evidence for each transmission status. Our method jointly infers the latent transmission status for all pairings and the transmission flow surface on the source-recipient covariate space. In contrast to existing methods, our framework does not require pre-classification of the transmission statuses of data points, instead learning them probabilistically through fully Bayesian inference. By directly modeling continuous spatial processes with smooth densities, our method enjoys significant computational advantages over previous methods that discretize the covariate space. In an HIV transmission study from Rakai, Uganda, we demonstrate that our framework can capture age structures in HIV transmission at high resolution and bring valuable insights. (This is joint work with Kate Grabowski, Joseph Kagaayi, Oliver Ratmann, and Jason Xu.) | Latent Spatial Poisson Processes | | Link |
| Rachel Gonzalez | Student | Mon, Aug 4 | 10:30 AM | 10:30 AM | CC-Hall B | Contributed Poster Presentations: Biometrics Section | Flexible Individualized Treatment Strategies in Micro Randomized Trials with Binary Rewards | Micro-randomized trials (MRTs) are often used in mHealth studies to assess app-based interventions. Participants are randomized to receive treatment at a series of decision points, traditionally using the same rule across individuals. Several recent MRTs utilize Thompson Sampling (TS), a reinforcement learning algorithm, to build individualized treatment strategies that optimize delivery with respect to a reward. Treatment may interact with several contextual features, but estimation of models in this setting can be unreliable. This is especially difficult with a binary reward where complete separation often occurs, even with a large sample and few features. We present an approach to balance algorithmic flexibility and computational cost in the context of a binary reward that (1) uses partial pooling and weakly informative priors that apply more shrinkage to higher-order interactions and (2) considers the amount of information available in the data when defining a model. Our approach is useful in MRTs where the TS algorithm must be automated. We demonstrate the empirical utility of our method in a digital twin of an ongoing MRT study, LowSalt4Life, compared to logical alternatives. | Mobile health;Micro-randomized trials;Clinical trials;Reinforcement learning;Individualized treatment | Walter Dempsey | Link |
| Yajuan Si | Joint Faculty | Mon, Aug 4 | 10:30 AM | 11:35 AM | CC-208A | Doing More with Less: Recent Advances in Small Area Estimation | A Bayesian framework of combining multiple data sources for small area estimation | Small area estimation (SAE) often relies on model-based approaches to stabilize estimates of subgroups with small sample sizes. The model-based approaches can be hierarchical models or introduce prior distributions in a Bayesian paradigm to borrow information across subgroups. A rich body of literature has made important contributions to SAE methods, especially with applications to complex sample surveys. However, due to recent data collection challenges, survey data alone cannot meet analytic demands. Combining multiple data sources has become a research priority. SAE methods need to account for data collection tailored to each data source and integrate all relevant information to improve inference. We consider a few scenarios, where multiple data sources collect different measure components and participant groups, and develop a Bayesian SAE framework. We will compare with alternatives and use simulation and application studies to illustrate the improvement. | Small area estimation;Data integration;Bayesian models | | Link |
| Kalins Banerjee | Postdoc | Mon, Aug 4 | 10:30 AM | 11:50 AM | CC-209B | Latest Genomics, Microbiome, and Sequencing Research | Robust Inference of Copy Number Variations in Spatial Transcriptomics | Intratumor heterogeneity (ITH), a hallmark of cancer, is characterized by genetically distinct clusters of cells, or clones, that are spatially organized within a tumor. Copy-number variation (CNV), one of the key drivers of ITH, affects genomic segments by altering the underlying number of chromosomes. Spatial transcriptomics (ST), measuring RNA expression simultaneously from thousands of tissue-locations, offers a unique opportunity to identify the CNV architecture and spatial organization of the cancer-clones. We introduce a robust framework, integrating gene expression, spatial coordinates, and SNPs from ST samples, to identify segments with somatic CNVs and their allele-specific copy-number profiles. Our framework employs a Gaussian mixture model to capture spatially correlated expression patterns and a mixture of Binomial distributions to model the allele counts. Using datasets across multiple ST platforms, we first assessed the quality and signal-to-noise ratio in the SNPs to ensure reliable allele-specific inference. We then demonstrated that the proposed model had superior yet robust performance in discovering CNVs from the malignant region of ST tumor samples. | Copy-number variations;Spatial transcriptomics in cancer biology;Intratumor heterogeneity;Multimodal data integration | | Link |
| Michael Boehnke | Faculty | Mon, Aug 4 | 2:00 PM | 2:00 PM | CC-214 | Statisticians Leading the Way: Leveraging the All of Us Research Program Data to Advance Science | Panel Discussion: Statisticians Leading the Way: Leveraging the All of Us Research Program Data to Advance Science | Large-scale biobank initiatives are becoming a cornerstone of scientific research worldwide. Initiatives like the UK Biobank have demonstrated the transformative potential of this global trend, driving scientific discoveries and resulting in high-impact publications. Building on this momentum, the U.S. NIH's All of Us Research Program is a historic effort to advance precision medicine by enrolling over one million participants nationwide. As one of the largest and most diverse biobanks of its kind, All of Us provides a rich dataset that integrates multiple data domains, including surveys, electronic health records (EHR), genomics, and Fitbit data. For statisticians, the richness of the All of Us data provides unique opportunities for statistical research, including time-dependent analysis, causal inference, statistical machine learning, and integrative analysis of large genetic, epidemiological, and EHR data. Therefore, as All of Us continues to expand and evolve, the statistical community should be ready to seize these forthcoming opportunities. Ultimately, learning how to leverage this resource will position statisticians to not only advance statistical methodology, but also take the lead in driving scientific breakthroughs. This session will provide an overview of the All of Us data and discuss its analytic opportunities and challenges. It aims to engage statisticians in exploring how this rich dataset can be leveraged to drive innovative statistical methods and applications that advance scientific discovery. In addition, this session will provide insight on how biobanks are shaping the future of statistics and data science. | | | Link |
| Bhramar Mukherjee | Adjunct Faculty | Mon, Aug 4 | 2:00 PM | 2:00 PM | CC-214 | Statisticians Leading the Way: Leveraging the All of Us Research Program Data to Advance Science | Panel Discussion: Statisticians Leading the Way: Leveraging the All of Us Research Program Data to Advance Science | Large-scale biobank initiatives are becoming a cornerstone of scientific research worldwide. Initiatives like the UK Biobank have demonstrated the transformative potential of this global trend, driving scientific discoveries and resulting in high-impact publications. Building on this momentum, the U.S. NIH's All of Us Research Program is a historic effort to advance precision medicine by enrolling over one million participants nationwide. As one of the largest and most diverse biobanks of its kind, All of Us provides a rich dataset that integrates multiple data domains, including surveys, electronic health records (EHR), genomics, and Fitbit data. For statisticians, the richness of the All of Us data provides unique opportunities for statistical research, including time-dependent analysis, causal inference, statistical machine learning, and integrative analysis of large genetic, epidemiological, and EHR data. Therefore, as All of Us continues to expand and evolve, the statistical community should be ready to seize these forthcoming opportunities. Ultimately, learning how to leverage this resource will position statisticians to not only advance statistical methodology, but also take the lead in driving scientific breakthroughs. This session will provide an overview of the All of Us data and discuss its analytic opportunities and challenges. It aims to engage statisticians in exploring how this rich dataset can be leveraged to drive innovative statistical methods and applications that advance scientific discovery. In addition, this session will provide insight on how biobanks are shaping the future of statistics and data science. | | | Link |
| Peter Song | Faculty | Mon, Aug 4 | 2:00 PM | 2:55 PM | CC-207A | Recent Advancement of Statistical Methods for Environmental Health Studies | Learning High-Dimensional Mechanistic Pathways of Exposome to Health Outcomes using Mixed Integer Optimization Algorithms | This talk will focus on a new approach to studying high-dimensional mechanistic pathways of exposome to health outcomes in the framework of homogeneity pursuit (HP). HP allows scientists to cluster similar toxicants into mixtures while accommodating high-dimensional mediators (e.g., metabolites) that play different roles in mediating the relationships between mixtures and health outcomes. Statistical learning is built upon integer optimization algorithms that formulate the task of clustering toxicants as an estimation problem. Moreover, we propose an ensemble inference that can provide confidence intervals for high-dimensional direct and indirect effects. This new statistical toolbox will be illustrated by simulation studies and real-world data examples. | Exposome;Directed acyclic graph;Constrained optimization | Leyao Zhang | Link |
| Ying Yuan | Alumni | Tue, Aug 5 | 8:30 AM | 8:30 AM | CC-Dean Grand Ballroom A1 | IOL: Statistical and Design Considerations for Dose Optimization | Panel Discussion: Statistical and Design Considerations for Dose Optimization | TBA | | | Link |
| Jeffrey Gonzalez | Alumni | Tue, Aug 5 | 8:30 AM | 8:30 AM | CC-207D | Survey Nonresponse: Integrating Perspectives on Design, Analysis and Adjustment | Discussant: Survey Nonresponse: Integrating Perspectives on Design, Analysis and Adjustment | TBA | | | Link |
| Michael Elliott | Faculty | Tue, Aug 5 | 8:30 AM | 8:35 AM | CC-207D | Survey Nonresponse: Integrating Perspectives on Design, Analysis and Adjustment | Sampling Low-Incidence Populations Under Anticipated Nonresponse | View AbstractSurvey sampling theory on optimal allocation typically assumes 100% response rates. This has led sample designers to resort to ad hoc practices for accommodating anticipated nonresponse, such as computing classic allocations under complete response and then adjusting for anticipated sample loss. In a previous paper (2024), we showed that standard practices may perform quite poorly in some situations. For instance, in an application with a large degree of differential nonresponse, our proposed allocation increased the effective sample size by 25% relative to standard practices. Here, we extend our previous paper, which assumed that all members of the frame are eligible population members, to situations where eligibility is not known upfront. For instance, it can be challenging to survey low-incidence populations, where population membership is not known in the frame, although auxiliary data are often available for constructing strata with different concentrations (eligibility rates) of the target population. We provide new theory on optimal allocation for low-incidence populations under anticipated nonresponse. We treat eligibility through an analogy to domain estimation, but in contrast with previous theory on sampling for rare populations, nonresponse is included in our formulation. We provide theoretical results and will compare our allocation with existing approaches through an application. |
Sampling;Sample design;Sample allocation;Nonresponse;Rare populations | Link | |
| Brady West | Joint Faculty | Tue, Aug 5 | 8:30 AM | 9:25 AM | CC-207D | Survey Nonresponse: Integrating Perspectives on Design, Analysis and Adjustment | Using New Measures of Selection Bias to Develop General Adjustments for Survey Nonresponse | View AbstractRapidly declining response rates in surveys across the world, regardless of the mode of data collection used, have forced survey statisticians and methodologists to consider alternative measures of the quality of survey estimates that allow for the possibility of non-ignorable selection mechanisms. This talk will introduce a number of recently-developed measures of selection bias in common survey estimates that provide survey statisticians with general methods of adjusting for the selection mechanisms associated with survey data sets, whether it arises from sampling or nonresponse, and whether the selection is ignorable or non-ignorable. These new measures are entirely model-based and enable users to perform sensitivity analyses, examining the potential bias in estimates introduced by more complex sampling and nonresponse mechanisms. Important considerations regarding the necessary auxiliary data sources for using the measures and available software implementing calculation of the measures (and corresponding adjustments to survey estimates) will be discussed as well. |
Link | ||
| Rui Nie | Student | Tue, Aug 5 | 10:30 AM | 10:30 AM | CC-Hall B | Contributed Poster Presentations: Section on Nonparametric Statistics | Profiling Functional Effects of Long-Term Physical Activity on Risk of Diabetes Onset with All-of-US | View AbstractDiabetes is a leading chronic condition that affects the regulatory glucose mechanism. Preventive care, such as physical activity, is essential to reduce the risk of diabetes onset. The All-of-US Research Program, launched by the NIH, records the daily active zone minutes of over 15620 diverse participants across time. We conducted a retrospective study on All-of-US participants with data collected before the outbreak of COVID-19 in March 2020 when physical activity patterns began to shift. This project assessed the functional association of long-term physical activity with the risk of diabetes onset, using logistic regression with time-varying effects of daily activity durations. Individuals' long-term activity duration curves and effect curves are decomposed by shared orthonormal basis functions. We adopt fused lasso to cluster individuals based on their latent projection features. Participants in the same subgroup share characteristic activity duration curves and functional effects of long-term physical activity. The subgroup functional effects are estimated through the alternating direction method of multipliers (ADMM). The details of the data analysis results are presented. |
Functional effects;Subgroup analysis;Time-varying effects;All-of-US research program;Fitbit | Peter Song | Link |
| Irina Degtiar | Alumni | Tue, Aug 5 | 10:30 AM | 10:30 AM | CC-Davidson Ballroom A3 | Beyond Academia: Exploring Statistics Careers in Industry, Government, and More | Panel Discussion: Beyond Academia: Exploring Statistics Careers in Industry, Government, and More | View AbstractIn today's rapidly evolving data-driven world, the skills of statisticians extend far beyond traditional academic research. From influencing policy decisions in government agencies to driving innovation at technology companies, the range of opportunities for statisticians is expanding quickly. Despite the abundance of opportunities, their variety and scope are not always well communicated to graduate students and early-career statisticians, who often have limited exposure beyond academic settings at this point in their careers. As an advocacy group for early-career statisticians, our panel session is designed to illuminate the diverse career opportunities available in various sectors beyond traditional academic pathways. We hope this session can motivate early career statisticians to explore a wide array of career options aligned with their interests and understand how to navigate these varied roles. By showcasing insights from professionals across different industries, we aim to bridge the gap between academic training and the dynamic, multifaceted career landscape available outside of academia. |
Link | ||
| Yiyuan Huang | Student | Tue, Aug 5 | 10:30 AM | 10:35 AM | CC-101C | Challenges in survival analysis with outcome-dependent sampling design, missingness, and competing events. | A Model-Assisted Test of the Treatment Effect on Recurrent Events in the Presence of Terminal Events | View AbstractTesting the treatment effect on recurrent events when terminal events exist has been challenging in clinical studies. Traditional methods based on cumulative frequency unfairly disadvantage longer survivors, as they tend to experience more recurrent events. Methods such as the While-Alive loss rate ratio test (WA) attempt to resolve this issue, and WA performs well regarding type I error and power when the recurrent event rate holds constant over time. However, if the constant-rate assumption is violated, WA can exhibit inflated type I error and inaccurate effect size estimation. To overcome this pitfall, we propose a Proportional Marginal Rate Structural Model assisted test (PMRSMT), in the framework of separable treatment effect for recurrent and terminal events, respectively. In the simulation study, we show that PMRSMT has controlled type I error and power comparable to WA, even when the recurrent event rate varies over time. We further illustrate the application of PMRSMT to compare postoperative adverse events under interventions with different mechanical circulatory support devices in the Interagency Registry of Mechanically Assisted Circulatory Support program. |
recurrent event;competing risk;hypothesis testing;structural model | Peter Song;Min Zhang (Adjunct) | Link |
| Kimberly Hochstedler Webb | Alumni | Tue, Aug 5 | 10:30 AM | 11:35 AM | CC-105B | Innovative Statistical Approaches to Mitigate Data Challenges in Women's and Maternal Health | The misdiagnosed mediator: Estimating the effect of maternal age on preterm birth risk in the presence of misclassified gestational hypertension | View AbstractThe risk of preterm birth increases with maternal age, and it is possible that hypertensive disorders, like gestational hypertension, mediate this maternal age-preterm birth relationship. Previous studies, however, have found low diagnostic accuracy of gestational hypertension. Thus, any mediation analysis conducted with this potentially misclassified binary mediator variable may be severely biased. This bias is especially challenging to address when the misclassification is covariate-dependent and when no gold standard measures are available. In this study, we develop methods to handle misclassification in the gestational hypertension mediator variable by modelling misdiagnosis based on patient-level factors. We present an expectation-maximization algorithm to estimate the model and provide an R package to implement the proposed methods. Using these methods, we assess the misclassification-corrected effect of maternal age on preterm birth risk, while simultaneously estimating misclassification rates of gestational hypertension. |
mediation analysis;bias-correction;label switching;EM algorithm;predictive value weighting;causal effects | Link | |
| Mukai Wang | Student | Tue, Aug 5 | 2:00 PM | 2:00 PM | CC-Hall B | Contributed Poster Presentations: Section on Statistics in Genomics and Genetics | Nonparametric Denoising of Microbiome Metagenomics Data | View AbstractWe propose a nonparametric method to denoise microbiome metagenomics sequencing count matrices. The goal of denoising is to recover the non-zero expected abundances of rare taxa and reduce the variance of prevalent taxa. The count matrices are dichotomized into a series of binary matrices given a sequence of thresholds. We estimate the probability of each count matrix entry being larger than each threshold by taking products of conditional probabilities. We develop a novel matrix factorization algorithm for the low-rank representation of conditional probabilities. We calculate the denoised count based on the empirical distribution formed by the estimated probabilities. Simulations show that our method is better than parametric competitors at recovering accurate microbiome compositions. Our denoising method can improve downstream analyses such as training prediction models and microbiome network analysis. |
Microbiome metagenomics;Denoise;Binarization;Matrix factorization;Nonparametric | Gen Li | Link |
| Matthew Schipper | Joint Faculty | Tue, Aug 5 | 2:00 PM | 2:00 PM | CC-209A | Navigating Estimands and Missing Data: Emerging Trends in Clinical Development | Quantifying treatment benefit for individuals: using means and accounting for competing risks | View AbstractBackground: Personalized estimates of treatment benefit allow for informed decision making. Methods: Using the addition of ADT to RT in prostate cancer as an example, we calculate and compare measures of treatment benefit using differences in probability vs differences in mean times. We utilize a novel data integration approach to calculate the absolute risk of PCSM (cancer mortality) and MFS (mets free survival) within 15 years using patient level covariates for both cancer outcomes and competing mortality risk. We calculate Mean MFS times unrestricted and restricted to 15 years. We calculated each of these measures for individual patients in a contemporary cohort of >1000 patients enrolled in a statewide quality consortium. Results: The 15-year risk of PCSM for a stage IIC patient treated with RT+ADT varies from 7% to 15% at the 10th vs 90th percentile of competing mortality risk. For men in the same UIR risk group, ADT reduced the risk of mets at 10 years by an average of 4%, but the 10th and 90th percentiles were 1% and 14%, respectively. Conclusions: Accounting for individual competing risk levels is important when estimating treatment benefit. |
Data Integration;Survival Analysis;Competing risks;Treatment efficacy;Personalized medicine | Jessica Aldous;Ralph Jiang;Elizabeth Chase (Alumni) | Link |
| Michael Elliott | Faculty | Tue, Aug 5 | 2:00 PM | 2:00 PM | CC-205C | Sample Design and Non-response Modeling | Optimizing Data Collection Interventions to Balance Cost and Quality in a Sequential Multimode Surve | View AbstractResponsive and adaptive designs have emerged as a framework for targeting and reallocating resources during the data collection period in order to improve survey data collection efficiency. Here, we report on the implementation and evaluation of a responsive design experiment in the National Survey of College Graduates that optimizes the cost-quality tradeoff by minimizing a function of data collection costs and the root mean squared error of a key survey measure, self-reported salary. At three points during the data collection process, we predict outcomes and costs for remaining non-respondents and combine with data from respondents to optimize effort on remaining cases with respect to cost and root mean squared error (RMSE) of mean self-reported salary. This process allowed us to reduce data collection costs by nearly 10%, without a statistically or practically significant increase in the RMSE of mean salary or decrease in the unweighted response rate. This experiment demonstrates the potential for these types of designs to more effectively target data collection resources in order to reach survey quality goals. |
Responsive design;National Survey of College Graduates;Posterior predictive distribution | Link | |
| Hui Jiang | Faculty | Wed, Aug 6 | 8:30 AM | 8:30 AM | CC-201B | Innovative Statistical and Machine Learning Approaches for Omics and Healthcare Applications | Identification of cancer risk-associated variants and genes using asymmetric data integration | View AbstractCancer genomic research provides a significant opportunity to identify cancer risk-associated genes but often suffers from undesirably low statistical power due to limited sample sizes. Integrated analysis across different cancers has the potential to enhance statistical power for identifying pan-cancer risk genes. However, substantial heterogeneity among cancers makes this challenging. We developed a novel asymmetric integration method that addresses data heterogeneity and excludes uninformative datasets from the analysis. We applied this method to integrate genotype datasets with matched case and control individuals, using each cancer type as the primary dataset of interest and treating other cancers as auxiliary datasets. At the same FDR threshold, the integrated analysis identified more potential genetic variants and genes associated with cancer risk, highlighting the promise of this approach for integrating cancer datasets. |
asymmetric data integration;cancer risk-associated genetic variants and genes | Ruixuan Wang;Lam Tran;Ben Brennen;Lars Fritsche;Kevin He | Link |
| Soumik Purkayastha | Alumni | Wed, Aug 6 | 8:30 AM | 8:30 AM | CC-207A | Mental Health Statistics Section Contributed Session 2 | Examining Directional Association between Depression and Anxiety | View AbstractDepression and anxiety are debilitating and prevalent diagnoses with wide-reaching negative psychological and economic impacts. Clinicians note that depression and anxiety, although distinct conditions, often occur together in patients, with little information explaining such comorbidity. In absence of information on the underlying aetiology of these diseases, some clinicians hypothesize that one trait may predispose another, thereby inducing a direction of dependence between these psychological traits. The Intern Health Study (IHS) examines self-reported depression and anxiety among doctors in residency programs in the US. Being able to establish a sense of directionality between anxiety and depression to understand the dominance between these two mental health outcomes is critical to develop adequate clinical diagnostics and administer medical intervention. We propose a novel information-theoretic coefficient that leverages Shannon's entropy metric used to examine directed dependence between anxiety and depression. The proposed method is evaluated by simulation studies and applied to IHS data, where a dominating effect of depression on anxiety is observed in medical interns. |
bivariate causal discovery;information theory;mental health outcomes | Peter Song | Link |
| Qinmengge Li | Student | Wed, Aug 6 | 10:30 AM | 10:30 AM | CC-210 | Precision Medicine, Prediction, and Clinical Trial with Time-to-Event Outcomes | Relative Entropy-Based Discrete Relative Risk Models for Integrated Prediction of Competing Risk | View AbstractThe contemporary data landscape is enriched with an abundance of biobank data providing an unprecedented array of wealthier risk factors with more detailed competing risk outcomes fueling efforts to enhance prognostic predictions. However, the newly obtained data suffer from rare event rates, limited sample sizes, and high dimensionality. The presence of competing risks exacerbates these limitations, reducing the stability of estimation and prediction. To address these challenges, the incorporation of historical prediction models has been recognized as a promising strategy. However, prevailing integration methods often hinge on the strong assumptions of uniform survival outcome type, effect size and covariate space across disparate data sources - assumptions that frequently diverge from reality. In response, we propose a longitudinal multinomial relative entropy-based integration framework, which effectively incorporates summary-level data from established prediction models and accounts for patient privacy and data sharing constraints. We apply this innovative integration methodology to enhance the prediction of kidney transplant outcomes in patients during the COVID-19 pandemic. |
Competing risk;Survival analysis;Data integration;Kidney transplantation;COVID-19 | Kevin He | Link |
| Michele Peruzzi | Faculty | Wed, Aug 6 | 10:30 AM | 10:30 AM | CC-103C | Recent advances in interpretable model-based geostatistics for analyzing complex spatial data | Inside-out cross-covariance for spatial multivariate data | View AbstractAs the spatial features of multivariate data are increasingly central in researchers' applied problems, there is a growing demand for novel spatially-aware methods that are flexible, easily interpretable, and scalable to large data. We develop inside-out cross-covariance (IOX) models for multivariate spatial likelihood-based inference. IOX leads to valid cross-covariance matrix functions which we interpret as inducing spatial dependence on independent replicates of a correlated random vector. The resulting sample cross-covariance matrices are "inside-out" relative to the ubiquitous linear model of coregionalization (LMC). However, unlike LMCs, our methods offer direct marginal inference, easy prior elicitation of covariance parameters, the ability to model outcomes with unequal smoothness, and flexible dimension reduction. As a covariance model for a q-variate Gaussian process, IOX leads to scalable models for noisy vector data as well as flexible latent models. For large n cases, IOX complements Vecchia approximations and related process-based methods based on sparse graphical models. We demonstrate superior performance of IOX on synthetic datasets as well as on colorectal cancer proteomics data. |
Link | ||
| Christina Zhou | Alumni | Wed, Aug 6 | 10:30 AM | 11:05 AM | CC-210 | Precision Medicine, Prediction, and Clinical Trial with Time-to-Event Outcomes | Optimal individualized treatment regimes for survival data with competing risks | View AbstractPrecision medicine leverages patient heterogeneity to estimate individualized treatment regimes-formalized, data-driven approaches designed to match patients with optimal treatments. In the presence of competing events, where multiple causes of failure can occur and one cause precludes others, it is crucial to assess the risk of a main outcome of interest, such as one type of failure over another. This helps clinicians tailor interventions based on the factors driving that cause, leading to more precise treatment strategies. Currently, no precision medicine methods account for both survival and competing risk endpoints. To address this gap, we develop a nonparametric individualized treatment regime estimator. Our two-phase method accounts for overall survival from all events as well as the cumulative incidence of a main event. Additionally, we introduce a value function that jointly incorporates both outcomes. We develop random forests to construct individual survival and cumulative incidence curves. Simulation studies demonstrated that our proposed method performs well, which we applied to a cohort of peripheral artery disease patients at high risk for limb loss and mortality. |
precision medicine;random forests;survival analysis;cumulative incidence function | Link | |
| Bingkai Wang | Faculty | Wed, Aug 6 | 2:00 PM | 2:00 PM | CC-102A | From Guidance to Practice: Evolving Approaches to Covariate Adjustment in Clinical Trials | Asymptotic inference with flexible covariate adjustment under rerandomization and stratified rerandomization | View AbstractRerandomization is an effective treatment allocation procedure to control for baseline covariate imbalance. For estimating the average treatment effect, rerandomization has been previously shown to improve the precision of the unadjusted and the linearly-adjusted estimators over simple randomization without compromising consistency. However, it remains unclear whether such results apply more generally to the class of M-estimators, including the g-computation formula with generalized linear regression and doubly-robust methods, and more broadly, to efficient estimators with data-adaptive machine learners. In this paper, using a super-population framework, we develop the asymptotic theory for a more general class of covariate-adjusted estimators under rerandomization and its stratified extension. We prove that the asymptotic linearity and the influence function remain identical for any M-estimator under simple randomization and rerandomization, but rerandomization may lead to a non-Gaussian asymptotic distribution. We further explain, drawing examples from several common M-estimators, that asymptotic normality can be achieved if rerandomization variables are appropriately adjusted for in the final estimator. These results are extended to stratified rerandomization. Finally, we study the asymptotic theory for efficient estimators based on data-adaptive machine learners, and prove their efficiency optimality under rerandomization and stratified rerandomization. Our results are demonstrated via simulations and re-analyses of a cluster-randomized experiment that used stratified rerandomization. |
Link | ||
| Dylan Cable | Faculty | Wed, Aug 6 | 2:00 PM | 2:00 PM | CC-101C | Statistics in Modern Transcriptomics | Sparse low rank models for cellular perturbation experiments | View AbstractLarge scale cellular perturbation experiments, including those enabled by CRISPR-based technologies, allow for high throughput single-cell transcriptomics experiments to measure cellular responses to biological perturbations. We identify several statistical challenges of these datasets including a high proportion of null effects and correlated effects across similar genes. To address these issues, we develop a sparse, low-rank modeling approach for improved estimation of cellular perturbation effects. Testing on simulated and real data, we compare to existing deep learning methods and linear regression to demonstrate the value of our linear matrix modeling approach. We also explore whether our linear approach can outperform nonlinear methods for predicting combinatorial effects. |
Link | ||
| Leslie McClure | Alumni | Wed, Aug 6 | 2:00 PM | 2:00 PM | CC-104C | Data-Driven Impact: The Evolving Role of Statisticians and Data Scientists in Academic Medicine | Panel Discussion: Data-Driven Impact: The Evolving Role of Statisticians and Data Scientists in Academic Medicine | View AbstractAcademic statisticians and data scientists are team science focused and lead scientific innovation in learning health systems. As the role of and demand for data experts broadens, there is a need for both methodology focused researchers and those that are key leaders in collaborative projects. Collaborative statisticians and data scientists are critical to the success of academic medicine and must have expertise in traditional analytics and data management, as well as in innovative study design and inferential methods to answer important health care questions in a rapidly evolving environment. Roles and expertise can include data coordination of large, multi-center trials and trial networks; integration in clinical care lines and/or clinical departments as an embedded scholar; electronic health record extraction, analysis, and integration; implementation science; and design methodology. Collaborative statisticians and data scientists are key communicators of scientific results and play a large role in the dissemination of research, subsequent design of future research, and in the training of the next generation of researchers. Further emphasizing their contributions, many collaborative academic statisticians and data scientists co-develop and lead research programs or centers, work within Clinical Translational Science Award Programs, and serve in departmental and/or institutional leadership roles. The purpose of this panel discussion is to describe the various roles and impact of collaborative statisticians and data scientists that are essential to furthering the science of healthcare. 
Advances in medicine cannot proceed without the critical and high-impact contributions of team scientists trained with expertise in analytics and study design. We will discuss the importance of structural support within institutions and how leaders can advocate for team science experts to be well-resourced and valued both in their respective organization and nationally, and to be set up for career success. |
Link | ||
| Emily Hector | Faculty (Incoming) | Wed, Aug 6 | 2:00 PM | 2:25 PM | CC-212 | Moving Towards Dynamic/Causal Process Models for Spatio-Temporal Extremes | A new mixture model for spatiotemporal extremes with flexible tail dependence | View AbstractWe propose a new model and estimation framework for spatiotemporal streamflow outcomes that flexibly captures asymptotic dependence and independence in the tail of the distribution. We model streamflow using a mixture of Gaussian and max-stable spatial and temporal random variables. A censoring mechanism allows us to leverage observations in the bulk to improve modeling of the tail. As the likelihood is intractable, we develop a deep Vecchia approximation to the likelihood using neural networks to fit a flexible quantile regression model with monotonic splines. Simulations and modeling of streamflow data from the U.S. Geological Survey illustrate the feasibility and practicality of our approach. |
Link | ||
| Lara Garmire | Joint Faculty | Wed, Aug 6 | 2:00 PM | 2:45 PM | CC-101C | Statistics in Modern Transcriptomics | BSNMani_ST: A Bayesian Model for Linking Spatial Transcriptomics Features to Patient Phenotypes at the Population Scale | View AbstractSpatial transcriptomics (ST) provides valuable insights into molecular and spatial features of tissues, but associating ST data with patient phenotypes at the population scale is challenging. We introduce BSNMani_ST, a Bayesian scalar-on-network regression model with manifold learning, designed to predict clinical outcomes by linking ST features to population phenotypes in a scalable and interpretable manner. We applied BSNMani_ST to spatial transcriptomics data from the Seattle Alzheimer's Disease Brain Cell Atlas, as well as a single-cell imaging mass spectrometry dataset of breast cancers. BSNMani_ST identified biologically relevant gene co-expression subnetworks. These subnetworks are enriched for neurogenesis, neuronal communication, and signaling pathways in the Brain Cell Atlas data and immune-related antigens, cytokeratin, and hormone receptor antigens in the breast cancer data. We also performed simulations using synthetic datasets with latent subnetworks, in which BSNMani_ST outperformed other competing methods. These results underscore its robustness in capturing population-level patterns while incorporating clinical context. |
Link | ||
| Emily Roberts | Alumni | Wed, Aug 6 | 2:00 PM | 3:05 PM | CC-104E | Engaging Students in Biostatistics & Data Science | Formation of a University’s Causal Inference Collaboratory | View AbstractA collaboratory is a creative group process designed to solve complex problems that brings the opportunity for new networks to form. This year the Institute for Public Health Practice and Research Policy funded our proposal to establish the Causal Inference Collaboratory at the U of Iowa. This initiative aims to foster collaboration and methodological advancements, positioning our budding group as a resource for researchers to advance causal inference research at our university. Our group has three primary aims: 1. To conduct a review of how causal theory and methods can provide innovative insights into public health research broadly and at our university. 2. Develop a program through workshops and collaborative projects with a Graduate Research Assistant. 3. Create a platform for collaboration and continuous learning through working groups. This component emphasizes collaboration on competitive grants related to causal inference research. This talk will showcase our successes and challenges of working toward these aims, highlighting the outcomes achieved through new collaborations and impactful research. Findings will include characterization of our ongoing research and teaching. |
causal inference | Link | |
| Xiang Zhou | Adjunct Faculty | Thu, Aug 7 | 8:30 AM | 8:35 AM | CC-207A | New statistical models for transcriptomics analysis with emerging technologies | TBA | View AbstractTBA |
Link | ||
| Junyoung Park | Postdoc | Thu, Aug 7 | 8:30 AM | 9:05 AM | CC-207B | From Compositional Microbiome Data to Longitudinal Biomarker Analysis: Cutting-Edge Statistical Methods | Beyond fixed thresholds: optimizing summaries of wearable device data | View AbstractWearable devices, such as actigraphy monitors and continuous glucose monitors (CGMs), capture high-frequency data, typically summarized by the percentage of time spent within fixed thresholds. For example, CGM data are categorized into hypoglycemia, normoglycemia, and hyperglycemia based on a standard glucose range of 70–180 mg/dL. Although scientific guidelines inform the choice of thresholds, it remains unclear whether this choice is optimal and whether the same thresholds should be applied across different populations. In this work, we define threshold optimality with loss functions that quantify discrepancies between the empirical distributions of wearable device measurements and threshold-based summaries. Using the Wasserstein distance as the base measure, we reformulate the loss minimization as optimal piecewise linearization of quantile functions, solved via stepwise algorithms and differential evolution. We also formulate semi-supervised approaches that incorporate some predefined thresholds based on scientific rationale. Applications to CGM data reveal that data-driven thresholds differ by population and improve discriminative power over fixed thresholds. |
Amalgamation;Continuous glucose monitoring (CGM);Histogram;Time-in-Range (TIR);Piecewise linearization;Wasserstein distance | Neo Kok;Irina Gaynanova | Link |
| Di Wang | Student | Thu, Aug 7 | 8:30 AM | 9:20 AM | CC-202C | Latest Techniques in Risk Prediction Modeling | Leveraging External Information from a Different Outcome Model with the Current Study | Leveraging external information from related studies can improve prediction accuracy with insufficient data. However, conventional methods only consider incorporating information from the external data with the same outcome. In this paper, we develop an integration framework for the settings where the external and internal data are relevant but may be subject to different types of outcomes. The proposed framework utilizes the generic structure of certain models to bridge the different outcomes and introduces the statistics distance information to characterize the heterogeneity across different populations and outcomes. Illustrative examples discussed in this paper include the integration of continuous outcome data with binary outcome data and the integration of discrete survival outcome data with continuous survival outcome data. We evaluate the performance of the proposed method through comprehensive numerical simulations. We apply the proposed framework to multiple analyses of the acute kidney injury (AKI) study in populations who received immune checkpoint inhibitor (ICI) treatments. | data integration;outcome heterogeneity;population heterogeneity | Kevin He | Link |
| Mingyan Yu | Student | Thu, Aug 7 | 8:30 AM | 9:50 AM | CC-207B | From Compositional Microbiome Data to Longitudinal Biomarker Analysis: Cutting-Edge Statistical Methods | Joint Modeling of Multiple Longitudinal Biomarkers and Survival Outcome via Threshold Regression | Longitudinal biomarker data and health outcomes are routinely collected in many studies to assess how biomarker trajectories predict health outcomes. Existing methods primarily focus on mean biomarker profiles, treating variability as a nuisance. However, excess variability may indicate system dysregulations that may be associated with poor outcomes. In this paper, we address the long-standing problem of using variability information of multiple longitudinal biomarkers in time-to-event analyses by formulating and studying a Bayesian joint model. We first model multiple longitudinal biomarkers, some of which are subject to limit-of-detection censoring. We then model the survival times by incorporating random effects and variances from the longitudinal component as predictors through threshold regression that admits non-proportional hazards. We demonstrate the operating characteristics of the proposed joint model through simulations and apply it to data from the Study of Women's Health Across the Nation (SWAN) to investigate the impact of the mean and variability of follicle-stimulating hormone (FSH) and anti-Müllerian hormone (AMH) on age at the final menstrual period (FMP). | Bayesian hierarchical model;Multiple biomarkers;Variability;Limit of detection;Time-to-event outcomes;Threshold regression | Zhenke Wu;Michael Elliott | Link |
| Donglin Zeng | Faculty | Thu, Aug 7 | 8:30 AM | 9:50 AM | CC-102B | Innovations in Nonparametric and Functional Data Methods: Tackling Complex Data Challenges | Semiparametric Regression Analysis of Interval-Censored Multi-State Data with An Absorbing State | In studies of chronic diseases, the health status of a subject can often be characterized by a finite number of transient disease states and an absorbing state, such as death. The times of transitions among the transient states are ascertained through periodic examinations and thus interval-censored. The time of reaching the absorbing state is known or right-censored, with the transient state at the previous instant being unobserved. We provide a general framework for analyzing such multi-state data. We formulate the effects of potentially time-dependent covariates on the multi-state disease process through semiparametric proportional intensity models with random effects. We combine nonparametric maximum likelihood estimation with sieve estimation and develop a stable expectation-maximization algorithm. We establish the asymptotic properties of the proposed estimators and assess the performance of the proposed methods through extensive simulation studies. Finally, we provide an illustration with a cardiac allograft vasculopathy study. | Multi-state model;Interval censoring;Nonparametric maximum likelihood estimation;Semiparametric efficiency;EM algorithm | | Link |
| Mengbing Li | Student | Thu, Aug 7 | 10:30 AM | 10:50 AM | CC-202C | Bayesian Inference in Categorical, Count and Privacy-preserving Latent Variable Models | A Bayesian Multilayered Simplex Factor Regression Model for Complex Epidemiological Data | Dietary acculturation can greatly impact chronic disease progression in migrant populations. Yet modeling the interrelationship between a large set of multivariate acculturation exposures and multivariate dietary consumption outcomes is challenging. Mixed membership models are useful for identifying the underlying dietary patterns by positing individuals to a few outcome subgroups of varying degrees, but difficulties arise when these outcomes relate to latent exposure subgroups. We propose a new multilayered simplex factor regression model that simultaneously derives latent acculturation exposure and dietary outcome patterns while capturing the exposure-outcome relationship. The relationship is modeled via a stick-breaking multivariate logistic regression on the individual membership scores, allowing for easy introduction of auxiliary information and flexible choices of the number of subgroups. We discuss model identifiability conditions and demonstrate this method on a migrant population in the United States. | Bayesian methods;Latent class models;Grade of membership models;Multivariate categorical data;Epidemiology | Zhenke Wu | Link |
| Ying Yuan | Alumni | Thu, Aug 7 | 10:30 AM | 11:00 AM | CC-202B | Innovative Bayesian Methods for Leveraging Historical Data in Clinical Trials | PS-SAM: doubly robust propensity-score-integrated self-adapting mixture prior to dynamically borrow information from historical data | There has been a growing interest in incorporating historical data to enhance the efficiency or reduce the sample size of randomized controlled trials (RCTs). A key challenge is that patient characteristics of historical data may differ from those of the current RCT. To address this issue, one well-known approach is to employ propensity score matching or inverse probability weighting to adjust for baseline heterogeneity, enabling the incorporation of historical data into the inference of the RCT. However, this approach is subject to bias when there are unmeasured confounders. We address this issue by incorporating a self-adapting mixture (SAM) prior with propensity score matching and inverse probability weighting to enable additional adaptation for information borrowing in the presence of unmeasured confounders. The resulting propensity score-integrated SAM (PS-SAM) priors are doubly robust in the sense that if there are no unmeasured confounders, they result in an unbiased causal estimate of the treatment effect; and if there are unmeasured confounders, they provide a notably less biased treatment effect with better-controlled type I error. Simulation studies demonstrate that the PS-SAM prior exhibits desirable operating characteristics, with reasonably controlled type I error rates or substantial power gain, small bias, and low MSE, regardless of the presence of unmeasured confounders. | Information borrowing;historical data;mixture prior | | Link |
| Veera Baladandayuthapani | Faculty | Thu, Aug 7 | 10:30 AM | 11:25 AM | CC-104E | Innovations in Biological Network Modeling: Unraveling Omics Data Analysis | Spatial Graphical Regression Models for Spatial Transcriptomics Data | Modern spatial transcriptomic profiling techniques facilitate spatially resolved, high-dimensional assessment of cellular gene transcription across the tumor domain. The characterization of spatially varying gene networks enables the discovery of heterogeneous regulatory patterns and biological mechanisms underlying cancer etiology. We propose a spatial Graphical Regression (sGR) model to infer spatially varying graphs for high-resolution multivariate spatial data. Unlike existing graphical models, sGR explicitly incorporates spatial information to infer non-linear conditional dependencies through Gaussian processes. It conducts sparse estimation and selection of spatially varying edges, at both spatial and sub-spatial levels. Extensive simulation studies illustrate the profitability of sGR for spatial graph structural recovery and estimation accuracy. Our methods are motivated by and applied to two spatial transcriptomics data sets in breast and prostate cancer, to investigate spatially varying gene connectivity patterns across the tumor microenvironment. | spatial graphical regression;biological network;spatial transcriptomics;graphical model | | Link |