Research Activities

2023 Big Data Summer Institute in Biostatistics Projects

For project work, participants are divided into small research teams, and each team is assigned to a faculty member leading a project area of the participants' interest. A graduate student research assistant is assigned to each project group to facilitate the work.

Project Group One: Imaging

Led by: Dr. Jian Kang and Dr. Krithika Suresh

Medical imaging refers to a variety of techniques for creating visual representations of organs or tissue in the body for clinical analysis and medical intervention. Recent advances in technology can generate large numbers of high-resolution images in biomedical and clinical studies. This presents great opportunities and challenges for precision medicine and many other areas. One important research topic is imaging-guided clinical diagnosis of disease, where statistical models and machine learning algorithms play an important role. The BDSI imaging research group will focus on imaging-based disease classification and feature selection. The project will consist of using imaging data to predict the disease status or the cognitive state of subjects. A training set will be used to build a classifier and identify important imaging biomarkers, and a testing set will be used to validate the prediction and feature selection performance. With the help of the instructors and the graduate student assistant, the students will learn basic knowledge and computing tools for biomedical imaging data analysis and will decide how they wish to model the data and perform the analysis. Traditional statistical models, machine learning algorithms, or both may be used.


Project Group Two: Data Mining/Statistics

Led by: Dr. Snigdha Panigrahi and Dr. Johann Gagnon-Bartsch

Statistics for Trustworthy Machine Learning: An accepted way to confirm a discovery is to replicate the experiment on new data. Despite the use of machine learning (ML) across scientific fields, an alarmingly high proportion of ML findings do not hold up in new datasets, which has led to a pressing replication crisis. To gauge the chances of replication, researchers need tools to understand the uncertainties in ML findings.

Multiple projects under the "Statistics for Trustworthy ML" theme will apply ML to model clinical data from imaging and genomics. These projects will examine the resulting findings from a replication perspective and apply recent tools in selective inference for uncertainty estimation. Students will gain experience in modeling large and complex datasets and will be introduced to selective inference for improving replication.


Project Group Three: Genomics

Led by: Dr. Matt Zawistowski

The Genomics group will have multiple available projects connecting a health-related question to a large-scale genomic dataset, for example, whole-genome single nucleotide polymorphism (SNP) data, single-cell RNA sequencing data, or epigenetic methylation data. Students will form teams for a deep-dive analysis of their specific project of interest, with the opportunity for open-ended exploration. Students will gain hands-on computing experience and valuable data manipulation skills working with large genomic data files. We will apply many classical statistical techniques, learn about the integration of complementary genomic data sources, and explore machine learning and specialized genomic analysis methods.


Project Group Four: Machine Learning for Healthcare

Led by: Dr. Rahul Ladhania

In many ways, the field of healthcare serves as an appropriate playground for leveraging the power of machine learning methods, providing opportunities for uncovering insights to improve timely diagnosis, treatment, and prevention of diseases. The vast and multi-dimensional nature of healthcare data (from clinical sources such as electronic health records, medical images, and genetic information to non-clinical sources such as wearables, mobile devices, and social media behavior) can be leveraged to identify patterns and help healthcare providers make more informed decisions. At the same time, these opportunities are accompanied by challenges unique to healthcare, including generalizability, interpretability, and causal inference, among others.


The BDSI Healthcare group will be introduced to and expected to grapple with these challenges as they work with detailed clinical inpatient and outpatient data from more than 4 million unique patients across the Michigan Medicine enterprise. Students will be introduced to the domain-specific complexities of working with a multi-dimensional healthcare database, gain hands-on experience interpreting and solving prediction problems, and examine under what conditions their insights generalize to other settings and when they can be leveraged for causal inference.