Research Activities
2023 Big Data Summer Institute in Biostatistics Projects
For project work, participants are divided into small research teams, and each team is assigned to the faculty member who leads the project area matching its interests. A graduate student research assistant is assigned to each project group to facilitate the project work.
Project Group One: Imaging
Led by: Dr. Jian Kang and Dr. Krithika Suresh
Medical imaging refers to a variety of techniques for creating visual representations of organs or tissues in the body for clinical analysis and medical intervention. Recent technological advances make it possible to generate large volumes of high-resolution images in biomedical and clinical studies. This presents great opportunities and challenges for precision medicine and many other areas. One important research topic is imaging-guided clinical diagnosis of disease, in which statistical models and machine learning algorithms play an important role. The BDSI imaging research group will focus on imaging-based disease classification and feature selection. The project will use imaging data to predict the disease status or cognitive state of subjects. A training set will be used to build a classifier and identify important imaging biomarkers, and a separate testing set will be used to validate the prediction and feature selection performance. With the help of the instructors and the graduate student assistant, students will learn the fundamentals and computing tools of biomedical imaging data analysis, and they will decide how they wish to model the data and perform the analysis. Traditional statistical models, machine learning algorithms, or both may be used.
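The train/test workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's method: it uses hypothetical synthetic data (50 features per subject standing in for imaging features, only the first five carrying disease signal), a simple univariate filter for feature selection, and a nearest-centroid classifier. All function names here are hypothetical, and a real project would substitute actual imaging data and whichever models the team chooses.

```python
import random
import statistics


def make_toy_data(n=200, p=50, n_signal=5, shift=1.5, seed=1):
    """Hypothetical synthetic data: p features per subject; only the first n_signal differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0, 1) for _ in range(p)]
        for j in range(n_signal):
            x[j] += shift * label  # subjects with label 1 have shifted signal features
        X.append(x)
        y.append(label)
    return X, y


def train_test_split(X, y, test_frac=0.3, seed=0):
    """Randomly hold out a fraction of subjects as the testing set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train), pick(test)


def select_features(X, y, k):
    """Rank features by |difference in class means| / overall SD, using training data only."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, label in zip(X, y) if label == 1]
        b = [x[j] for x, label in zip(X, y) if label == 0]
        sd = statistics.pstdev(a + b) or 1.0
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / sd, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def fit_centroids(X, y, feats):
    """Class centroids restricted to the selected features."""
    return {
        c: [statistics.mean([x[j] for x, label in zip(X, y) if label == c]) for j in feats]
        for c in (0, 1)
    }


def predict(x, centroids, feats):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
    return min(centroids, key=dist)


if __name__ == "__main__":
    X, y = make_toy_data()
    (X_train, y_train), (X_test, y_test) = train_test_split(X, y)
    feats = select_features(X_train, y_train, k=5)
    centroids = fit_centroids(X_train, y_train, feats)
    correct = sum(predict(x, centroids, feats) == label for x, label in zip(X_test, y_test))
    print("selected features:", feats)
    print(f"test accuracy: {correct / len(y_test):.2f}")
```

Note that feature selection uses only the training set. Selecting features on the full data before splitting would leak information from the testing set and inflate the apparent test accuracy.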
Project Group Two: Data Mining/Statistics
Led by: Dr. Snigdha Panigrahi and Dr. Johann Gagnon-Bartsch
Statistics for Trustworthy Machine Learning: An accepted way to confirm a discovery is to replicate the experiment on new data. Despite the widespread use of machine learning (ML) across scientific fields, an alarmingly high proportion of ML findings do not hold up in new datasets, contributing to a pressing replication crisis. To gauge the chances of replication, researchers need tools for quantifying the uncertainty in ML findings.
Multiple projects under the "Statistics for Trustworthy ML" theme will apply ML to model clinical data from imaging and genomics. These projects will assess whether the resulting findings are likely to replicate, and will apply recent tools from selective inference to estimate their uncertainty. Students will gain experience in modeling large and complex datasets, and will be introduced to selective inference as a means of improving replicability.
Project Group Three: Genomics
Led by: Dr. Matt Zawistowski
The Genomics group will have multiple available projects, each connecting a health-related question to a large-scale genomic dataset, for example whole-genome single nucleotide polymorphism (SNP) data, single-cell RNA sequencing data, or epigenetic methylation data. Students will form teams for a deep-dive analysis of their specific project of interest, with opportunity for open-ended exploration. Students will gain hands-on computing experience and valuable data manipulation skills by working with large genomic data files. We will apply many classical statistical techniques, learn how to integrate complementary genomic data sources, and explore machine learning and specialized genomic analysis methods.
Project Group Four: Machine Learning for Healthcare
Led by: Dr. Rahul Ladhania
In many ways, healthcare is a natural setting for leveraging the power of machine learning methods, offering opportunities to uncover insights that improve the timely diagnosis, treatment, and prevention of disease. Healthcare data are vast and multi-dimensional, spanning clinical sources such as electronic health records, medical images, and genetic information, as well as non-clinical sources such as wearables, mobile devices, and social media behavior. These data can be leveraged to identify patterns and help healthcare providers make more informed decisions. At the same time, these opportunities are accompanied by challenges unique to healthcare, including generalizability, interpretability, and inferring causality.
The BDSI Healthcare group will be introduced to and expected to grapple with these challenges as they work with detailed clinical inpatient and outpatient data from more than 4 million unique patients across the Michigan Medicine enterprise. Students will be introduced to the domain-specific complexities of working with a multi-dimensional healthcare database, and will gain hands-on experience solving and interpreting prediction problems, assessing under what conditions their insights generalize to other settings, and determining when those insights can be leveraged for causal inference.