2024 Big Data Summer Institute in Biostatistics Projects
For project work, participants are divided into small research teams, and each team is assigned to a faculty member leading a project area that matches its interests. A graduate student research assistant is assigned to each project group to facilitate the project work.
Project Group One: Cancer Data Science
Co-Led by: Dr. Veera Baladandayuthapani, Dr. Jian Kang, and Dr. Junsouk Choi
The Cancer Data Science group will delve into statistical, computational, and mathematical questions that arise in cancer research. Each research project will involve an application aimed at advancing cancer prevention and care. Examples include developing predictive models to assess recurrence risk based on clinical, genomic, or pathological data; investigating the spatial and temporal dynamics of tumor evolution using genomic and medical imaging data; and leveraging clinical data to identify biomarkers associated with tumor initiation and progression. Student teams will formulate novel research questions using the provided data sources and conduct in-depth analyses to explore them. This immersive experience will teach students valuable skills in data manipulation, statistical computing, and data visualization. Within this research group, students will have a chance to engage with members of the UM Cancer Data Science group and learn to apply advanced statistical methods, such as survival analysis, machine learning, and spatial data analysis.
Project Group Two: Data Mining and Machine Learning in Healthcare Data
Co-Led by: Dr. Rahul Ladhania and Dr. Snigdha Panigrahi
Statistics for Trustworthy Machine Learning: An accepted way to affirm a discovery is to replicate the experiment on new data. Despite the widespread use of machine learning (ML) across scientific fields, an alarmingly high proportion of ML findings do not hold up in new datasets, which has led to a pressing replication crisis. To gauge the chances of replication, researchers need tools to understand the uncertainties in ML findings.
Multiple projects under the "Statistics for Trustworthy ML" theme will apply ML to model clinical data from imaging and genomics. These projects will examine the resulting findings from a replication perspective and apply recent tools in selective inference for uncertainty estimation. Students will gain experience in modeling large and complex datasets and will be introduced to selective inference as a way to improve replication.
Project Group Three: Genomics
Co-Led by: Dr. Matt Zawistowski and Dr. Xiang Zhou
The Genomics group will offer multiple projects, each connecting a health-related question to a large-scale genomic dataset, such as whole-genome single nucleotide polymorphism (SNP) data, single-cell RNA sequencing data, or epigenetic methylation data. Students will form teams for a deep-dive analysis of their specific project of interest, with opportunities for open-ended exploration. Students will gain hands-on computing experience and valuable data manipulation skills by working with large genomic data files. We will apply many classical statistical techniques, learn about the integration of complementary genomic data sources, and explore machine learning and specialized genomic analysis methods.