2022 Big Data Summer Institute in Biostatistics Projects
For project work, participants are divided into small research teams, and each team is assigned to a faculty member leading a project area that matches its members' interests. A graduate student research assistant is assigned to each project group to facilitate the work.
Project Group One: Imaging
Led by: Dr. Jian Kang
Medical imaging refers to a variety of techniques for creating visual representations of organs or tissues in the body for clinical analysis and medical intervention. Recent technological advances make it possible to generate large numbers of high-resolution images in biomedical and clinical studies. This presents great opportunities and challenges for precision medicine and many other areas. One important research topic is imaging-guided clinical diagnosis of disease, where statistical models and machine learning algorithms play an important role. The BDSI imaging research group will focus on imaging-based disease classification and feature selection. The project will consist of using imaging data to predict the disease status or the cognitive state of subjects. A training set will be used to build a classifier and identify important imaging biomarkers, and a testing set will be used to validate the prediction and feature selection performance. With the help of the instructors and the graduate student assistant, the students will learn basic knowledge and computing tools for biomedical imaging data analysis and will decide how they wish to model the data and perform the analysis. Traditional statistical models, machine learning algorithms, or both may be used.
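The training/testing workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's actual method: the data are simulated (no real images), the helper names (make_subjects, rank_features, fit_centroids, predict) are hypothetical, and a nearest-centroid classifier with a simple mean-difference feature score stands in for whatever model the students choose.

```python
import random
from statistics import mean, stdev

def make_subjects(n, n_features, informative, rng):
    """Simulate (features, label) pairs; informative features shift with disease status."""
    subjects = []
    for k in range(n):
        label = k % 2  # 0 = control, 1 = disease (simulated labels)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            x[j] += 1.5 * label
        subjects.append((x, label))
    return subjects

def rank_features(train, n_features):
    """Rank features by standardized mean difference between classes (training data only)."""
    scores = []
    for j in range(n_features):
        g0 = [x[j] for x, y in train if y == 0]
        g1 = [x[j] for x, y in train if y == 1]
        pooled_sd = ((stdev(g0) ** 2 + stdev(g1) ** 2) / 2) ** 0.5
        scores.append(abs(mean(g1) - mean(g0)) / pooled_sd)
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def fit_centroids(train, selected):
    """Compute the per-class mean of each selected feature."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(row[j] for row in rows) for j in selected]
    return centroids

def predict(x, centroids, selected):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((x[j] - c) ** 2 for j, c in zip(selected, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

rng = random.Random(0)            # fixed seed so the sketch is reproducible
informative = [3, 17, 42]         # hypothetical "biomarker" feature indices
data = make_subjects(200, 50, informative, rng)
rng.shuffle(data)
train, test = data[:140], data[140:]   # 70/30 split, an arbitrary choice

selected = rank_features(train, 50)[:3]        # feature selection on training set only
centroids = fit_centroids(train, selected)     # classifier built on training set only
accuracy = sum(predict(x, centroids, selected) == y for x, y in test) / len(test)
print("selected features:", sorted(selected))
print("test accuracy:", round(accuracy, 3))
```

The key design point is that both feature selection and model fitting use only the training set, so the accuracy measured on the testing set is an honest estimate of performance on new subjects.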
Project Group Two: Data Mining
Led by: Dr. Johann Gagnon-Bartsch
Harnessing Public Data to Explore and Validate Popular Health Claims
The primary mechanism for communicating health information to the public is the media, whether through major news sources, podcasts, or Twitter. These sources might rely on academic research, other journalism, or data directly to support their claims. In this project, the students will combine public data sources to investigate patterns in the data that could confirm, refute, or complicate the conclusions made in a major online news article. Given the explosion of public data availability, anyone often has the opportunity to evaluate whether claims made in the media actually hold. Students will dig deeper into the conclusions an article draws about patterns in public health, as well as the support for those conclusions. This may include reproducing original results, investigating explanations for the results other than the article’s conclusion, and considering whether the original results take power structures (like systemic racism) into account, all through the use of large, public data repositories. Students will learn valuable tools for data management and mining by starting from messy, public data and determining the best path to validate the article’s claims.
Project Group Three: Genomics
Led by: Dr. Matt Zawistowski
The Genomics group will have multiple available projects, each connecting a health-related question to a large-scale genomic dataset, such as whole-genome single nucleotide polymorphism (SNP) data, single-cell RNA sequencing data, or epigenetic methylation data. Students will form teams for a deep-dive analysis of their specific project of interest, with opportunity for open-ended exploration. Students will gain hands-on computing experience and valuable data manipulation skills by working with large genomic data files. We will apply many classical statistical techniques, learn about integrating complementary genomic data sources, and explore machine learning and specialized genomic analysis methods.
Project Group Four: Machine Learning
Led by: Dr. Nikola Banovic
Explainable AI (XAI) for Tumor Segmentation
Precise tumor segmentation forms an integral part of treatment planning for glioma, a malignant form of brain tumor. Radiologists segment tumors based on their appearance in medical images and indicate their confidence in the segmentation. Although existing deep learning (DL) algorithms could greatly accelerate the segmentation process, there remain barriers to the adoption of such algorithms in clinical practice. These barriers include the algorithms' inability to explain and justify their decision-making process and a lack of transparency in communicating how certain they are when making segmentation predictions. Existing (often mathematical) approaches to AI explainability mostly assume the explanation is meant for math-savvy AI creators rather than decision makers with domain expertise (e.g., radiologists). In this project, students will use a state-of-the-art 7.8-million-dimensional deep neural network (DNN) for brain tumor segmentation, trained on publicly available Magnetic Resonance Imaging (MRI) data from The Cancer Genome Atlas (TCGA) Glioblastoma Multiforme (GBM) and Low Grade Glioma (LGG) collection. Students will design and implement interactions that support the investigation of the capabilities and limitations of the DL algorithm to improve medical decision-making. Students will explore different approaches to AI explainability that will contribute to technology-driven innovation and broader adoption of AI in healthcare.
Project Group Five: Infectious Diseases
Led by: Dr. Bhramar Mukherjee
Due to the public and media attention on models describing the transmission of SARS-CoV-2, there has been exploding interest in mechanistic mathematical models with compartmental specifications, e.g., the susceptible-exposed-infected-recovered (SEIR) model. This class of models is quite distinct from traditional statistical models, which are mostly defined through regression structures, and from modern machine learning algorithms. These models have been used to evaluate policy decisions (such as lockdowns, mandatory face coverings, and travel bans) and to guide reopening strategies. Students will get an opportunity to use exciting data and learn about mathematical/stochastic modeling of contagious infections.
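To make the idea of a compartmental model concrete, here is a minimal sketch of a deterministic SEIR model in Python. It is an illustration only, not the model used by the project group: the function name simulate_seir is hypothetical, the parameter values are arbitrary rather than estimates for SARS-CoV-2, and simple forward Euler steps are used instead of a dedicated ODE solver. Individuals move from susceptible (S) to exposed (E) at rate beta * S * I / N, from exposed to infected (I) at rate sigma * E, and from infected to recovered (R) at rate gamma * I.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the SEIR equations with forward Euler steps; return one state per day."""
    n = s0 + e0 + i0 + r0                      # total population, held constant
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt    # S -> E (transmission)
            new_infected = sigma * e * dt          # E -> I (end of latent period)
            new_recovered = gamma * i * dt         # I -> R (recovery)
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history

# Illustrative values only: 5-day latent period, 7-day infectious period.
trajectory = simulate_seir(beta=0.5, sigma=1 / 5, gamma=1 / 7,
                           s0=9990, e0=0, i0=10, r0=0, days=160)

# Day on which the infected compartment is largest.
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print("peak day:", peak_day)
print("infected at peak:", round(trajectory[peak_day][2]))
```

In this framing, a policy such as a lockdown can be explored by lowering beta (fewer effective contacts) and comparing the resulting trajectories, which is what distinguishes a mechanistic model from a purely regression-based one.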