2022 Big Data Summer Institute in Biostatistics Projects
For project work, participants are divided into small research teams, and each team is assigned to a faculty member who leads a project area of interest to them. A graduate student research assistant is assigned to each project group to facilitate the work.
Project Group One: Imaging
Led by: Dr. Jian Kang
Medical imaging refers to a variety of techniques for creating visual representations of organs or tissues of the body for clinical analysis and medical intervention. Recent technological advances make it possible to generate large volumes of high-resolution images in biomedical and clinical studies, which presents great opportunities and challenges for precision medicine and many other areas. One important research topic is imaging-guided clinical diagnosis of disease, in which statistical models and machine learning algorithms play an important role. The BDSI imaging research group will focus on imaging-based disease classification and feature selection. The project will use imaging data to predict the disease status or cognitive state of subjects. A training set will be used to build a classifier and identify important imaging biomarkers, and a testing set will be used to validate prediction and feature-selection performance. With the help of the instructors and the graduate student assistant, students will learn the fundamentals of, and computing tools for, biomedical imaging data analysis, and will decide how they wish to model the data and perform the analysis. Traditional statistical models, machine learning algorithms, or both may be used.
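The train/test workflow described above, where one model both classifies subjects and selects features, can be sketched with L1-penalized logistic regression. This is a minimal pure-Python sketch under stated assumptions: the data are simulated stand-ins for imaging features (not real images), only features 0 and 1 carry signal by construction, and the penalty `lam`, step size, and iteration count are illustrative choices. L1 penalization is one common approach to joint classification and feature selection, not necessarily the one the project will use. Features with nonzero weights play the role of candidate "biomarkers".

```python
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(n, p, rng):
    # Simulated stand-in for imaging features: only features 0 and 1 carry signal
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        prob = sigmoid(3.0 * x[0] - 3.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < prob else 0)
    return X, y

def soft_threshold(v, t):
    # proximal operator of the L1 penalty: shrinks v toward 0, zeroing small values
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=300):
    # proximal gradient descent (ISTA) on mean logistic loss + lam * ||w||_1
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= step * grad_b / n  # intercept is not penalized
        w = [soft_threshold(wj - step * gj / n, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def accuracy(X, y, w, b):
    preds = [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0 for xi in X]
    return sum(pred == truth for pred, truth in zip(preds, y)) / len(y)

def run_demo(seed=0, n=200, p=20):
    rng = random.Random(seed)
    X_train, y_train = simulate(n, p, rng)
    X_test, y_test = simulate(n, p, rng)  # held-out testing set
    w, b = fit_l1_logistic(X_train, y_train)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]  # candidate "biomarkers"
    return selected, accuracy(X_test, y_test, w, b)

if __name__ == "__main__":
    selected, acc = run_demo()
    print("selected features:", selected)
    print("test accuracy:", round(acc, 3))
```

Evaluating on a separately simulated testing set mirrors the project's split between building the classifier and validating it. In practice a library implementation (for example, an L1-penalized logistic regression in scikit-learn) would replace the hand-written solver.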
Project Group Two: Data Mining
Led by: Dr. Johann Gagnon Bartsch
In recent years, there has been substantial interest in whether and how analyses of social media data can be used to improve understanding of public opinion on a wide range of topics. The hope is that social media data may complement, or even offer an alternative to, existing methods for researching public opinion, such as designed sample surveys. The primary goal of this project is to use social media data to learn about public opinion regarding (a) very broadly, trust in government; (b) more specifically, trust in official statistics; and (c) most specifically, trust in official statistics about the COVID-19 pandemic and, in particular, vaccine safety. A secondary goal is to systematically explore which algorithms (e.g., topic modeling and sentiment analysis) and corresponding visualizations are most effective for producing various forms of qualitative and quantitative insights.
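As a rough illustration of one of the algorithm families named above, the sketch below does simple lexicon-based sentiment analysis on posts that mention a keyword. It is a toy, assuming a tiny hypothetical word list (`POSITIVE`, `NEGATIVE`) and made-up example posts; a real analysis would use a validated lexicon or a trained model, and topic modeling is not shown here.

```python
import re
from collections import Counter

# Toy lexicon for illustration only (hypothetical, not a validated resource)
POSITIVE = {"trust", "reliable", "accurate", "safe", "transparent"}
NEGATIVE = {"distrust", "fake", "lie", "lies", "manipulated", "unsafe"}
TOKEN = re.compile(r"[a-z']+")

def tokenize(text):
    # lowercase and keep alphabetic tokens (digits and hyphens are dropped)
    return TOKEN.findall(text.lower())

def sentiment_score(text):
    # (positive hits - negative hits) / total hits, in [-1, 1]; 0.0 if no hits
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def summarize(posts, keyword):
    # score only the posts that mention the (lowercase) keyword
    scores = [sentiment_score(p) for p in posts if keyword in tokenize(p)]
    labels = Counter("positive" if s > 0 else "negative" if s < 0 else "neutral"
                     for s in scores)
    return labels, (sum(scores) / len(scores) if scores else 0.0)

if __name__ == "__main__":
    # made-up example posts
    posts = [
        "The vaccine data are accurate and reliable.",
        "Vaccine safety numbers are fake!",
        "I got my vaccine today.",
        "Traffic was bad.",
    ]
    print(summarize(posts, "vaccine"))
```

Lexicon scoring is easy to inspect, which makes it a useful baseline when comparing against more complex methods; its weakness is that it ignores context such as negation ("not safe").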
Project Group Three: Genomics
Led by: Dr. Matt Zawistowski
Electronic Health Records (EHRs) contain the clinical histories of patients within a health system, including diagnoses, lab results, and medical procedures. Although originally intended for billing and record keeping, EHRs have become a popular data source for health-related research. Their large sample sizes and broad range of clinical variables support a wide variety of research applications. In this project, students will be given access to de-identified EHR data from Michigan Medicine. Students will work in small groups to develop a health-related question of interest and create an analytic dataset from the raw EHR data to address this question.
We will explore how clinical outcomes can be defined by combining multiple pieces of information; for example, how diagnoses and clinical lab results recorded over time can be used to identify disease cases. Students will be free to try various statistical techniques that interest them.
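To make "defining an outcome from combinations of information" concrete, here is a minimal sketch of a rule-based case definition. Everything in it is an assumption for illustration: the record layout (`Diagnosis`, `LabResult`), the rule itself (matching diagnosis codes on at least two distinct dates, or one such date plus an abnormal lab), the ICD-10-CM prefix `E11` (type 2 diabetes), and the HbA1c cutoff of 6.5% borrowed from common diabetes diagnostic criteria. It is not a validated phenotype and does not reflect the structure of the Michigan Medicine data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Diagnosis:
    patient_id: str
    code: str   # ICD-10-CM code, e.g. "E11.9"
    day: date

@dataclass(frozen=True)
class LabResult:
    patient_id: str
    test: str   # hypothetical test label, e.g. "HbA1c"
    value: float
    day: date

def is_case(patient_id, diagnoses, labs, code_prefix="E11", lab_test="HbA1c", threshold=6.5):
    # Illustrative rule (not a validated phenotype): a case has matching diagnosis
    # codes on >= 2 distinct dates, or on 1 date plus an abnormal lab result.
    dx_days = {d.day for d in diagnoses
               if d.patient_id == patient_id and d.code.startswith(code_prefix)}
    abnormal_lab = any(lab.patient_id == patient_id and lab.test == lab_test
                       and lab.value >= threshold for lab in labs)
    return len(dx_days) >= 2 or (len(dx_days) >= 1 and abnormal_lab)

# Made-up records for four hypothetical patients
diagnoses = [
    Diagnosis("A", "E11.9", date(2020, 1, 5)),
    Diagnosis("A", "E11.9", date(2020, 6, 2)),
    Diagnosis("B", "E11.9", date(2021, 3, 1)),
    Diagnosis("C", "E11.9", date(2021, 4, 9)),
]
labs = [
    LabResult("B", "HbA1c", 7.1, date(2021, 3, 1)),
    LabResult("C", "HbA1c", 5.6, date(2021, 4, 9)),
    LabResult("D", "HbA1c", 8.0, date(2022, 2, 2)),
]
cases = {pid: is_case(pid, diagnoses, labs) for pid in "ABCD"}
print(cases)
```

Counting distinct dates rather than raw code occurrences is one way to guard against a single visit generating several duplicate codes; requiring a second, independent source (the lab) is another.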
Project Group Four: Machine Learning
Led by: Dr. Nikola Banovic
Delivering Behavior-based Healthcare Interventions at Home to People with Multiple Sclerosis
People with multiple sclerosis (MS), a progressive autoimmune disease of the central nervous system, experience physical impairment, chronic pain, fatigue, depressed mood, and cognitive problems. These symptoms are often associated with numerous negative outcomes, including reduced employment, disability, social impairment, lower life satisfaction, interference with daily activities, poorer general mental and physical health, and reduced community integration. Interventions that target the most severe or impactful symptoms could guide the timing of medications and the selection of self-management strategies, and could avoid treatment when it is not needed. Such interventions could improve patients' healthcare outcomes, quality of life, and functional ability. However, research has shown that the most impactful symptoms in MS change from day to day within a person, which makes it challenging to deliver MS interventions to patients at home.
Students will explore a set of behavior modeling and machine learning techniques to automatically detect, extract, and capture distinctive patterns in the daily changes of the most impactful symptoms and behaviors of patients with MS, and to relate these patterns to patients' healthcare outcomes. They will use the models they build to understand the relationships among patients' symptoms, different home-based interventions and self-management strategies, and outcomes. Example code and implementations will be largely in Python and will rely on external packages and libraries. Students will be guided through the full "data intensive science" pipeline, from data extraction to preprocessing, model selection, evaluation, interpretation, and visualization of results. Students will gain firsthand experience creating models that will inform the design of future technologies for delivering new behavior-based interventions and prevention strategies to patients, improving their healthcare outcomes in a way that increases health access and equity.
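One simple way to capture the observation that the most impactful symptom changes daily within a person is sketched below. It assumes hypothetical daily self-reported ratings per symptom for one patient, operationalizes "most impactful" as the symptom with the largest within-person z-score on a given day (an assumption, not the project's definition), and measures how often that top symptom switches between consecutive days.

```python
from statistics import mean, pstdev

def within_person_z(ratings):
    # ratings: dict symptom -> daily ratings for one person (all lists same length)
    z = {}
    for symptom, values in ratings.items():
        mu, sd = mean(values), pstdev(values)
        z[symptom] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z

def daily_top_symptom(ratings):
    # for each day, the symptom most elevated relative to this person's own baseline
    z = within_person_z(ratings)
    n_days = len(next(iter(z.values())))
    symptoms = sorted(z)  # deterministic tie-breaking by symptom name
    return [max(symptoms, key=lambda s: z[s][d]) for d in range(n_days)]

def switch_rate(top):
    # fraction of consecutive day pairs on which the top symptom changes
    if len(top) < 2:
        return 0.0
    return sum(a != b for a, b in zip(top, top[1:])) / (len(top) - 1)

if __name__ == "__main__":
    # made-up 0-10 ratings over five days for one hypothetical patient
    ratings = {
        "fatigue": [9, 7, 2, 3, 2],
        "pain": [3, 4, 9, 3, 4],
        "mood": [1, 2, 1, 2, 8],
    }
    top = daily_top_symptom(ratings)
    print(top, switch_rate(top))
```

Standardizing within each person, rather than across patients, keeps a patient who always reports high pain from having pain flagged every day; the resulting daily labels could then serve as features relating symptom patterns to interventions and outcomes.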
Project Group Five: Infectious Diseases
More information will be available soon.