Datasets of enormous complexity and size are being generated in the diverse areas of genomics, imaging, electronic health records, social media and environmental monitoring. The insights obtained from these massive data sources will inform the prevention and treatment of human diseases and play a major role in biology, medicine and public health in the coming decade. But more training is needed to prepare the next generation of leaders to tackle these challenges.
The Big Data Summer Institute, a six-week interdisciplinary training and research program at the University of Michigan, has been designed to introduce undergraduate students to the growing number of approaches to big data:
- First, you will gain a comprehensive overview of the field of big data by attending a variety of lectures in the mornings and working on research projects in the afternoons.
- On Fridays, you will take part in professional preparation activities and attend journey lectures at lunch. Journey lectures showcase the academic journeys of researchers at different stages of their careers in data science.
- At the conclusion of the institute, you will present your work and learn about other student projects at a research symposium and attend a professional development workshop.
- Throughout the program, you will have the unique opportunity to interact with distinguished faculty and graduate students from the U-M departments of biostatistics, information science, statistics, and electrical engineering and computer science.
- There is no cost to attend. All accepted participants receive a stipend to cover their travel, housing, and meals.
There are also social events such as canoeing on the Huron River, trips to Michigan Stadium and the Museum of Art, a group BBQ and a welcome dinner.
On Monday, Tuesday and Thursday mornings from 9 a.m. to noon, you will attend lectures given by faculty members from biostatistics, information science, statistics, and electrical engineering and computer science.
Below is a tentative outline of the lecture topics and speakers:
- Introduction to R and Python
- Data Acquisition, Database Management
- Common computing platform, Linux environment
- Data Structures
- Data Visualization
- Probability and Statistical Inference
- Cloud, Parallel and Distributed Computing
- Sampling Methods: Markov Chain Monte Carlo
- Medical Informatics/Computing
- Matrix Computation
- Bias and Confounding, Missing Data, Causal Inference
- Machine Learning, Graphical Models, Sparse Learning with Matrices, Social Network Analysis, Imaging
- Gonçalo Abecasis (BIOS)
- Veronica Berrocal (BIOS)
- Jedidiah Carlson (BIOS)
- Arya Farahi (PHYSICS)
- Matthew Flickinger (BIOS)
- Johan Gagnon-Bartsch (STAT)
- Brett Griffiths
- Hui Jiang (BIOS)
- Timothy Johnson (BIOS)
- Hyun Min Kang (BIOS)
- Jian Kang (BIOS)
- Matthew Kay (CSE)
- Kelley Kidwell (BIOS)
- Seunggeun (Shawn) Lee (BIOS)
- Rod Little (BIOS)
- Harsha Madhyastha (EECS)
- Bhramar Mukherjee (BIOS)
- Kayvan Najarian (CCMB)
- Karandeep Singh (MED SCHOOL)
- Jonathan Stroud (CSE)
- Ambuj Tewari (STAT)
- Lu Wang (BIOS)
- Jenna Wiens (EECS)
- Matt Zawistowski (BIOS)
- Sebastian Zoellner (BIOS)
Every afternoon from Monday to Friday, 2-5 p.m., you will work on big data projects. For project work, students are divided into small research teams, each assigned to a faculty member who leads a project area of interest to them.
There are four project areas; you can learn more about each on the Research Activities page:
- Genomics (Project leader: TBD)
- Electronic Health Records (Project leader: TBD)
- Imaging (Project leader: TBD)
- Data Mining and Machine Learning (Project leader: TBD)
On the last two days of the institute, there will be a concluding Student Research Symposium where you will have the opportunity to showcase your team’s work through short talks and a poster session, as well as attend talks by nationally recognized researchers in big data.
The program receives input and guidance from an internal advisory committee consisting of five University of Michigan faculty members: