Course Details

EPID708 Machine Learning for Epidemiologic Analysis in the Era of Big Data

  • Graduate level
  • Summer term(s)
  • 1 Credit Hour(s)
  • Instructor(s): Staff
  • Last offered Summer 2016
  • Advisory Prerequisites: An introductory course in statistics, plus coursework in or working knowledge of basic regression (linear, logistic, etc.). Some background in the R programming language is preferred.
  • Description: This course focuses on advances in machine learning and their application to causal inference and prediction via Targeted Learning, which allows machine learning algorithms to be used both for prediction and for estimating so-called causal parameters, such as average treatment effects and optimal treatment regimes. We will also discuss implementation via cloud computing.
  • Course Goals:
      • A basic understanding of causal inference, including structural causal models, the definition of causal parameters via counterfactual distributions, and ways to establish identifiability from observed data.
      • Familiarity with, and ability to implement, machine learning, specifically the concepts of SuperLearning and the power of cross-validation in data-adaptive estimation.
      • Ability to apply machine learning algorithms to prediction problems and to estimate and derive inference for the resulting fit.
      • Ability to use the fits of machine learning algorithms to estimate causal effects using simple substitution estimators.
      • Ability to apply Targeted Learning approaches (e.g., targeted maximum likelihood estimation) to estimate, using machine learning, a priori specified treatment effects as well as general variable importance measures.
      • A basic understanding of how to use parallel computing and large computer clusters to run computationally intensive algorithms on large (Big Data) data sets.
      • An understanding of how the general methodology applies to the goals of Precision Medicine.
  • Competencies:
      • Ability to apply the estimation roadmap to novel data questions.
      • Ability to implement estimation in R using existing software packages.
      • Basic knowledge of how to use such algorithms on Big Data, including the use of cloud computing.