Integrative statistical approaches for the analysis of whole-genome sequencing data
Iuliana Ionita-Laza, Ph.D., Associate Professor - Biostatistics - Columbia University
October 25, 2018
3:30 pm - 5:00 pm
3755 SPH I
1415 Washington Heights
Ann Arbor, MI 48109-2029
Sponsored by: Department of Biostatistics
Contact Information: Zhenke Wu (zhenkewu@umich.edu); Peisong Han (peisong@umich.edu)
Continuous advances in massively parallel sequencing technologies make large whole-genome sequencing studies increasingly feasible, including the NHLBI Trans-Omics for Precision Medicine (TOPMed) and UK Biobank studies. The analysis of such data is challenging due to the large number of rare variants in noncoding regions of the genome, our poor understanding of their functional effects, and the lack of natural units for testing (e.g., the analogue of genes in coding regions). In this talk I will describe some of our efforts to address these challenges using statistical and computational approaches. In particular, I will first discuss an unsupervised approach based on a latent Dirichlet allocation model to predict functional effects of genetic variants in a cell type- and tissue-specific manner. I will also introduce further extensions to the semi-supervised setting, where high-quality experimentally derived labels are available for a small to modest number of variants. I will then discuss GenoScan, a scan statistic approach that simultaneously detects the existence and estimates the location of an association signal in a pre-specified large region or at genome-wide scale. GenoScan can incorporate multiple functional annotations of genetic variants for improved power to identify signals in noncoding regions. I will show applications to several datasets, including the Metabochip dataset on 12,281 individuals and whole-genome sequencing data in the Simons Simplex Collection (SSC).
Light refreshments for seminar guests will be served at 3:10 pm.