Data Integration
In the era of data science, data that can help answer a scientific question are often available from multiple sources. An analysis based on a single data source may yield biased estimates or results that are not sufficiently accurate. Integrating data from multiple sources is therefore essential for combining different pieces of information into a unified view, drawing more accurate conclusions, and making better-informed decisions. The main challenges in this process stem from heterogeneity across sources. Examples include data stored in different repositories with varying sets of variables, measurements, and volumes, and results from different published research papers that are summarized in different forms. Faculty in Biostatistics are developing new methodologies to address these data integration problems.
Faculty: M. Elliott, Peisong Han, G. Li, J. Taylor, X. Zhou