Entity Resolution with Societal Impacts in Statistical Machine Learning
University of Michigan School of Public Health
1690 SPH I, 1415 Washington Heights Ann Arbor, MI 48109-2029

Very often, information about social entities is scattered across multiple databases. Combining that information into one database can bring enormous benefits for analysis, yielding richer and more reliable conclusions. The types of questions that have been, and can be, addressed by combining information include: How accurate are census enumerations for minority groups? How many of the elderly are at high risk for sepsis in different parts of the country? How many people were victims of war crimes in recent conflicts in Syria? In most practical applications, however, analysts cannot simply link records across databases based on unique identifiers, such as social security numbers, either because they are not a part of some databases or because they are not available due to privacy concerns. In such cases, analysts need to use methods from statistical and computational science known as entity resolution (record linkage or de-duplication) to proceed with analysis. Entity resolution is not only a crucial task for social science and industrial applications, but is also a challenging statistical and computational problem in itself. In this talk, we describe the past and present challenges with entity resolution, with applications to the Syrian conflict as well as official statistics and the food and music industries. This work, a collaboration with researchers at Rice University and the Human Rights Data Analysis Group (HRDAG), touches on the interdisciplinary research that is crucial to problems with societal impacts that are at the forefront of both national and international news. Bio: https://resteorts.github.io/bio.html

Department of Biostatistics

Rebecca Steorts, Ph.D. - Assistant Professor, Department of Statistical Science

March 8, 2018
3:30 PM - 5:00 PM
1690 SPH I
1415 Washington Heights
Ann Arbor, MI 48109-2029
Sponsored by: Department of Biostatistics
Contact Information: Zhenke Wu, Ph.D. (zhenkewu@umich.edu)
