Mapping maintenance for data integration systems

Robert McCann, Bedoor Alshebli, Quoc Le, Hoa Nguyen, Long Vu, Anhai Doan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

To answer user queries, a data integration system employs a set of semantic mappings between the mediated schema and the schemas of data sources. In dynamic environments sources often undergo changes that invalidate the mappings. Hence, once the system is deployed, the administrator must monitor it over time, to detect and repair broken mappings. Today such continuous monitoring is extremely labor intensive, and poses a key bottleneck to the widespread deployment of data integration systems in practice. We describe MAVERIC, an automatic solution to detecting broken mappings. At the heart of MAVERIC is a set of computationally inexpensive modules called sensors, which capture salient characteristics of data sources (e.g., value distributions, HTML layout properties). We describe how MAVERIC trains and deploys the sensors to detect broken mappings. Next we develop three novel improvements: perturbation (i.e., injecting artificial changes into the sources) and multi-source training to improve detection accuracy, and filtering to further reduce the number of false alarms. Experiments over 114 real-world sources in six domains demonstrate the effectiveness of our sensor-based approach over existing solutions, as well as the utility of our improvements.
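
The sensor idea in the abstract can be illustrated with a minimal sketch. This is not MAVERIC's implementation: the class `FormatSensor`, the helper `format_signature`, the function `is_broken`, and the 0.5 threshold are all hypothetical names and values chosen for illustration. The sketch assumes a sensor that learns the coarse formats of values a mapping returned while it was known to be correct, then flags the mapping as possibly broken when newly extracted values no longer match those formats.

```python
from collections import Counter


def format_signature(value: str) -> str:
    """Collapse a value into a coarse character-class pattern.

    Runs of digits become '9', runs of letters become 'a', and any other
    character is kept as-is (consecutive repeats are collapsed).
    """
    out = []
    for ch in value:
        cls = "9" if ch.isdigit() else "a" if ch.isalpha() else ch
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


class FormatSensor:
    """Hypothetical sensor: profiles value formats seen during training."""

    def __init__(self) -> None:
        self.profile: Counter = Counter()
        self.total = 0

    def train(self, values: list[str]) -> None:
        # Record the format of each value observed while the mapping was valid.
        for v in values:
            self.profile[format_signature(v)] += 1
            self.total += 1

    def score(self, values: list[str]) -> float:
        # Fraction of new values whose format was seen during training.
        if not values or self.total == 0:
            return 0.0
        known = sum(1 for v in values if self.profile[format_signature(v)] > 0)
        return known / len(values)


def is_broken(sensor: FormatSensor, values: list[str], threshold: float = 0.5) -> bool:
    # Assumed decision rule: a low score suggests the mapping may be broken.
    return sensor.score(values) < threshold


sensor = FormatSensor()
sensor.train(["$12.50", "$3.99", "$120.00"])      # values of a "price" attribute
still_valid = ["$8.25", "$45.10"]                 # same format after a source update
after_change = ["Dune", "Emma"]                   # mapping now returns titles instead
```

Here `is_broken(sensor, still_valid)` would be false and `is_broken(sensor, after_change)` true. A single format sensor like this is crude; the abstract describes combining several inexpensive sensors, which this sketch does not attempt.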

Original language: English (US)
Title of host publication: VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
Pages: 1018-1029
Number of pages: 12
State: Published - 2005
Event: VLDB 2005 - 31st International Conference on Very Large Data Bases - Trondheim, Norway
Duration: Aug 30 2005 to Sep 2 2005

Publication series

Name: VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
Volume: 3

Other

Other: VLDB 2005 - 31st International Conference on Very Large Data Bases
Country/Territory: Norway
City: Trondheim
Period: 8/30/05 to 9/2/05

ASJC Scopus subject areas

  • General Engineering
