TY - GEN
T1 - Mapping maintenance for data integration systems
AU - McCann, Robert
AU - Alshebli, Bedoor
AU - Le, Quoc
AU - Nguyen, Hoa
AU - Vu, Long
AU - Doan, Anhai
PY - 2005
Y1 - 2005
N2 - To answer user queries, a data integration system employs a set of semantic mappings between the mediated schema and the schemas of data sources. In dynamic environments sources often undergo changes that invalidate the mappings. Hence, once the system is deployed, the administrator must monitor it over time, to detect and repair broken mappings. Today such continuous monitoring is extremely labor intensive, and poses a key bottleneck to the widespread deployment of data integration systems in practice. We describe MAVERIC, an automatic solution to detecting broken mappings. At the heart of MAVERIC is a set of computationally inexpensive modules called sensors, which capture salient characteristics of data sources (e.g., value distributions, HTML layout properties). We describe how MAVERIC trains and deploys the sensors to detect broken mappings. Next we develop three novel improvements: perturbation (i.e., injecting artificial changes into the sources) and multi-source training to improve detection accuracy, and filtering to further reduce the number of false alarms. Experiments over 114 real-world sources in six domains demonstrate the effectiveness of our sensor-based approach over existing solutions, as well as the utility of our improvements.
AB - To answer user queries, a data integration system employs a set of semantic mappings between the mediated schema and the schemas of data sources. In dynamic environments sources often undergo changes that invalidate the mappings. Hence, once the system is deployed, the administrator must monitor it over time, to detect and repair broken mappings. Today such continuous monitoring is extremely labor intensive, and poses a key bottleneck to the widespread deployment of data integration systems in practice. We describe MAVERIC, an automatic solution to detecting broken mappings. At the heart of MAVERIC is a set of computationally inexpensive modules called sensors, which capture salient characteristics of data sources (e.g., value distributions, HTML layout properties). We describe how MAVERIC trains and deploys the sensors to detect broken mappings. Next we develop three novel improvements: perturbation (i.e., injecting artificial changes into the sources) and multi-source training to improve detection accuracy, and filtering to further reduce the number of false alarms. Experiments over 114 real-world sources in six domains demonstrate the effectiveness of our sensor-based approach over existing solutions, as well as the utility of our improvements.
UR - http://www.scopus.com/inward/record.url?scp=33745626486&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745626486&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33745626486
SN - 1595931546
SN - 9781595931542
T3 - VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
SP - 1018
EP - 1029
BT - VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
T2 - VLDB 2005 - 31st International Conference on Very Large Data Bases
Y2 - 30 August 2005 through 2 September 2005
ER -