TY - GEN
T1 - Querying structured information sources on the Web
AU - Mergen, Sergio
AU - Freire, Juliana
AU - Heuser, Carlos Alberto
PY - 2008
Y1 - 2008
AB - To provide access to distributed and heterogeneous sources, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schema of the information sources. Queries posed to the mediated schema are then reformulated in terms of the source schemas. On the Web, where sources are plentiful, autonomous and extremely volatile, a system based on the existence of a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. In this paper, we propose a new querying mechanism for integrating a large number of sources that requires neither a mediated schema nor source mappings. In the absence of a mediated schema, the user formulates queries based on what she expects to find. These queries are rewritten using a best-effort approach: the rewriting component compares a user query against the source schemas and produces a set of rewritings based on the matches found. We demonstrate the feasibility of this approach by providing a query interface for integrating hundreds of (real) structured Web information sources. We also discuss experimental results which indicate that our query rewriting algorithm can be effective.
UR - http://www.scopus.com/inward/record.url?scp=70349116794&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349116794&partnerID=8YFLogxK
U2 - 10.1145/1497308.1497394
DO - 10.1145/1497308.1497394
M3 - Conference contribution
AN - SCOPUS:70349116794
SN - 9781605583495
T3 - Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008
SP - 470
EP - 476
BT - Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008
T2 - 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008
Y2 - 24 November 2008 through 26 November 2008
ER -