TY - GEN
T1 - Indexing relations on the web
AU - Sardi Mergen, Sergio Luis
AU - Freire, Juliana
AU - Heuser, Carlos Alberto
PY - 2010
Y1 - 2010
N2 - There has been a substantial increase in the volume of (semi) structured data on the Web. This opens new opportunities for exploring and querying these data that go beyond the keyword-based queries traditionally used on the Web. But supporting queries over a very large number of apparently disconnected Web sources is challenging. In this paper we propose index methods that capture both the structure of the sources and connections between them. The indexes are designed for data that is represented as relations, such as HTML tables, and support queries with predicates. We show how associations between overlapping sources are discovered, captured in the indexes, and used to derive query rewritings that join multiple sources. We demonstrate, through an experimental evaluation, that our approach scales to a large number of sources.
AB - There has been a substantial increase in the volume of (semi) structured data on the Web. This opens new opportunities for exploring and querying these data that go beyond the keyword-based queries traditionally used on the Web. But supporting queries over a very large number of apparently disconnected Web sources is challenging. In this paper we propose index methods that capture both the structure of the sources and connections between them. The indexes are designed for data that is represented as relations, such as HTML tables, and support queries with predicates. We show how associations between overlapping sources are discovered, captured in the indexes, and used to derive query rewritings that join multiple sources. We demonstrate, through an experimental evaluation, that our approach scales to a large number of sources.
KW - Dataspaces
KW - Indexing
KW - Search engines
UR - http://www.scopus.com/inward/record.url?scp=77952261069&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952261069&partnerID=8YFLogxK
U2 - 10.1145/1739041.1739094
DO - 10.1145/1739041.1739094
M3 - Conference contribution
AN - SCOPUS:77952261069
SN - 9781605589459
T3 - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
SP - 430
EP - 440
BT - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
T2 - 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010
Y2 - 22 March 2010 through 26 March 2010
ER -