TY - GEN
T1 - Pre-processing and indexing techniques for constellation queries in big data
AU - Khatibi, Amir
AU - Porto, Fabio
AU - Rittmeyer, Joao Guilherme
AU - Ogasawara, Eduardo
AU - Valduriez, Patrick
AU - Shasha, Dennis
N1 - Publisher Copyright: © 2017, Springer International Publishing AG.
PY - 2017
Y1 - 2017
N2 - Geometric patterns are defined by the spatial distribution of a set of objects. They can be found in many spatial datasets, such as those in seismology, astronomy, and transportation. A particularly interesting geometric pattern is exhibited by the Einstein cross, an astronomical phenomenon in which a single quasar is observed as four distinct sky objects when captured by Earth telescopes. Finding such crosses, as well as other geometric patterns, collectively referred to as constellation queries, is a challenging problem because the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the query pattern. In this paper we propose algorithms to optimize the computation of constellation queries. Our techniques involve pre-processing the query to reduce its dimensionality as well as indexing the data with a PH-tree to speed up the computation of neighboring stars. We have implemented our techniques in Spark and evaluated them in a series of experiments. The PH-tree indexing showed very good results and guarantees query answer completeness.
AB - Geometric patterns are defined by the spatial distribution of a set of objects. They can be found in many spatial datasets, such as those in seismology, astronomy, and transportation. A particularly interesting geometric pattern is exhibited by the Einstein cross, an astronomical phenomenon in which a single quasar is observed as four distinct sky objects when captured by Earth telescopes. Finding such crosses, as well as other geometric patterns, collectively referred to as constellation queries, is a challenging problem because the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the query pattern. In this paper we propose algorithms to optimize the computation of constellation queries. Our techniques involve pre-processing the query to reduce its dimensionality as well as indexing the data with a PH-tree to speed up the computation of neighboring stars. We have implemented our techniques in Spark and evaluated them in a series of experiments. The PH-tree indexing showed very good results and guarantees query answer completeness.
KW - Constellation queries
KW - Dataset pre-processing
KW - Geometric shapes
KW - PH-tree indexing
KW - Query pre-processing
KW - SQL extension
UR - http://www.scopus.com/inward/record.url?scp=85028471280&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028471280&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-64283-3_12
DO - 10.1007/978-3-319-64283-3_12
M3 - Conference contribution
AN - SCOPUS:85028471280
SN - 9783319642826
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 164
EP - 172
BT - Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings
A2 - Bellatreche, Ladjel
A2 - Chakravarthy, Sharma
PB - Springer Verlag
T2 - 19th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2017
Y2 - 28 August 2017 through 31 August 2017
ER -