TY - JOUR
T1 - A distributed infrastructure for earth-science big data retrieval
AU - Liakos, Panagiotis
AU - Koltsida, Panagiota
AU - Kakaletris, George
AU - Baumann, Peter
AU - Ioannidis, Yannis
AU - Delis, Alex
N1 - Funding Information:
We would also like to express our gratitude to all the anonymous reviewers who helped us improve the presentation of our work with their insightful comments. This research has been partially supported by EC FP7 Workprogramme (2007–2013) under Grant agreement No. 283610 “European Scalable Earth-Science Service Environment (EarthServer)” and EC H2020 under Grant agreement No. 654367 “Agile Analytics on Big Data Cubes (EarthServer 2)”.
Publisher Copyright:
© 2015 World Scientific Publishing Company.
PY - 2015/6/4
Y1 - 2015/6/4
N2 - Earth-Science data are composite, multi-dimensional and of significant size, and as such continue to pose a number of problems regarding their management. With new and diverse information sources emerging and rates of generated data continuously increasing, a persistent challenge becomes more pressing: to make the information existing in multiple heterogeneous resources readily available. The widespread use of the XML data-exchange format has enabled the rapid accumulation of semi-structured metadata for Earth-Science data. In this paper, we exploit this popular use of XML and present the means for querying metadata emanating from multiple sources in a succinct and effective way. Thereby, we release the user from the tedious and time-consuming task of examining individual XML descriptions one by one. Our approach, termed Meta-Array Data Search (MAD Search), brings together diverse data sources while enhancing the user-friendliness of the underlying information sources. We gather metadata using different standards and construct an amalgamated service with the help of tools that discover and harvest such metadata; this service assists the end-user by offering easy and timely access to all metadata. The main contribution of our work is a novel query language termed xWCPS that builds on top of two widely adopted standards: XQuery and the Web Coverage Processing Service (WCPS). xWCPS furnishes a rich set of features for querying scientific data. Our proposed unified language allows for requesting metadata while also giving processing directives. Consequently, the xWCPS-enabled MAD Search helps in both retrieval and processing of large data sets hosted in a heterogeneous infrastructure. We demonstrate the effectiveness of our approach through diverse use-cases that provide insights into the syntactic power and overall expressiveness of xWCPS. We evaluate MAD Search in a distributed environment that comprises five high-volume array databases whose sizes range between 20 and 100 GB, and thereby ascertain the applicability and potential of our proposal.
KW - Array databases
KW - declarative query language
KW - scientific data
UR - http://www.scopus.com/inward/record.url?scp=84938741910&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938741910&partnerID=8YFLogxK
U2 - 10.1142/S0218843015500021
DO - 10.1142/S0218843015500021
M3 - Article
AN - SCOPUS:84938741910
VL - 24
JO - International Journal of Cooperative Information Systems
JF - International Journal of Cooperative Information Systems
SN - 0218-8430
IS - 2
M1 - 1550002
ER -