A distributed infrastructure for earth-science big data retrieval

Panagiotis Liakos, Panagiota Koltsida, George Kakaletris, Peter Baumann, Yannis Ioannidis, Alex Delis

Research output: Contribution to journal › Article › peer-review

Abstract

Earth-science data are composite, multi-dimensional, and of significant size, and as such they continue to pose problems for data management. As new and diverse information sources emerge and the rate of data generation keeps increasing, a persistent challenge becomes more pressing: making the information held in multiple heterogeneous resources readily available. The widespread use of the XML data-exchange format has enabled the rapid accumulation of semi-structured metadata for earth-science data. In this paper, we exploit this popular use of XML and present the means for querying metadata emanating from multiple sources in a succinct and effective way, thereby relieving the user of the tedious and time-consuming task of examining individual XML descriptions one by one. Our approach, termed Meta-Array Data Search (MAD Search), brings together diverse data sources while enhancing the user-friendliness of the underlying information sources. We gather metadata expressed in different standards and, with the help of tools that discover and harvest such metadata, construct an amalgamated service that gives the end user easy and timely access to all of it. The main contribution of our work is a novel query language, termed xWCPS, that builds on two widely adopted standards: XQuery and the Web Coverage Processing Service (WCPS). xWCPS furnishes a rich set of features for querying scientific data. The proposed unified language allows users to request metadata while also issuing processing directives. Consequently, the xWCPS-enabled MAD Search supports both the retrieval and the processing of large data sets hosted in a heterogeneous infrastructure. We demonstrate the effectiveness of our approach through diverse use cases that provide insight into the syntactic power and overall expressiveness of xWCPS.
We evaluate MAD Search in a distributed environment comprising five high-volume array databases, whose sizes range from 20 to 100 GB, and thereby ascertain the applicability and potential of our proposal.
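The abstract describes xWCPS only in prose. As a rough illustration of the two ideas it combines, the Python sketch below merges XML metadata harvested from two hypothetical sources into one index, then answers a request that pairs a metadata filter (the role XQuery plays) with a processing directive applied to each matching array (the role WCPS plays). The record layout, the `harvest`, `query` and `avg` helpers, and the in-memory `COVERAGES` dictionary are all hypothetical stand-ins. This is not xWCPS syntax and not the MAD Search implementation, which would issue real XQuery and WCPS requests against array databases.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical metadata records from two independent sources (illustrative only).
SOURCE_A = """<records>
  <record id="sst_2010"><variable>sea_surface_temperature</variable><region>Mediterranean</region></record>
  <record id="chl_2010"><variable>chlorophyll</variable><region>Mediterranean</region></record>
</records>"""

SOURCE_B = """<records>
  <record id="sst_2011"><variable>sea_surface_temperature</variable><region>Aegean</region></record>
</records>"""

# Hypothetical 2-D arrays standing in for coverages held in array databases.
COVERAGES = {
    "sst_2010": [[14.0, 15.0], [16.0, 17.0]],
    "chl_2010": [[0.1, 0.2], [0.3, 0.4]],
    "sst_2011": [[18.0, 19.0], [20.0, 21.0]],
}


def harvest(*xml_sources):
    """Merge <record> elements from several XML sources into one amalgamated list."""
    records = []
    for src in xml_sources:
        records.extend(ET.fromstring(src).findall("record"))
    return records


def query(records, variable, process):
    """Filter records on their metadata, then apply `process` to each matching array.

    The metadata test stands in for an XQuery-style selection; `process`
    stands in for a WCPS-style processing directive.
    """
    results = {}
    for rec in records:
        if rec.findtext("variable") == variable:
            cov_id = rec.get("id")
            results[cov_id] = process(COVERAGES[cov_id])
    return results


def avg(array2d):
    """Average over every cell, loosely analogous to a WCPS avg() condenser."""
    return mean(v for row in array2d for v in row)


records = harvest(SOURCE_A, SOURCE_B)
print(query(records, "sea_surface_temperature", avg))
```

Because the metadata filter runs first, only arrays whose descriptions match are ever touched by the processing step. In a real deployment the processing would presumably be evaluated inside the array database, so that a condensed result rather than the whole array travels back to the client.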

Original language: English (US)
Article number: 1550002
Journal: International Journal of Cooperative Information Systems
Volume: 24
Issue number: 2
State: Published (Jun 4 2015)

Keywords

  • Array databases
  • Declarative query language
  • Scientific data

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
