Bridging workflow and data provenance using strong links

David Koop, Emanuele Santos, Bela Bauer, Matthias Troyer, Juliana Freire, Cláudio T. Silva

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As scientists continue to migrate their work to computational methods, it is important to track not only the steps involved in the computation but also the data consumed and produced. While this provenance information can be captured, in existing approaches, it often contains only weak references between data and provenance. When data files or provenance are moved or modified, it can be difficult to find the data associated with the provenance or to find the provenance associated with the data. We propose a persistent storage mechanism that manages input, intermediate, and output data files, strengthening the links between provenance and data. This mechanism provides better support for reproducibility because it ensures the data referenced in provenance information can be readily located. Another important benefit of such management is that it allows caching of intermediate data which can then be shared with other users. We present an implemented infrastructure for managing data in a provenance-aware manner and demonstrate its application in scientific projects.

Original language: English (US)
Title of host publication: Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings
Pages: 397-415
Number of pages: 19
State: Published - 2010
Event: 22nd International Conference on Scientific and Statistical Database Management, SSDBM 2010 - Heidelberg, Germany
Duration: Jun 30 2010 to Jul 2 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6187 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd International Conference on Scientific and Statistical Database Management, SSDBM 2010
Country/Territory: Germany
City: Heidelberg
Period: 6/30/10 to 7/2/10

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
