Putting Lipstick on Pig: Enabling database-style workflow provenance

Yael Amsterdamer, Susan B. Davidson, Daniel Deutch, Tova Milo, Julia Stoyanovich, Val Tannen

    Research output: Contribution to journal (Article)

    Abstract

    Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing users to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.

    Original language: English (US)
    Pages (from-to): 346-357
    Number of pages: 12
    Journal: Proceedings of the VLDB Endowment
    Volume: 5
    Issue number: 4
    DOI: https://doi.org/10.14778/2095686.2095693
    State: Published - Dec 2011

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)
    • Computer Science (all)

    Cite this

    Amsterdamer, Y., Davidson, S. B., Deutch, D., Milo, T., Stoyanovich, J., & Tannen, V. (2011). Putting Lipstick on Pig: Enabling database-style workflow provenance. Proceedings of the VLDB Endowment, 5(4), 346-357. https://doi.org/10.14778/2095686.2095693