Your notebook is not crumby enough, REPLace it

Michael Brachmann, William Spoth, Oliver Kennedy, Boris Glavic, Heiko Mueller, Sonia Castelo, Carlos Bautista, Juliana Freire

Research output: Contribution to conference › Paper › peer-review

Abstract

Notebook and spreadsheet systems are currently the de facto standard for data collection, preparation, and analysis. However, these systems have been criticized for their lack of reproducibility, versioning, and support for sharing. These shortcomings are particularly detrimental for data curation, where data scientists iteratively build workflows to clean up and integrate data as a prerequisite for analysis. We present Vizier, an open-source tool that helps analysts build and refine data pipelines. Vizier combines the flexibility of notebooks with the easy-to-use data manipulation interface of spreadsheets. Combined with advanced provenance tracking for both data and computational steps, this enables reproducibility, versioning, and streamlined data exploration. Unique to Vizier is that it exposes potential issues with data, regardless of whether they already exist in the input or are introduced by the operations of a notebook. We refer to such potential errors as data caveats. Caveats are propagated alongside data using principled techniques from uncertain data management. Vizier provides extensive user interface support for caveats, e.g., exposing them as summaries in a dedicated error view and highlighting cells with caveats in spreadsheets.
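
The idea of caveats travelling alongside data can be illustrated with a small sketch. This is not Vizier's API or implementation: `Cell`, `parse_cell`, and `add` are hypothetical names chosen for illustration, and the propagation rule (a result inherits the caveats of all its inputs) is a simplifying assumption rather than the uncertain-data technique the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value together with the caveats (free-text notes) attached to it."""
    value: float
    caveats: tuple = ()


def parse_cell(raw: str) -> Cell:
    """Parse a raw string; on failure, substitute 0.0 and record a caveat."""
    try:
        return Cell(float(raw))
    except ValueError:
        # Hypothetical repair: the replacement value is a guess, so flag it.
        return Cell(0.0, (f"could not parse {raw!r}; replaced with 0.0",))


def add(a: Cell, b: Cell) -> Cell:
    """Add two cells; the result inherits the caveats of both inputs."""
    return Cell(a.value + b.value, a.caveats + b.caveats)


# One caveat is introduced while cleaning the input, then carried through
# a downstream computation so the analyst can still see it on the result.
row = [parse_cell("1.5"), parse_cell("n/a"), parse_cell("2.5")]
total = row[0]
for cell in row[1:]:
    total = add(total, cell)

print(total.value)
for note in total.caveats:
    print("caveat:", note)
```

In this sketch the sum is still computed, but the note attached to the unparseable input remains visible on the derived value, which is the kind of information an error view or cell highlight could surface.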

Original language: English (US)
State: Published - 2020
Event: 10th Annual Conference on Innovative Data Systems Research, CIDR 2020 - Amsterdam, Netherlands
Duration: Jan 12 2020 to Jan 15 2020

Conference

Conference: 10th Annual Conference on Innovative Data Systems Research, CIDR 2020
Country/Territory: Netherlands
City: Amsterdam
Period: 1/12/20 to 1/15/20

Keywords

  • Data Science
  • Notebooks
  • Provenance
  • Workflow System

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Information Systems and Management
  • Hardware and Architecture

