Integration of large-scale data processing systems and traditional parallel database technology

Azza Abouzied, Daniel J. Abadi, Kamil Bajda-Pawlikowski, Avi Silberschatz

Research output: Contribution to journal › Conference article › peer-review


In 2009, we explored the feasibility of building a hybrid SQL data analysis system that takes the best features from two competing technologies: large-scale data processing systems (such as Google MapReduce and Apache Hadoop) and parallel database management systems (such as Greenplum and Vertica). We built a prototype, HadoopDB, and demonstrated that it can deliver the high SQL query performance and efficiency of parallel database management systems while still providing the scalability, fault tolerance, and flexibility of large-scale data processing systems. Subsequently, HadoopDB grew into a commercial product, Hadapt, whose technology was eventually acquired by Teradata. In this paper, we provide an overview of HadoopDB's original design and its evolution over the subsequent ten years of research and development. We describe how the project innovated both in the research lab and as a commercial product at Hadapt and Teradata. We then discuss the current vibrant ecosystem of software projects (most of which are open source) that continued HadoopDB's legacy of implementing a systems-level integration of large-scale data processing systems and parallel database technology.

Original language: English (US)
Pages (from-to): 2290-2299
Number of pages: 10
Journal: Proceedings of the VLDB Endowment
Issue number: 12
State: Published - Aug 2019
Event: 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: Aug 26 2019 → Aug 30 2019

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

