TY - JOUR
T1 - Integration of large-scale data processing systems and traditional parallel database technology
AU - Abouzied, Azza
AU - Abadi, Daniel J.
AU - Bajda-Pawlikowski, Kamil
AU - Silberschatz, Avi
N1 - Publisher Copyright:
© 2019 VLDB Endowment.
PY - 2019/8
Y1 - 2019/8
N2 - In 2009, we explored the feasibility of building a hybrid SQL data analysis system that takes the best features from two competing technologies: large-scale data processing systems (such as Google MapReduce and Apache Hadoop) and parallel database management systems (such as Greenplum and Vertica). We built a prototype, HadoopDB, and demonstrated that it can deliver the high SQL query performance and efficiency of parallel database management systems while still providing the scalability, fault tolerance, and flexibility of large-scale data processing systems. Subsequently, HadoopDB grew into a commercial product, Hadapt, whose technology was eventually acquired by Teradata. In this paper, we provide an overview of HadoopDB's original design and its evolution during the subsequent ten years of research and development effort. We describe how the project innovated both in the research lab and as a commercial product at Hadapt and Teradata. We then discuss the current vibrant ecosystem of software projects (most of which are open source) that continued HadoopDB's legacy of implementing a systems-level integration of large-scale data processing systems and parallel database technology.
UR - http://www.scopus.com/inward/record.url?scp=85074503769&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074503769&partnerID=8YFLogxK
U2 - 10.14778/3352063.3352145
DO - 10.14778/3352063.3352145
M3 - Conference article
AN - SCOPUS:85074503769
VL - 12
SP - 2290
EP - 2299
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
SN - 2150-8097
IS - 12
T2 - 45th International Conference on Very Large Data Bases, VLDB 2019
Y2 - 26 August 2019 through 30 August 2019
ER -