Abstract
In 2009 we explored the feasibility of building a hybrid SQL data analysis system that takes the best features from two competing technologies: large-scale data processing systems (such as Google MapReduce and Apache Hadoop) and parallel database management systems (such as Greenplum and Vertica). We built a prototype, HadoopDB, and demonstrated that it can deliver the high SQL query performance and efficiency of parallel database management systems while still providing the scalability, fault tolerance, and flexibility of large-scale data processing systems. Subsequently, HadoopDB grew into a commercial product, Hadapt, whose technology was eventually acquired by Teradata. In this paper, we provide an overview of HadoopDB's original design and its evolution during the subsequent ten years of research and development effort. We describe how the project innovated both in the research lab and as a commercial product at Hadapt and Teradata. We then discuss the current vibrant ecosystem of software projects (most of which are open source) that continued HadoopDB's legacy of implementing a systems-level integration of large-scale data processing systems and parallel database technology.
Original language | English (US) |
---|---|
Pages (from-to) | 2290-2299 |
Number of pages | 10 |
Journal | Proceedings of the VLDB Endowment |
Volume | 12 |
Issue number | 12 |
State | Published - Aug 2019 |
Event | 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States. Duration: Aug 26 2019 → Aug 30 2019 |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science