TY - JOUR
T1 - HadoopDB
T2 - An architectural hybrid of mapreduce and DBMS technologies for analytical workloads
AU - Abouzeid, Azza
AU - Bajda-Pawlikowski, Kamil
AU - Abadi, Daniel
AU - Silberschatz, Avi
AU - Rasin, Alexander
PY - 2009
Y1 - 2009
N2 - The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their analytical databases on high-end proprietary machines, and moving towards cheaper, lower-end, commodity hardware, typically arranged in a shared-nothing MPP architecture, often in a virtualized environment inside public or private "clouds". At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis. There tend to be two schools of thought regarding what technology to use for data analysis in such an environment. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them wellsuited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. In this paper, we explore the feasibility of building a hybrid system that takes the best features from both technologies; the prototype we built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
AB - The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their analytical databases on high-end proprietary machines, and moving towards cheaper, lower-end, commodity hardware, typically arranged in a shared-nothing MPP architecture, often in a virtualized environment inside public or private "clouds". At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis. There tend to be two schools of thought regarding what technology to use for data analysis in such an environment. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them wellsuited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. In this paper, we explore the feasibility of building a hybrid system that takes the best features from both technologies; the prototype we built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
UR - http://www.scopus.com/inward/record.url?scp=79957809015&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957809015&partnerID=8YFLogxK
U2 - 10.14778/1687627.1687731
DO - 10.14778/1687627.1687731
M3 - Article
AN - SCOPUS:79957809015
SN - 2150-8097
VL - 2
SP - 922
EP - 933
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 1
ER -