TY - JOUR
T1 - Blink and it's done
T2 - Interactive queries on very large data
AU - Agarwal, Sameer
AU - Panda, Aurojit
AU - Mozafari, Barzan
AU - Iyer, Anand P.
AU - Madden, Samuel
AU - Stoica, Ion
PY - 2012/8
Y1 - 2012/8
N2 - In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees, and can scale to petabytes of data and thousands of machines in a fault-tolerant manner. Our experiments using the TPC-H benchmark and on an anonymized real-world video content distribution workload from Conviva Inc. show that BlinkDB can execute a wide range of queries up to 150× faster than Hive on MapReduce and 10-150× faster than Shark (Hive on Spark) over tens of terabytes of data stored across 100 machines, all with an error of 2-10%.
AB - In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees, and can scale to petabytes of data and thousands of machines in a fault-tolerant manner. Our experiments using the TPC-H benchmark and on an anonymized real-world video content distribution workload from Conviva Inc. show that BlinkDB can execute a wide range of queries up to 150× faster than Hive on MapReduce and 10-150× faster than Shark (Hive on Spark) over tens of terabytes of data stored across 100 machines, all with an error of 2-10%.
UR - http://www.scopus.com/inward/record.url?scp=84873191849&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84873191849&partnerID=8YFLogxK
U2 - 10.14778/2367502.2367533
DO - 10.14778/2367502.2367533
M3 - Article
AN - SCOPUS:84873191849
SN - 2150-8097
VL - 5
SP - 1902
EP - 1905
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 12
ER -