TY - GEN
T1 - Invisible loading
T2 - 16th International Conference on Extending Database Technology, EDBT 2013
AU - Abouzied, Azza
AU - Abadi, Daniel J.
AU - Silberschatz, Avi
PY - 2013
Y1 - 2013
N2 - Commercial analytical database systems suffer from a high "time-to-first-analysis": before data can be processed, it must be modeled and schematized (a human effort), transferred into the database's storage layer, and optionally clustered and indexed (a computational effort). For many types of structured data, this upfront effort is unjustifiable, so the data are processed directly over the file system using the Hadoop framework, despite the cumulative performance benefits of processing this data in an analytical database system. In this paper we describe a system that achieves the immediate gratification of running MapReduce jobs directly over a file system, while still making progress towards the long-term performance benefits of database systems. The basic idea is to piggyback on MapReduce jobs, leverage their parsing and tuple extraction operations to incrementally load and organize tuples into a database system, while simultaneously processing the file system data. We call this scheme Invisible Loading, as we load fractions of data at a time at almost no marginal cost in query latency, but still allow future queries to run much faster.
UR - http://www.scopus.com/inward/record.url?scp=84876789119&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876789119&partnerID=8YFLogxK
U2 - 10.1145/2452376.2452377
DO - 10.1145/2452376.2452377
M3 - Conference contribution
AN - SCOPUS:84876789119
SN - 9781450315975
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 10
BT - Advances in Database Technology - EDBT 2013
Y2 - 18 March 2013 through 22 March 2013
ER -