TY - GEN
T1 - Optimal hashing in external memory
AU - Conway, Alex
AU - Farach-Colton, Martín
AU - Shilane, Philip
N1 - Publisher Copyright:
© Alex Conway, Martín Farach-Colton, and Philip Shilane;.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and P tra u established an update/query tradeo curve for external-hash tables: a hash table that performs insertions in O(λ/B) amortized IOs requires (logλ N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and λ is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for λ that is (log log M + logM N). In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of λ. The simplicity of BOAs allows them to be readily modified to achieve the following results: A new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs. The Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of λ.
AB - Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and P tra u established an update/query tradeo curve for external-hash tables: a hash table that performs insertions in O(λ/B) amortized IOs requires (logλ N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and λ is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for λ that is (log log M + logM N). In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of λ. The simplicity of BOAs allows them to be readily modified to achieve the following results: A new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs. The Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of λ.
KW - Asymmetric data structures
KW - Cache-oblivious algorithms
KW - External memory algorthims
KW - Hash tables
UR - http://www.scopus.com/inward/record.url?scp=85049785883&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049785883&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ICALP.2018.39
DO - 10.4230/LIPIcs.ICALP.2018.39
M3 - Conference contribution
AN - SCOPUS:85049785883
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018
A2 - Kaklamanis, Christos
A2 - Marx, Daniel
A2 - Chatzigiannakis, Ioannis
A2 - Sannella, Donald
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018
Y2 - 9 July 2018 through 13 July 2018
ER -