TY - JOUR
T1 - Resource oblivious sorting on multicores
AU - Cole, Richard
AU - Ramachandran, Vijaya
N1 - Funding Information:
A preliminary version of this article appeared in Proc. International Colloquium of Automata, Languages and Programming (ICALP), Track A, Springer LNCS Volume 6198, pp. 226–237, 2010. The work of the first author was supported in part by NSF Grants CCF-0830516, CCF-1217989, and CCF-1527568. The work of the second author was supported in part by NSF Grants CCF-0830737 and CCF-1320675. Authors’ addresses: R. Cole, Computer Science Dept., Courant Institute of Mathematical Sciences, NYU, New York, NY 10012; email: cole@cs.nyu.edu. V. Ramachandran, Dept. of Computer Science, University of Texas, Austin, TX 78712; email: vlr@cs.utexas.edu. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org. ©c 2017 ACM 2329-4949/2017/03-ART23 $15.00 DOI: http://dx.doi.org/10.1145/3040221
Publisher Copyright:
© 2017 ACM.
PY - 2017/3
Y1 - 2017/3
N2 - We present a deterministic sorting algorithm, Sample, Partition, and Merge Sort (SPMS), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(nlog n) time cacheobliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log nlog log n), which improves on previous bounds for deterministic sample sort. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of O(S M/B) and O(S B) cache misses, respectively, where S is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appear only in the analysis of its performance and not within the algorithm itself.
AB - We present a deterministic sorting algorithm, Sample, Partition, and Merge Sort (SPMS), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(nlog n) time cacheobliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log nlog log n), which improves on previous bounds for deterministic sample sort. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of O(S M/B) and O(S B) cache misses, respectively, where S is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appear only in the analysis of its performance and not within the algorithm itself.
KW - Cache oblivious
KW - Sample sort
KW - Sorting
KW - merge sort
UR - http://www.scopus.com/inward/record.url?scp=85054897853&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054897853&partnerID=8YFLogxK
U2 - 10.1145/3040221
DO - 10.1145/3040221
M3 - Article
AN - SCOPUS:85054897853
VL - 3
JO - ACM Transactions on Parallel Computing
JF - ACM Transactions on Parallel Computing
SN - 2329-4949
IS - 4
M1 - a23
ER -