Abstract
The fundamental problems of sorting and searching, traditionally studied in the unit-cost comparison model, have been generalized to include priced information, where different pairs of items have different comparison costs. These costs can be arbitrary (Charikar et al. STOC 2000), structured (Gupta et al. FOCS 2001), or stochastic (Angelov et al. LATIN 2008). Motivated by the database setting, where the comparison cost depends on the sizes of the records, we consider the problems of sorting and batched predecessor where two non-uniform sets of items A and B are given as input. In the RAM model, pairwise comparisons (A-A, A-B and B-B) have respective comparison costs a, b and c. We give upper and lower bounds for the case \(a \le b \le c\), which serves as a warmup for the generalization to the external-memory model. In the Disk-Access Model (DAM), where transferring elements between disk and RAM is the main bottleneck, we consider the scenario where elements in B are larger than elements in A. An item must be present in RAM in its entirety before it can be compared. A key observation is that the complexity of sorting depends on the interleaving of the small and large items in the final sorted order, and with a high degree of interleaving, the lower bound is dominated by an associated batched predecessor problem. We give output-sensitive bounds for batched predecessor and sorting; our bounds are tight in most cases. Our lower bounds require novel generalizations of lower bound techniques in external memory to accommodate non-uniform keys.
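The RAM-model cost structure and the two problems named above can be made concrete with a minimal Python sketch. It rests on several assumptions that do not come from the paper: items are hypothetical `(key, kind)` pairs with `kind` equal to `"A"` or `"B"`; the values in `COST` are placeholder stand-ins for a, b and c; `priced_sort` simply uses Python's built-in `sorted` and tallies comparison cost, so it is not the paper's algorithm; `interleaving_runs` counts maximal same-kind runs as one illustrative measure of interleaving, not necessarily the paper's measure; and `batched_predecessor` is a plain sort-and-merge baseline that answers, for each query, the largest key not exceeding it.

```python
from functools import cmp_to_key
from itertools import groupby

# Hypothetical placeholder costs with a <= b <= c:
# A-A comparisons cost a=1, A-B (either order) cost b=2, B-B cost c=5.
COST = {("A", "A"): 1, ("A", "B"): 2, ("B", "A"): 2, ("B", "B"): 5}


def priced_sort(items):
    """Sort (key, kind) pairs with Python's built-in sort, charging each
    comparison according to the kinds of the two items compared.
    Returns the sorted list and the total comparison cost."""
    total = 0

    def cmp(x, y):
        nonlocal total
        total += COST[(x[1], y[1])]
        return (x[0] > y[0]) - (x[0] < y[0])

    result = sorted(items, key=cmp_to_key(cmp))
    return result, total


def interleaving_runs(sorted_items):
    """Count maximal runs of same-kind items in the sorted order."""
    return sum(1 for _ in groupby(sorted_items, key=lambda item: item[1]))


def batched_predecessor(queries, keys):
    """For each query, return the largest key <= query (or None),
    by sorting both sets and merging them in a single pass."""
    sorted_keys = sorted(keys)
    answers = {}
    i, pred = 0, None
    for q in sorted(queries):
        # Advance through keys that are <= the current query.
        while i < len(sorted_keys) and sorted_keys[i] <= q:
            pred = sorted_keys[i]
            i += 1
        answers[q] = pred
    return answers


if __name__ == "__main__":
    A = [(3, "A"), (9, "A"), (1, "A")]
    B = [(4, "B"), (7, "B")]
    s, cost = priced_sort(A + B)
    print([k for k, _ in s])                        # [1, 3, 4, 7, 9]
    print(interleaving_runs(s))                     # 3 (runs: A A | B B | A)
    print(batched_predecessor([1, 3, 9], [4, 7]))   # {1: None, 3: None, 9: 7}
    print(cost)  # total priced cost; depends on the built-in sort's comparisons
```

The run count shows why interleaving matters: when A and B items alternate heavily in the final order, locating each small item among the large ones (a batched predecessor query) becomes a significant part of the work.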
This work was supported in part by NSF grants CCF-1725543, CSR-1763680, CCF-1716252, CCF-1617618, CNS-1938709, and by Sandia National Laboratories.
Supported by NSF grants CRII-1755791 and CCF-1910873.
Notes
1. The DAM model can represent any two levels of a memory hierarchy, and the choice of levels determines how large the records in our problem can be. If the two levels are disk and main memory, elements may be larger than B but are much smaller than M. If the levels are cache and main memory, elements may have a length that is a nontrivial fraction of M.
2. We overload notation for convenience of presentation, and for the same reason we assume \(w \ge B\). Our bounds hold for any \(1 < w \le M/2\); the general statements are presented in the full version of the paper.
3. The proof of the lower bound for sorting N unit-sized keys in [15] proceeds as follows. Assume that all blocks are sorted (using a linear scan costing N/B I/Os). There are then \(N!/(B!)^{N/B}\) permutations that the algorithm must be able to distinguish, and transferring a block of B sorted elements into a main memory holding \(M-B\) sorted elements reduces the number of remaining permutations by at most a factor of \(\binom{M}{B}\) (the “fan-out,” since this is the degree of a node in the decision tree). Standard algebra gives a lower bound of \(\Omega (\frac{N}{B} \log _{M/B} \frac{N}{B})\).
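The "standard algebra" in note 3 can be spelled out as follows. This is the textbook calculation, offered as a hedged reconstruction rather than the paper's own derivation. It introduces t for the number of block transfers, uses base-2 logarithms, and relies on the elementary estimates \(N! \ge (N/e)^N\), \(B! \le B^B\) and \(\binom{M}{B} \le (eM/B)^B\). A decision tree with fan-out at most \(\binom{M}{B}\) and depth t has at most \(\binom{M}{B}^{t}\) leaves, so:

```latex
\binom{M}{B}^{t} \;\ge\; \frac{N!}{(B!)^{N/B}}
\quad\Longrightarrow\quad
t \;\ge\; \frac{\log N! - \frac{N}{B}\log B!}{\log\binom{M}{B}}
\;\ge\; \frac{N\log N - N\log e - N\log B}{B\log\frac{eM}{B}}
\;=\; \frac{N}{B}\cdot\frac{\log\frac{N}{eB}}{\log\frac{eM}{B}}
\;=\; \Omega\!\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right).
```

The last step assumes, for example, \(N/B \ge e^2\) and \(M/B \ge e\). Then \(\log\frac{N}{eB} \ge \frac{1}{2}\log\frac{N}{B}\) and \(\log\frac{eM}{B} \le 2\log\frac{M}{B}\), so the ratio is at least \(\frac{1}{4}\cdot\frac{N}{B}\log_{M/B}\frac{N}{B}\).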
References
Aggarwal, A., Vitter, J.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
Alon, N., Blum, M., Fiat, A., Kannan, S., Naor, M., Ostrovsky, R.: Matching nuts and bolts. In: Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 690–696 (1994)
Angelov, S., Kunal, K., McGregor, A.: Sorting and selection with random costs. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 48–59. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78773-0_5
Arge, L.: The buffer tree: a technique for designing batched external data structures. Algorithmica 37(1), 1–24 (2003)
Arge, L., Ferragina, P., Grossi, R., Vitter, J.: On sorting strings in external memory (extended abstract). In: Proceedings of the 29th Annual ACM Symposium on Theory of Computing, STOC, pp. 540–548 (1997)
Arge, L., Knudsen, M., Larsen, K.: A general lower bound on the I/O-complexity of comparison-based algorithms. In: Proceedings of the 3rd Workshop on Algorithms and Data Structures, WADS, pp. 83–94 (1993)
Arge, L., Procopiuc, O., Ramaswamy, S., Suel, T., Vitter, J.: Theory and practice of I/O-efficient algorithms for multidimensional batched searching problems. In: Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA (1998)
Bender, M.A., Hu, H., Kuszmaul, B.C.: Performance guarantees for B-trees with different-sized atomic keys. In: Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 305–316 (2010)
Bender, M.A., Farach-Colton, M., Goswami, M., Medjedovic, D., Montes, P., Tsai, M.-T.: The batched predecessor problem in external memory. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp. 112–124. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44777-2_10
Berkeley DB C API Reference. http://www.berkeleydb.com/
Charikar, M., Fagin, R., Guruswami, V., Kleinberg, J.P., Raghavan, A.S.: Query strategies for priced information. In: Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC, pp. 582–591 (2000)
Cicalese, F., Laber, E.: A new strategy for querying priced information. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, STOC, pp. 674–683 (2005)
Diehr, G., Faaland, B.: Optimal pagination of B-trees with variable-length items. Commun. ACM 27(3), 241–247 (1984)
Elmasry, A.: Distribution-sensitive set multi-partitioning. In: 1st International Conference on the Analysis of Algorithms (2005)
Erickson, J.: Lower bounds for external algebraic decision trees. In: Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 755–761 (2005)
Gupta, A., Kumar, A.: Sorting and selection with structured costs. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS, pp. 416–425. IEEE (2001)
Larmore, L., Hirschberg, D.: Efficient optimal pagination of scrolls. Commun. ACM 28(8), 854–856 (1985)
McCreight, E.: Pagination of B*-trees with variable-length records. Commun. ACM 20(9), 670–674 (1977)
Munro, J., Spira, P.: Sorting and searching in multisets. SIAM J. Comput. 5(1), 1–8 (1976)
Pinchuk, A.P., Shvachko, K.V.: Maintaining dictionaries: space-saving modifications of B-trees. In: Biskup, J., Hull, R. (eds.) ICDT 1992. LNCS, vol. 646, pp. 421–435. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-56039-4_57
The GNU C Library: qsort. http://www.gnu.org/software/libc/manual/
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bender, M.A., Goswami, M., Medjedovic, D., Montes, P., Tsichlas, K. (2020). Batched Predecessor and Sorting with Size-Priced Information in External Memory. In: Kohayakawa, Y., Miyazawa, F.K. (eds) LATIN 2020: Theoretical Informatics. LATIN 2021. Lecture Notes in Computer Science, vol 12118. Springer, Cham. https://doi.org/10.1007/978-3-030-61792-9_13
DOI: https://doi.org/10.1007/978-3-030-61792-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61791-2
Online ISBN: 978-3-030-61792-9
eBook Packages: Computer Science, Computer Science (R0)