Batched Predecessor and Sorting with Size-Priced Information in External Memory

  • Conference paper
LATIN 2020: Theoretical Informatics (LATIN 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12118)

Abstract

The fundamental problems of sorting and searching, traditionally studied in the unit-cost comparison model, have been generalized to include priced information, where different pairs of items have different comparison costs. These costs can be arbitrary (Charikar et al. STOC 2000), structured (Gupta et al. FOCS 2001), or stochastic (Angelov et al. LATIN 2008). Motivated by the database setting where the comparison cost depends on the sizes of the records, we consider the problems of sorting and batched predecessor where two non-uniform sets of items A and B are given as input. In the RAM model, pairwise comparisons (A-A, A-B and B-B) have respective comparison costs a, b and c. We give upper and lower bounds for the case \(a \le b \le c\), which serves as a warmup for the generalization to the external-memory model. In the Disk-Access Model (DAM), where transferring elements between disk and RAM is the main bottleneck, we consider the scenario where elements in B are larger than elements in A. All items are required in their entirety for comparisons in RAM. A key observation is that the complexity of sorting depends on the interleaving of the small and large items in the final sorted order, and with a high degree of interleaving, the lower bound is dominated by an associated batched predecessor problem. We give output-sensitive bounds on the batched predecessor and sorting; our bounds are tight in most cases. Our lower bounds require novel generalizations of lower bound techniques in external memory to accommodate non-uniform keys.
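To make the RAM-model setting concrete, here is a minimal sketch of batched predecessor with priced comparisons. It is not the paper's algorithm: it assumes the queries come from A, the sorted keys come from B, and each query is answered by an independent binary search, so only A-B comparisons (cost b each) are charged. The names `batched_predecessor` and `cost_ab` are hypothetical.

```python
from typing import List, Optional, Tuple


def batched_predecessor(
    queries: List[int], sorted_keys: List[int], cost_ab: float = 1.0
) -> Tuple[List[Optional[int]], float]:
    """Return, for each query, the largest key <= query (None if there is none),
    together with the total comparison cost.

    Assumption (illustrative only): every comparison is a query-vs-key
    (A-B) comparison and costs `cost_ab`.
    """
    comparisons = 0
    predecessors: List[Optional[int]] = []
    for q in queries:
        lo, hi = 0, len(sorted_keys)
        # Binary search for the first key strictly greater than q.
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1  # one A-B comparison
            if sorted_keys[mid] <= q:
                lo = mid + 1
            else:
                hi = mid
        predecessors.append(sorted_keys[lo - 1] if lo > 0 else None)
    return predecessors, comparisons * cost_ab


preds, total_cost = batched_predecessor([5, 0, 12], [1, 4, 8, 10], cost_ab=2.0)
print(preds, total_cost)
```

This naive baseline never uses A-A comparisons. When those are cheaper (a ≤ b), sorting the queries first and sharing search work across them is the kind of trade-off the priced model is meant to capture.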

This work was supported in part by NSF grants CCF-1725543, CSR-1763680, CCF-1716252, CCF-1617618, CNS-1938709, and by Sandia National Laboratories.

Supported by NSF grants CRII-1755791 and CCF-1910873.


Notes

  1. The DAM model can represent any two levels of memory, and the choice of levels bears on the record size in our problem. If the two levels are the disk and the main memory, elements could be larger than B but are much smaller than M. If the levels are cache and main memory, then the elements could have a length that is a nontrivial fraction of M.

  2. We overload notation for convenience of presentation, and for the same reason we assume \(w \ge B\). Our bounds hold for any \(1 < w \le M/2\); the general case is presented in the full version of the paper.

  3. The proof of the lower bound for sorting N unit-sized keys in [15] proceeds as follows. Assuming that all blocks are sorted (using a linear scan costing N/B I/Os), there are \(N!/(B!)^{N/B}\) permutations left to distinguish, and transferring a block of B sorted elements into a main memory containing \(M-B\) sorted elements reduces the number of permutations by at most a factor of \(\binom{M}{B}\) (the “fan-out,” since this is the degree of the node in the decision tree). Standard algebra gives a lower bound of \(\Omega (\frac{N}{B} \log _{M/B} \frac{N}{B})\).
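The "standard algebra" in the last note can be spelled out as follows. This is a sketch under the assumption that N/B and M/B are at least a suitable constant, and t (a symbol introduced here) denotes the number of I/Os performed after the initial scan. A decision tree with fan-out at most \(\binom{M}{B}\) must have at least \(N!/(B!)^{N/B}\) leaves, so

\[
\binom{M}{B}^{t} \ge \frac{N!}{(B!)^{N/B}}
\quad\Longrightarrow\quad
t \ge \frac{\log N! - \frac{N}{B}\log B!}{\log \binom{M}{B}} .
\]

Using \(\log N! \ge N \log (N/e)\), \(\log B! \le B \log B\), and \(\binom{M}{B} \le (eM/B)^{B}\),

\[
t \ge \frac{N \log (N/e) - N \log B}{B \log (eM/B)}
= \frac{N}{B} \cdot \frac{\log \frac{N}{eB}}{\log \frac{eM}{B}}
= \Omega \left( \frac{N}{B} \log _{M/B} \frac{N}{B} \right) .
\]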

References

  1. Aggarwal, A., Vitter, J.: The input/output complexity of sorting and related problems. Commun. ACM 31, 1116–1127 (1988)

  2. Alon, N., Blum, M., Fiat, A., Kannan, S., Naor, M., Ostrovsky, R.: Matching nuts and bolts. In: Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 690–696 (1994)

  3. Angelov, S., Kunal, K., McGregor, A.: Sorting and selection with random costs. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 48–59. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78773-0_5

  4. Arge, L.: The buffer tree: a technique for designing batched external data structures. Algorithmica 37(1), 1–24 (2003)

  5. Arge, L., Ferragina, P., Grossi, R., Vitter, J.: On sorting strings in external memory (extended abstract). In: Proceedings of the 29th Annual ACM Symposium on Theory of Computing, STOC, pp. 540–548 (1997)

  6. Arge, L., Knudsen, M., Larsen, K.: A general lower bound on the I/O-complexity of comparison-based algorithms. In: Proceedings of the 3rd Workshop on Algorithms and Data Structures, WADS, pp. 83–94 (1993)

  7. Arge, L., Procopiuc, O., Ramaswamy, S., Suel, T., Vitter, J.: Theory and practice of I/O-efficient algorithms for multidimensional batched searching problems. In: Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA (1998)

  8. Bender, M.A., Hu, H., Kuszmaul, B.C.: Performance guarantees for B-trees with different-sized atomic keys. In: Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 305–316 (2010)

  9. Bender, M.A., Farach-Colton, M., Goswami, M., Medjedovic, D., Montes, P., Tsai, M.-T.: The batched predecessor problem in external memory. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp. 112–124. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44777-2_10

  10. Berkeley DB C API Reference. http://www.berkeleydb.com/

  11. Charikar, M., Fagin, R., Guruswami, V., Kleinberg, J., Raghavan, P., Sahai, A.: Query strategies for priced information. In: Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC, pp. 582–591 (2000)

  12. Cicalese, F., Laber, E.: A new strategy for querying priced information. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, STOC, pp. 674–683 (2005)

  13. Diehr, G., Faaland, B.: Optimal pagination of B-trees with variable-length items. Commun. ACM 27(3), 241–247 (1984)

  14. Elmasry, A.: Distribution-sensitive set multi-partitioning. In: 1st International Conference on the Analysis of Algorithms (2005)

  15. Erickson, J.: Lower bounds for external algebraic decision trees. In: Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 755–761 (2005)

  16. Gupta, A., Kumar, A.: Sorting and selection with structured costs. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS, pp. 416–425. IEEE (2001)

  17. Larmore, L., Hirschberg, D.: Efficient optimal pagination of scrolls. Commun. ACM 28(8), 854–856 (1985)

  18. McCreight, E.: Pagination of B*-trees with variable-length records. Commun. ACM 20(9), 670–674 (1977)

  19. Munro, J., Spira, P.: Sorting and searching in multisets. SIAM J. Comput. 5(1), 1–8 (1976)

  20. Pinchuk, A.P., Shvachko, K.V.: Maintaining dictionaries: space-saving modifications of B-trees. In: Biskup, J., Hull, R. (eds.) ICDT 1992. LNCS, vol. 646, pp. 421–435. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-56039-4_57

  21. The GNU C Library: qsort. http://www.gnu.org/software/libc/manual/

Author information

Corresponding author

Correspondence to Mayank Goswami.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bender, M.A., Goswami, M., Medjedovic, D., Montes, P., Tsichlas, K. (2020). Batched Predecessor and Sorting with Size-Priced Information in External Memory. In: Kohayakawa, Y., Miyazawa, F.K. (eds) LATIN 2020: Theoretical Informatics. LATIN 2021. Lecture Notes in Computer Science, vol 12118. Springer, Cham. https://doi.org/10.1007/978-3-030-61792-9_13

  • DOI: https://doi.org/10.1007/978-3-030-61792-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61791-2

  • Online ISBN: 978-3-030-61792-9

  • eBook Packages: Computer Science, Computer Science (R0)
