DOI: 10.1145/2594538.2594557

Categorical range maxima queries

Published: 18 June 2014

Abstract

Given an array A[1...n] of n distinct elements from the set {1, 2, ..., n}, a range maximum query RMQ(a, b) returns the highest element in A[a...b] along with its position. In this paper, we study a generalization of this classical problem called the Categorical Range Maxima Query (CRMQ) problem, in which each element A[i] in the array has an associated category (color) given by C[i] ∈ [σ]. A query then asks to report each distinct color c appearing in C[a...b] along with the highest element (and its position) in A[a...b] with color c. Let p_c denote the position of the highest element in A[a...b] with color c. We investigate two variants of this problem: a threshold version and a top-k version. In the threshold version, we only need to output the colors with A[p_c] greater than the input threshold τ, whereas the top-k variant asks for the k colors with the highest A[p_c] values. In the word RAM model, we achieve a linear-space structure with O(k) query time that can report colors in sorted order of A[•]. In external memory, we present a data structure that answers queries in optimal O(1 + k/B) I/Os using almost-linear O(n log* n) space, as well as a linear-space data structure with O(log* n + k/B) query I/Os. Here k represents the output size, log* n is the iterated logarithm of n, and B is the block size. CRMQ has applications to document retrieval and categorical range reporting, giving a one-shot framework to obtain improved results in both of these problems. Our results for CRMQ not only improve the best known results for three-sided categorical range reporting but also overcome the hurdle of maintaining color uniqueness in the output set.
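
The query semantics defined in the abstract can be illustrated with a brute-force reference. The following is a minimal sketch of the problem definition only, not the data structures of the paper: it scans the whole range in time proportional to b - a + 1, whereas the paper's structures answer in O(k). The function names (`crmq`, `crmq_threshold`, `crmq_top_k`) and the sample arrays are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def crmq(A: List[int], C: List[int], a: int, b: int) -> Dict[int, Tuple[int, int]]:
    """Naive CRMQ: map each color c in C[a..b] to (A[p_c], p_c).

    Positions are 1-indexed and the range [a, b] is inclusive,
    matching the notation A[a...b] in the abstract.
    """
    best: Dict[int, Tuple[int, int]] = {}
    for i in range(a, b + 1):
        value, color = A[i - 1], C[i - 1]
        # Keep the highest element seen so far for this color.
        if color not in best or value > best[color][0]:
            best[color] = (value, i)
    return best


def crmq_threshold(A: List[int], C: List[int], a: int, b: int,
                   tau: int) -> List[Tuple[int, int, int]]:
    """Threshold variant: colors whose range maximum A[p_c] exceeds tau.

    Returns (color, A[p_c], p_c) triples sorted by decreasing A[p_c].
    """
    hits = [(c, v, p) for c, (v, p) in crmq(A, C, a, b).items() if v > tau]
    return sorted(hits, key=lambda t: t[1], reverse=True)


def crmq_top_k(A: List[int], C: List[int], a: int, b: int,
               k: int) -> List[Tuple[int, int, int]]:
    """Top-k variant: the k colors with the highest A[p_c] in the range."""
    ranked = sorted(
        ((c, v, p) for c, (v, p) in crmq(A, C, a, b).items()),
        key=lambda t: t[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical sample input: A is a permutation of {1, ..., 8}, C uses 3 colors.
A = [3, 7, 1, 8, 5, 2, 6, 4]
C = [1, 2, 1, 3, 2, 1, 3, 2]

print(crmq(A, C, 2, 7))               # per-color maxima in A[2...7]
print(crmq_threshold(A, C, 2, 7, 5))  # colors with A[p_c] > 5
print(crmq_top_k(A, C, 2, 7, 1))      # single best color
```

Each color appears at most once in the output because the scan keeps a single best entry per color. Achieving that uniqueness without scanning the range is the difficulty the abstract refers to.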

Published In
PODS '14: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2014
300 pages
ISBN:9781450323758
DOI:10.1145/2594538
  • General Chair: Richard Hull
  • Program Chair: Martin Grohe
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O efficiency
  2. categorical queries

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14
Acceptance Rates

PODS '14 Paper Acceptance Rate: 22 of 67 submissions, 33%
Overall Acceptance Rate: 642 of 2,707 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 1
Download counts are as of 15 Oct 2024.

Cited By

  • (2023) Ranked Document Retrieval in External Memory. ACM Transactions on Algorithms, 19:1 (1-12). DOI: 10.1145/3559763. Online publication date: 9-Mar-2023
  • (2022) Generic Techniques for Building Top-k Structures. ACM Transactions on Algorithms, 18:4 (1-23). DOI: 10.1145/3546074. Online publication date: 10-Oct-2022
  • (2019) A Guide to Designing Top-k Indexes. ACM SIGMOD Record, 48:2 (6-17). DOI: 10.1145/3377330.3377332. Online publication date: 19-Dec-2019
  • (2018) A Linear-Space Data Structure for Range-LCP Queries in Poly-Logarithmic Time. Computing and Combinatorics (615-625). DOI: 10.1007/978-3-319-94776-1_51. Online publication date: 29-Jun-2018
  • (2016) Efficient Top-k Indexing via General Reductions. Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (277-288). DOI: 10.1145/2902251.2902290. Online publication date: 15-Jun-2016