Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article
Free access

Declustering of key-based partitioned signature files

Published: 01 September 1996 Publication History

Abstract

Access methods based on signature files can largely benefit from possibilities offered by parallel environments. To this end, an effective declustering strategy that would distribute signatures over a set of parallel independent disks has to be combined with a synergic clustering which is employed to avoid searching the whole signature file while executing a query. This article proposes two parallel signature file organizations, Hamming Filter (HF) and Hamming + Filter (H+F), whose common declustering strategy is based on error correcting codes, and where clustering is achieved by organizing signatures into fixed-size buckets, each containing signatures sharing the same key value. HF allocates signatures on disks in a static way and works well if a correct relationship holds between the parameters of the code and the size of the file. H+F is a generalization of HF suitable to manage highly dynamic files. It uses a dynamic declustering, obtained through a sequence of codes, and organizes a smooth migration of signatures between disks so that high performance levels are retained regardless of current file size. Theoretical analysis characterizes the best-case, expected, and worst-case behaviors of these organizations. Analytical results are verified by experiments on prototype systems.

References

[1]
ABI)EI.-GHAFFAR, K. A. AND EL ABBADI, A. 1993. Optimal disk allocation for partial match queries. ACM Trans. Database Syst. 18, 1 !March), 132-156.
[2]
Ant~aA, S. R. AND ROBERTS, C.S. 1980. An associative parallel processor fi~r partial match retrieval using superimposed codes. In Proceedings of the Seventh Annual Symposium on Computer Architecture (France, May~, IEEE, 218-227.
[3]
CI^C~IA, P. AND ZgZm.A, P. 1993. Estimating accesses in partitioned signature file organizations. ACM Trans. Infi Syst. II, 2 (Apri}), 133-142.
[4]
DFpPIS~~, U. 1986. S-tree: A dynamic ba}anced signature index for office retrieval. In Proceedingx of thc Nlnth ACM SIG1R International Conference on Research and Developmets! in ln/brmatio, Retrieval fPisa, Ita}y, Sept.), 77-87.
[5]
D~:WITT, D. AND GR~-W, J. 1992. Para}le} database systems: The future of high performance database systems. (',nnmu~z. ACM 35, 6 (dune), 85-98.
[6]
D~, H. C.~~r~ S(m~~l~:wsKh J. S. 1982. Disk allocation for Cartesian product files on multiple disk systems. ACM Trans. Database Syst. 7, I tMarch), 82-101.
[7]
FALO('TSOS, C. 1992. Signature files. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Englewood Cliffs, NJ, Chapter 4, 44-65.
[8]
F^l,oIwsos, C. AND CHRISTODOULAKIS, S. 1987. Optimal signature extraction and information loss~ ACM Trans. Database Syst. 12, 3 (Sept. l, 395-428.
[9]
FAL()tTTSOS, C. AND CHR1STOI)OULAKIS, S. 1984. Signature files: An access method for documents and its analytical performance evaluation. ACM Trans. Office Infi Syst. 2, 4 ~Oct.), 267 288.
[10]
FALOt5TSOS, C. AND METAXAS, D. 1991. Disk allocation methods using error correcting codes. IEEE Trans. Comput. 40, 8 (Aug.), 907-914.
[11]
Ft'.nWARA, T., ITO, M. I'~\SAMI, T., KATAOKA, M., AND OKtZl, O. 1991. Performance analysis of disk allocation method using error-correcting codes. IEEE Trans. lnfi TheoO' 37, '2 (March~, 379-384.
[12]
GRANI)I, F., TIBERIO, P. AND ZEZULA, P. 1992. Frame-slice partitioned para}lel signature files. In Proceedings of the Fifteenth ACM SIGIR International Conference ICopenhagen, Denmark. Junel, 286-297.
[13]
LEE, D.L. 1986. A word-parallel, bit-serial signature processor for superimposed coding. In Proceedings of the Second International Conferenee on Data Engineering, (Las Angeles, CA, Feb.) IEEE, New York, 352-359.
[14]
LEE, D. L. ^ND LENG, C.-W. 1990. A partitioned signature file structure for multiattribute and text retrieval. In Proceedings of the Sixth International Conference on Data Engineering, IEEE, (Las Angeles, CA, Feb.), 389-397.
[15]
LEE, D. L. AND LENG, C.-W. 1989. Partitioned signature files: Design issues and performance evaluation. ACM Trans. Office lnf Syst. 7, 2 (April), 158-180.
[16]
LIN, Z. 1993. Concurrent frame signature ff}es. Distrib. Parallel Databases 1, 3 (July), 231-249.
[17]
LiN, Z. AND F^LOUTSOS, C. 1992. Frame-sliced signature files. IEEE Trans. Knotvl. Data Eng. 4, 3 (June), 281-289.
[18]
MACWILLI^MS, F. J. AND SLO^NE, N. J. A. 1988. The Theorv of Error-Correcting Codes. North-Holland, Amsterdam, The Netherlands.
[19]
PETERSON, W. W. AND WELDON, E.J. 1972. Error Correcting Codes. MIT Press, Cambridge, MA.
[20]
POGUE, C. AND W{LLETT, P. 1987. Use of text signatures for document retrieval in a high parallel environment. Parallel Comput. 4, 259-268.
[21]
R()BErtTS, C.S. 1979. Partial match retrieval via the method of superimposed codes. Proc. IEEE 67, 12 (Dec.), 1624-1642.
[22]
STANF{LL, C. AND KAHLE, B. 1986. Paralle} free-text search on the connection machine system. Commun. ACM 29, 12 (Der.), 1229-1239.
[23]
SUNG, Y.Y. 1985. Parallel searching for binary Cartesian product files. In Proceedings of the 1985 ACM CSC Conference (New Or{eans, LA, March), 163-172.
[24]
TmERIO, P. AND ZEZULA, P. 1993. Selecting signature files for specif~c applications. Inf Process. Manage. 29, 4, 487-498.
[25]
VERSOrFr, T. 1987. An updated table of minimum-distance bounds for binary linear codes. IEEE Trans. Inf Theory IT-33 (Sept.), 665-680.
[26]
ZEZULA, P., CIACCIA, P., AND TIBER{O, P. 1993. Hamming filter: A dynamic signature file arganization for parallel stores. In Proceedings of the Nineteenth VLDB International Conference, (Dublin, Ireland, Aug.), 314-327.
[27]
ZEZULA, P., RABITTI, F., AND TIBER~O, P. 1991. Dynamic partitioning of signature files. ACM Trans. Inf OEvst. 9, 4 (Oct.), 336-369.

Cited By

View all
  • (2024)Study on Content‐Based Image RetrievalIntegrating Metaheuristics in Computer Vision for Real‐World Optimization Problems10.1002/9781394230952.ch15(253-272)Online publication date: 31-Jul-2024
  • (2020)ILP-Based Local Search for Graph PartitioningACM Journal of Experimental Algorithmics10.1145/339863425(1-26)Online publication date: 4-Jul-2020
  • (2020)Property Suffix Array with Applications in Indexing Weighted SequencesACM Journal of Experimental Algorithmics10.1145/338589825(1-16)Online publication date: 3-May-2020
  • Show More Cited By

Recommendations

Reviews

Jaroslav Pokorny

Declustering data on multiple disk systems has become an important research issue concerning partial match and range queries on both formatted and unformatted data. The paper proposes two parallel signature file organizations, Hamming filter and Hamming + filter. These organizations are based on parallelization of the quick filter (QF) key-based organization. Parallelization of QF can be considered as the problem of declustering buckets on disks so that, for each possible query, the I/O load is distributed uniformly among the available disks. In section 2, the authors give basic concepts related to binary linear error-correcting codes, key-based partitioned signature files, and Hamming filter (HF) organization [1]. This dynamic key-based signature file organization for multiple disks combines the dynamic clustering features of QF with a static declustering strategy based on linear error-correcting codes. Section 3 is devoted to an analysis of HF performance. Optimal, expected, and worst-case response times are derived. Further, it is shown that the performance of HF depends on the file size. Section 4 defines Hamming + filter (H + F), a new organization that guarantees high performance by automatically adjusting the static disk allocation strategy used in HF. The basic idea is to use variable-length keys not only to manage the signatures stored on each disk, but to assign signatures to disks. This requires using a sequence of error-correcting codes and one code for specific file size; it results in a smooth migration of signatures between disks, so no reorganization phase is needed. Section 5 summarizes experimental results on HF and H + F obtained by prototype systems developed by the authors. The tests focus on the quality of declustering, that is, how evenly the organization can distribute the I/O load on disks. H + F can guarantee good performance levels for even highly dynamic files, justifying the migration process that linearly reallocates signatures when the file grows or shrinks. Section 6 discusses related work and states some conclusions. The paper concludes with a number of appendices that contain the proofs of the most important theorems. The paper is written at a high level and significantly contributes to the theory of signatures. The results are also of practical importance.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 21, Issue 3
Sept. 1996
132 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/232753
  • Editor:
  • Won Kim
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1996
Published in TODS Volume 21, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. error correcting codes
  2. information retrieval
  3. parallel independent disks
  4. partial match queries
  5. performance evaluation
  6. superimposed coding

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)55
  • Downloads (Last 6 weeks)6
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Study on Content‐Based Image RetrievalIntegrating Metaheuristics in Computer Vision for Real‐World Optimization Problems10.1002/9781394230952.ch15(253-272)Online publication date: 31-Jul-2024
  • (2020)ILP-Based Local Search for Graph PartitioningACM Journal of Experimental Algorithmics10.1145/339863425(1-26)Online publication date: 4-Jul-2020
  • (2020)Property Suffix Array with Applications in Indexing Weighted SequencesACM Journal of Experimental Algorithmics10.1145/338589825(1-16)Online publication date: 3-May-2020
  • (2020)Generating Difficult CNF Instances in Unexplored Constrainedness RegionsACM Journal of Experimental Algorithmics10.1145/338565125(1-12)Online publication date: 3-May-2020
  • (2020)A Sequential Importance Sampling Algorithm for Counting Linear ExtensionsACM Journal of Experimental Algorithmics10.1145/338565025(1-14)Online publication date: 3-May-2020
  • (2018)Autonomous tools and designCommunications of the ACM10.1145/321075362:1(50-57)Online publication date: 19-Dec-2018
  • (2018)The Church-Turing thesisCommunications of the ACM10.1145/319844862:1(66-74)Online publication date: 19-Dec-2018
  • (2018)Deception, identity, and securityCommunications of the ACM10.1145/319083662:1(85-93)Online publication date: 19-Dec-2018
  • (2016)A Comparison of Sorting Times between Java 8 and Parallel ColtACM SIGSOFT Software Engineering Notes10.1145/2967307.296731641:4(1-5)Online publication date: 19-Aug-2016
  • (2016)Do Anesthesiologists Know What They Are Doing? Mining a Surgical Time-Series Database to Correlate Expert Assessment with OutcomesACM Transactions on Knowledge Discovery from Data10.1145/282289710:3(1-27)Online publication date: 9-Feb-2016
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media