Implementing ranking strategies using text signatures

Published: 01 January 1988

    Abstract

    Signature files provide an efficient access method for text in documents, but retrieval is usually limited to finding documents that contain a specified Boolean pattern of words. Effective retrieval requires that documents with similar meanings be found through a process of plausible inference. The simplest way of implementing this retrieval process is to rank documents in order of their probability of relevance. In this paper, techniques are described for implementing probabilistic ranking strategies with sequential and bit-sliced signature files, and the limitations of these implementations with regard to their effectiveness are pointed out. A detailed comparison is made between signature-based ranking techniques and ranking using term-based document representatives and inverted files. The comparison shows that term-based representations are at least competitive (in terms of efficiency) with signature files and, in some situations, superior.
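
As a rough illustration of the idea in the abstract, the following minimal sketch ranks documents by scanning a sequential signature file. It is not the authors' implementation: the signature width `SIG_BITS`, the number of bits per term `BITS_PER_TERM`, the MD5-based hashing, and the function names are all assumptions chosen for the example, and the query term weights are taken as given rather than derived from a probabilistic model.

```python
import hashlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_TERM = 3   # assumed number of bit positions hashed per term


def term_signature(term: str) -> int:
    """Map a term to a bit mask with at most BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.md5(f"{i}:{term}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def document_signature(terms) -> int:
    """Superimpose (bitwise OR) the signatures of all terms in a document."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


def rank(query_weights: dict, signatures: dict) -> list:
    """Scan every document signature and sum the weights of the query
    terms whose bits are all present in that signature."""
    term_sigs = {t: term_signature(t) for t in query_weights}
    scores = []
    for doc_id, doc_sig in signatures.items():
        score = sum(
            weight
            for term, weight in query_weights.items()
            if (doc_sig & term_sigs[term]) == term_sigs[term]
        )
        scores.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


docs = {
    "d1": document_signature(["signature", "file", "ranking"]),
    "d2": document_signature([]),
}
ranked = rank({"signature": 2.0, "ranking": 1.0}, docs)
```

In this sketch a signature only records that a term is probably present. Superimposed bits can produce false matches, and within-document term frequencies are not available to the scoring function, which is one way such a representation can limit ranking effectiveness.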


    Reviews

    Jane B. Grimson

    The dramatic growth of office information systems in recent years has given a tremendous boost to research in information retrieval. This paper addresses the important issue of providing efficient and effective methods for the storage and retrieval of text and document files. Its particular focus is on signature files. Signature files aim to improve the efficiency, rather than the effectiveness, of retrieval from text files. The addition of ranking strategies improves effectiveness but imposes a storage and efficiency penalty. The authors examine the trade-offs and evaluate a number of different techniques for the implementation of probabilistic ranking strategies with both sequential and bit-sliced signature files. They compare these strategies to ranking techniques that use term-based document representatives and inverted files. They conclude that little choice exists between term-based and signature versions for sequential organizations. If an inverted or bit-sliced organization is selected to improve efficiency, then the term-based file would give significantly better performance at the cost of increased storage. This is a well-written and well-presented paper. For those not familiar with the field, the terms used are all clearly explained. The authors draw on results from the Esprit MULTOS project, which has developed a multimedia document server. Such a background ensures that the authors focus on the practical issues rather than on the underlying mathematical theory and results in a very readable paper.


    Published In

    ACM Transactions on Information Systems, Volume 6, Issue 1
    Jan. 1988
    81 pages
    ISSN: 1046-8188
    EISSN: 1558-2868
    DOI: 10.1145/42279

    Publisher

    Association for Computing Machinery

    New York, NY, United States



