Construction of weak and strong similarity measures for ordered sets of documents using fuzzy set techniques

Published: 01 September 2003

Abstract

Ordered sets of documents are encountered more and more in information distribution systems, such as information retrieval systems. Classical similarity measures for ordinary sets of documents hence need to be extended to these ordered sets. This is done in this paper using fuzzy set techniques. First, a general similarity measure is developed that contains the classical strong similarity measures, such as Jaccard, Dice and Cosine, as well as the classical weak similarity measures, such as Recall and Precision. Then these measures are extended to the comparison of fuzzy sets of documents. Measuring the similarity of ordered sets of documents is a special case of this, where the higher the rank of a document, the lower its weight in the fuzzy set. Concrete forms of these similarity measures are presented. All these measures are new, and the ones for the weak similarity measures are the first of their kind (other strong similarity measures have been given in a previous paper by Egghe and Michel). Some of these measures are then tested in the IR system Profil-Doc. The engine SPIRIT© extracts ranked document sets in three different contexts, each for 600 requests. The practical usability of the OS-measures is then discussed based on these experiments.
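
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: the abstract does not state the concrete formulas, so everything below is an assumption made for illustration. A ranked list is turned into a fuzzy set with a hypothetical linear weight decay (`rank_weights`), fuzzy intersection and union are taken as min and max, and the cardinality of a fuzzy set is the sum of its weights. The strong measures (Jaccard, Dice, Cosine) are symmetric in their arguments, while the weak measures (Precision, Recall) are not.

```python
from math import sqrt


def rank_weights(ranked_docs):
    """Turn a ranked list into a fuzzy set {doc: weight}.

    Assumption: linear decay, so the first document gets weight 1 and
    later documents get less. This is a stand-in, not the paper's formula.
    """
    n = len(ranked_docs)
    return {doc: (n - i) / n for i, doc in enumerate(ranked_docs)}


def card(fs):
    """Cardinality of a fuzzy set, taken as the sum of its weights."""
    return sum(fs.values())


def intersection(a, b):
    """Fuzzy intersection: pointwise minimum over shared documents."""
    return {d: min(a[d], b[d]) for d in a.keys() & b.keys()}


def union(a, b):
    """Fuzzy union: pointwise maximum, absent documents have weight 0."""
    return {d: max(a.get(d, 0.0), b.get(d, 0.0)) for d in a.keys() | b.keys()}


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


# Strong (symmetric) measures.
def jaccard(a, b):
    return _ratio(card(intersection(a, b)), card(union(a, b)))


def dice(a, b):
    return _ratio(2 * card(intersection(a, b)), card(a) + card(b))


def cosine(a, b):
    return _ratio(card(intersection(a, b)), sqrt(card(a) * card(b)))


# Weak (asymmetric) measures: `retrieved` is compared against `reference`.
def precision(retrieved, reference):
    return _ratio(card(intersection(retrieved, reference)), card(retrieved))


def recall(retrieved, reference):
    return _ratio(card(intersection(retrieved, reference)), card(reference))


if __name__ == "__main__":
    a = rank_weights(["d1", "d2", "d3", "d4"])  # hypothetical ranked output
    b = rank_weights(["d2", "d1"])              # hypothetical reference list
    for f in (jaccard, dice, cosine):
        print(f.__name__, round(f(a, b), 4))
    print("precision", round(precision(a, b), 4))
    print("recall", round(recall(a, b), 4))
```

With ordinary sets (all weights equal to 1), these functions reduce to the classical Jaccard, Dice, Cosine, Precision and Recall on sets, which is the sense in which the fuzzy versions extend them. Other weighting schemes or fuzzy set operators could be substituted without changing the overall structure.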

References

[1] Boyce, B. R., Meadow, C. T., & Kraft, D. H. (1995). Measurement in information science. New York: Academic Press.
[2] Buell, D. A., & Kraft, D. H. (1981a). Evaluation of fuzzy retrieval systems. In Proceedings of the American society for information science (pp. 298-300).
[3] Buell, D. A., & Kraft, D. H. (1981b). Performance measurement in a fuzzy retrieval environment. ACM SIGIR Forum 16(1). In Proceedings of the fourth international conference on information storage and retrieval, Oakland, California, May 31-June 2, 1981 (pp. 56-61).
[4] Egghe, L. (1994). A theory of continuous rates and applications to the theory of growth and obsolescence rates. Information Processing and Management, 30(2), 279-292.
[5] Egghe, L., & Michel, C. (2002). Strong similarity measures for ordered sets of documents in information retrieval. Information Processing and Management, 38(6), 823-848.
[6] Egghe, L., & Rousseau, R. (1990). Introduction to informetrics. Quantitative methods in library, documentation and information science. Amsterdam: Elsevier.
[7] Fluhr, C. (1997). SPIRIT.W3: A distributed Cross Lingual Indexing and Search Engine. In Proceedings of the INET 97: The seventh annual conference of the internet society, Kuala Lumpur, Malaysia, June 24-27, 1997.
[8] Grossman, D. A., & Frieder, O. (1998). Information retrieval. Algorithms and heuristics. Boston: Kluwer Academic Publishers.
[9] Lainé-Cruzel, S., Lafouge, T., Lardy, J. P., & Abdallah, N. (1996). Improving information retrieval by combining user profile and document segmentation. Information Processing and Management, 32(3), 305-315.
[10] Losee, R. M. (1998). Text retrieval and filtering. Analytic models of performance. Boston: Kluwer Academic Publishers.
[11] Michel, C. (2000). Ordered similarity measures taking into account the rank of documents. Information Processing and Management, 37(4), 603-622.
[12] Salton, G., & McGill, M. J. (1987). Introduction to modern information retrieval. New York: McGraw-Hill.
[13] Tague-Sutcliffe, J. (1995). Measuring information. An information services perspective. New York: Academic Press.
[14] van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). London: Butterworths.
[15] Zadeh, L. (1975). Fuzzy sets and their applications to cognitive and decision processes. New York: Academic Press.

Cited By

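  • (2023) Semantic domain comparison of research keywords by indicator-based fuzzy distances. Information Processing and Management: an International Journal, 60:5. DOI 10.1016/j.ipm.2023.103468. Online publication date: 1-Sep-2023
  • (2011) Solving multi-label text categorization problem using support vector machine approach with membership function. Neurocomputing, 74:17 (3682-3689). DOI 10.1016/j.neucom.2011.07.001. Online publication date: 1-Oct-2011
  • (2008) Personalized information retrieval based on context and ontological knowledge. The Knowledge Engineering Review, 23:1 (73-100). DOI 10.1017/S0269888907001282. Online publication date: 1-Mar-2008
  • (2006) Neural network approach for learning of the world structure by cognitive agents. Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part III (1012-1019). DOI 10.1007/11893011_128. Online publication date: 9-Oct-2006
  • (2006) Flexible method for a distance measure between communicative agents’ stored perceptions. Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II (227-234). DOI 10.1007/11893004_29. Online publication date: 9-Oct-2006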

Published In

Information Processing and Management: an International Journal, Volume 39, Issue 5
September 2003
137 pages

Publisher

Pergamon Press, Inc.
United States

Author Tags

1. fuzzy
2. ordered set
3. similarity measure
