research-article

Toward a semantic granularity model for domain-specific information retrieval

Published: 22 July 2011
  • Abstract

    Both similarity-based and popularity-based document ranking functions have been successfully applied to information retrieval (IR) in general. However, the dimension of semantic granularity should also be considered for effective retrieval. Semantic granularity refers to the level of semantic detail carried by an information item. In this article, we propose a semantic granularity-based IR model that takes into account three dimensions, namely similarity, popularity, and semantic granularity, to improve domain-specific search. In particular, a concept-based computational model is developed to estimate the semantic granularity of documents with reference to a domain ontology. The results of our benchmark experiments confirm that the proposed semantic granularity-based IR model performs significantly better than the similarity-based baseline in both a biomedical and an agricultural domain. In addition, a series of user-oriented studies reveals that the proposed document ranking functions resemble the implicit ranking functions exercised by humans. The perceived relevance of the documents delivered by the granularity-based IR system is significantly higher than that of documents returned by a popular search engine for a number of domain-specific search tasks. To the best of our knowledge, this is the first study regarding the application of semantic granularity to enhance domain-specific IR.
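
    The abstract does not give the ranking formula, so the following is only an illustrative sketch of how three such dimensions might be combined. Everything in it is an assumption made for illustration and not the authors' method: the toy ontology `PARENT`, the use of concept depth as a stand-in for granularity, the linear weighting, and the names `granularity`, `score`, and `rank`.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: concept -> parent concept (None marks the root).
PARENT = {
    "disease": None,
    "cancer": "disease",
    "lung cancer": "cancer",
    "small cell lung cancer": "lung cancer",
}


def depth(concept: str) -> int:
    """Number of edges from the concept up to the ontology root."""
    d = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        d += 1
    return d


MAX_DEPTH = max(depth(c) for c in PARENT)


def granularity(concepts: list[str]) -> float:
    """Mean normalized depth of a document's known concepts, in [0, 1].

    Assumption: deeper ontology concepts indicate more specific documents.
    """
    known = [c for c in concepts if c in PARENT]
    if not known:
        return 0.0
    return sum(depth(c) for c in known) / (len(known) * MAX_DEPTH)


@dataclass
class Doc:
    doc_id: str
    similarity: float   # query-document similarity, assumed in [0, 1]
    popularity: float   # e.g. a link-based score, assumed in [0, 1]
    concepts: list[str]


def score(doc: Doc, w_sim: float = 0.5, w_pop: float = 0.2,
          w_gran: float = 0.3, target_gran: float = 1.0) -> float:
    """Weighted sum of the three dimensions (weights are arbitrary)."""
    # Reward documents whose granularity is close to the level requested.
    gran_match = 1.0 - abs(granularity(doc.concepts) - target_gran)
    return w_sim * doc.similarity + w_pop * doc.popularity + w_gran * gran_match


def rank(docs: list[Doc], **kwargs) -> list[Doc]:
    """Sort documents by descending combined score."""
    return sorted(docs, key=lambda d: score(d, **kwargs), reverse=True)
```

    In this sketch, two documents with equal similarity and popularity are separated only by how closely their granularity matches `target_gran`, so a request for specific material ranks the document about "small cell lung cancer" above one about "disease" in general.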



    Published In

    ACM Transactions on Information Systems  Volume 29, Issue 3
    July 2011
    134 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/1993036
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 July 2011
    Accepted: 01 March 2011
    Revised: 01 September 2009
    Received: 01 October 2008
    Published in TOIS Volume 29, Issue 3


    Author Tags

    1. Document ranking
    2. domain ontology
    3. domain-specific search
    4. granular computing
    5. information retrieval

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 16
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2024) A Semantic Search Engine for Helping Patients Find Doctors and Locations in a Large Healthcare Organization. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2945-2949. DOI: 10.1145/3626772.3661349. Online publication date: 10-Jul-2024
    • (2021) Semantic Information Retrieval on Medical Texts. ACM Computing Surveys 54:7, 1-38. DOI: 10.1145/3462476. Online publication date: 17-Sep-2021
    • (2020) Processes and methods of information fusion for ranking products based on online reviews: An overview. Information Fusion. DOI: 10.1016/j.inffus.2020.02.007. Online publication date: Feb-2020
    • (2019) Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data 2. DOI: 10.3389/fdata.2019.00013. Online publication date: 11-Jul-2019
    • (2019) Consumer Health Search on the Web: Study of Web Page Understandability and Its Integration in Ranking Algorithms. Journal of Medical Internet Research 21:1, e10986. DOI: 10.2196/10986. Online publication date: 30-Jan-2019
    • (2019) A niching behaviour-based algorithm for multi-level manufacturing service composition optimal-selection. Journal of Ambient Intelligence and Humanized Computing. DOI: 10.1007/s12652-019-01250-0. Online publication date: 15-Apr-2019
    • (2019) An empirical comparison of random forest-based and other learning-to-rank algorithms. Pattern Analysis and Applications 23:3, 1133-1155. DOI: 10.1007/s10044-019-00856-6. Online publication date: 28-Oct-2019
    • (2018) Assembling Frameworks for Strategic Innovation Enactment: Enhancing Transformational Agility through Situational Scanning. Administrative Sciences 8:3, 37. DOI: 10.3390/admsci8030037. Online publication date: 25-Jul-2018
    • (2018) An Interaction Embedded Framework for Healthcare Search. 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD), 347-352. DOI: 10.1109/CSCWD.2018.8465267. Online publication date: May-2018
    • (2018) A cloud-based framework for large-scale traditional Chinese medical record retrieval. Journal of Biomedical Informatics 77, 21-33. DOI: 10.1016/j.jbi.2017.11.013. Online publication date: Jan-2018
